Python | Named Entity Recognition (NER) using spaCy
Last Updated: 03 Apr, 2025
Named Entity Recognition (NER) is used in Natural Language Processing (NLP) to identify and classify important information within unstructured text. These “named entities” include proper nouns like people, organizations, locations and other meaningful categories such as dates, monetary values and products. By tagging these entities, we can transform raw text into structured data that can be analyzed, indexed or used in applications.

[Figure: Representation of Named Entity Recognition]
Use of spaCy in NER
spaCy is an open-source Python library for NLP that is designed for fast, large-scale text processing. It offers:
- Optimized performance: spaCy is built for high-speed text processing making it ideal for large-scale NLP tasks.
- Pre-trained models: It includes various pre-trained NER models that recognize multiple entity types out of the box.
- Ease of use: Its user-friendly API lets developers implement NER with minimal effort.
- Deep learning integration: Through its Thinc machine learning library, spaCy can wrap models written in frameworks such as PyTorch and TensorFlow.
- Efficient pipeline processing: It can efficiently handle text processing tasks, including tokenization, part-of-speech tagging, dependency parsing and named entity recognition.
- Customizability: We can train custom models or manually define new entities.
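The pipeline idea from the list above can be sketched as follows. This is a minimal illustration that assumes spaCy v3 and uses a blank English pipeline with only a rule-based sentence splitter, so it runs without a model download. The example texts are made up for the sketch.

```python
import spacy

# A blank English pipeline contains only the rule-based tokenizer,
# so no trained model is needed for this sketch.
nlp = spacy.blank("en")

# Components are added by name and run in order on every text.
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

texts = [
    "spaCy splits text into tokens. It can also split sentences.",
    "Apple is looking at buying U.K. startup for $1 billion.",
]

# nlp.pipe streams the texts through the pipeline in batches.
docs = list(nlp.pipe(texts))
for doc in docs:
    print([token.text for token in doc])
    print([sent.text for sent in doc.sents])
```

A trained pipeline such as en_core_web_sm works the same way, with more components (including the NER component) in nlp.pipe_names.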
Implementation of NER using spaCy
Here is the step-by-step procedure for NER using spaCy:
1. Install spaCy
First we install spaCy and download en_core_web_sm, a lightweight English pipeline that includes a pre-trained NER component. Note that the small model does not ship with static word vectors; the larger en_core_web_md and en_core_web_lg models do. spaCy's English models support various entity types, including:
- PERSON – Names of people
- ORG – Organizations
- GPE – Countries, cities, states
- DATE – Absolute or relative dates or periods (times of day use the separate TIME label)
- MONEY – Monetary values
- PRODUCT – Products such as objects, vehicles and foods (not services)
- EVENT – Events (e.g., “Olympics”)
- LAW – Legal documents
A full list of entity types can be found in the spaCy documentation.
!pip install spacy
!python -m spacy download en_core_web_sm
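Once spaCy is installed, the meaning of each label can be looked up from Python. The sketch below uses spacy.explain, which reads spaCy's built-in glossary and does not need a trained model. The labels list is simply copied from the list above.

```python
import spacy

# Labels taken from the list above.
labels = ["PERSON", "ORG", "GPE", "DATE", "MONEY", "PRODUCT", "EVENT", "LAW"]

# spacy.explain looks each label up in spaCy's built-in glossary
# and returns None for labels it does not know.
descriptions = {label: spacy.explain(label) for label in labels}
for label, description in descriptions.items():
    print(f"{label:8} {description}")
```

With a trained pipeline loaded as nlp, nlp.get_pipe("ner").labels should list the labels that particular model can predict.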
2. Perform Named Entity Recognition
The following code demonstrates how to perform NER using spaCy:
- spacy.load("en_core_web_sm") loads the pre-trained English model.
- nlp(text) tokenizes the input text and runs it through the pipeline, including the NER component.
- doc.ents contains all recognized named entities.
Python
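import spacy

# Load the small English pipeline downloaded in step 1
nlp = spacy.load('en_core_web_sm')
sentence = "Why Apple is looking at buying U.K. startup for $1 billion ?"
doc = nlp(sentence)

# Print each entity with its character offsets and label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)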
Output:
Apple 4 9 ORG
U.K. 31 35 GPE
$1 billion 48 58 MONEY
Here Apple is classified as an Organization (ORG), U.K. as a Geopolitical Entity (GPE) and $1 billion as Money (MONEY).
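To turn these entities into structured data, one option is to convert them into plain dictionaries. The sketch below is only an illustration: entity_records is a hypothetical helper name, and the entities are set by hand from the offsets in the output above so the code runs on a blank pipeline without the model download.

```python
from collections import Counter

import spacy

# Assumption: a blank pipeline is used so the sketch runs without the
# trained model. With en_core_web_sm loaded, doc = nlp(sentence) already
# has doc.ents filled in and the manual assignment can be skipped.
nlp = spacy.blank("en")
sentence = "Why Apple is looking at buying U.K. startup for $1 billion ?"
doc = nlp(sentence)

# Character offsets and labels copied from the output shown above.
doc.ents = [
    doc.char_span(4, 9, label="ORG"),
    doc.char_span(31, 35, label="GPE"),
    doc.char_span(48, 58, label="MONEY"),
]


def entity_records(doc):
    """Return one plain dictionary per entity in the Doc."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]


records = entity_records(doc)
label_counts = Counter(record["label"] for record in records)
print(records)
print(label_counts)
```

Records in this shape can be written to JSON or loaded into a table for indexing and analysis.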
3. Effect of Case Sensitivity
Here we examine how capitalization affects entity recognition. Lowercasing an entity name may prevent it from being recognized correctly.
Python
# Same sentence, but with "apple" in lowercase
sentence = "Why apple is now looking at buying U.K. startup for $1 billion ?"
doc = nlp(sentence)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
Output:
U.K. 35 39 GPE
$1 billion 52 62 MONEY
Because “apple” is in lowercase, the model no longer recognizes it as an organization.
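One possible workaround is a case-insensitive rule. This is a sketch, assuming spaCy v3 and its built-in entity_ruler component. The pattern is illustrative only, and a blank pipeline is used so the result does not depend on the trained model.

```python
import spacy

# Assumption: a blank pipeline keeps the sketch independent of the
# trained model; the rule below is an example, not a spaCy default.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# The LOWER attribute compares the lowercased token text, so the
# pattern matches "Apple", "apple" and "APPLE" alike.
ruler.add_patterns([{"label": "ORG", "pattern": [{"LOWER": "apple"}]}])

results = []
for text in ("Why Apple is buying a startup", "Why apple is buying a startup"):
    doc = nlp(text)
    results.append([(ent.text, ent.label_) for ent in doc.ents])
print(results)
```

A rule like this would also tag the fruit "apple" as ORG, so it only suits text where that ambiguity does not arise. With a trained pipeline, the ruler can be added with nlp.add_pipe("entity_ruler", before="ner") so that the statistical component works around the rule-based matches.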
4. Customizing Named Entity Recognition
Here we manually add a new named entity to spaCy’s output. This technique is useful when you want to recognize specific terms that are not in the pre-trained model.
- We use Span to define the new entity.
- The entity is added to doc.ents to update the output.
Python
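from spacy.tokens import Span

doc = nlp("Tesla is planning to launch a new product.")
custom_label = "ORG"

# Span(doc, 0, 1) covers only token 0 ("Tesla"); assigning to doc.ents
# replaces any entities the model predicted for this Doc
doc.ents = (Span(doc, 0, 1, label=custom_label),)
for ent in doc.ents:
    print(ent.text, ent.label_)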
Output:
Tesla ORG
Here “Tesla” was manually added as an organization. This changes only the current Doc, not the model itself. To make the model learn new entities, you can retrain it on annotated datasets in a full NER training setup.
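As a rough sketch of what preparing such annotated data can look like in spaCy v3, the code below packs hand-labelled examples into a DocBin, the container that spaCy's training command reads. TRAIN_DATA and its annotations are hypothetical examples made up for this illustration.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical annotations: (text, [(start_char, end_char, label), ...]).
TRAIN_DATA = [
    ("Tesla is planning to launch a new product.", [(0, 5, "ORG")]),
    ("Why Apple is looking at buying U.K. startup",
     [(4, 9, "ORG"), (31, 35, "GPE")]),
]

nlp = spacy.blank("en")  # only the tokenizer is needed here
doc_bin = DocBin()

for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in annotations:
        span = doc.char_span(start, end, label=label)
        # char_span returns None when offsets do not align with tokens.
        if span is not None:
            spans.append(span)
    doc.ents = spans
    doc_bin.add(doc)

print(len(doc_bin))
# doc_bin.to_disk("train.spacy") would save the data for use with
# `python -m spacy train`; the file name here is an assumption.
```

A real training set would need many more examples per label, plus a separate development set for evaluation.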
Named Entity Recognition (NER) extracts structured information from unstructured text, which supports automation and analysis across industries. spaCy lets developers quickly implement and customize entity recognition for specific applications, and it offers an efficient, scalable way to handle NER in real-world text processing.