Python | Part of Speech Tagging using TextBlob
Last Updated : 11 Apr, 2022
The TextBlob module is used to build programs for text analysis. One of its more powerful features is part-of-speech (POS) tagging. To install TextBlob, run the following commands:
$ pip install -U textblob
$ python -m textblob.download_corpora
This will install TextBlob and download the necessary NLTK corpora. The installation can take a while because of the large number of tokenizers, chunkers, other algorithms, and corpora that must be downloaded.
Let’s knock out some quick vocabulary:
Corpus : Body of text, singular. Corpora is the plural of this.
Lexicon : Words and their meanings.
Token : Each “entity” that is a part of whatever was split up based on rules.
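To make the idea of a token concrete, here is a minimal sketch of rule-based splitting that uses only the standard library. The rule (runs of word characters, or single punctuation marks) and the function name simple_tokenize are illustrative assumptions. This is not the tokenizer that TextBlob itself uses.

```python
import re


def simple_tokenize(text):
    """Split text into tokens: runs of word characters or single punctuation marks.

    Hypothetical helper for illustration only; TextBlob's own tokenizer
    applies different, more elaborate rules.
    """
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenize("Everything is all about money.")
print(tokens)  # ['Everything', 'is', 'all', 'about', 'money', '.']
```

Changing the regular expression changes what counts as a token, which is why the definition above says tokens depend on the splitting rules.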
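In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging or word-category disambiguation, is the process of marking each word in a text with its part of speech, based on both its definition and its context.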
Input: Everything is all about money.
Output: [('Everything', 'NN'), ('is', 'VBZ'),
('all', 'DT'), ('about', 'IN'),
('money', 'NN')]
Here’s a list of the tags, what they mean, and some examples:
CC coordinating conjunction
CD cardinal number
DT determiner
EX existential there (like: “there is” … think of it like “there exists”)
FW foreign word
IN preposition/subordinating conjunction
JJ adjective ‘big’
JJR adjective, comparative ‘bigger’
JJS adjective, superlative ‘biggest’
LS list marker 1)
MD modal could, will
NN noun, singular ‘desk’
NNS noun plural ‘desks’
NNP proper noun, singular ‘Harrison’
NNPS proper noun, plural ‘Americans’
PDT predeterminer ‘all the kids’
POS possessive ending parent’s
PRP personal pronoun I, he, she
PRP$ possessive pronoun my, his, hers
RB adverb very, silently
RBR adverb, comparative better
RBS adverb, superlative best
RP particle give up
TO to go ‘to’ the store
UH interjection errrrrrrrm
VB verb, base form take
VBD verb, past tense took
VBG verb, gerund/present participle taking
VBN verb, past participle taken
VBP verb, non-3rd person singular present take
VBZ verb, 3rd person sing. present takes
WDT wh-determiner which
WP wh-pronoun who, what
WP$ possessive wh-pronoun whose
WRB wh-adverb where, when
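As a small illustration of how the tag list can be used in code, the sketch below looks up a human-readable description for each tag in the example sentence shown earlier. The dictionary TAG_DESCRIPTIONS and the helper describe_tags are hypothetical names introduced here, and only a few tags from the list are included. The tagged sentence is copied by hand, so the sketch runs without TextBlob.

```python
# A small subset of the tag list above; extend it as needed.
TAG_DESCRIPTIONS = {
    "DT": "determiner",
    "IN": "preposition/subordinating conjunction",
    "NN": "noun, singular",
    "VBZ": "verb, 3rd person singular present",
}


def describe_tags(tagged_words):
    """Return (word, tag, description) triples; unknown tags map to 'unknown tag'."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag"))
            for word, tag in tagged_words]


# The tagged sentence from the Input/Output example earlier in the article.
tagged = [("Everything", "NN"), ("is", "VBZ"), ("all", "DT"),
          ("about", "IN"), ("money", "NN")]

for word, tag, description in describe_tags(tagged):
    print(f"{word:<12}{tag:<5}{description}")
```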
Python3
# Import the TextBlob class from the textblob library
from textblob import TextBlob

text = ("Sukanya, Rajib and Naba are my good friends. " +
        "Sukanya is getting married next year. " +
        "Marriage is a big step in one’s life." +
        "It is both exciting and frightening. " +
        "But friendship is a sacred bond between people." +
        "It is a special kind of love between us. " +
        "Many of you must have tried searching for a friend " +
        "but never found the right one.")

# Create a TextBlob object from the text
blob_object = TextBlob(text)

# Part-of-speech tags are available through the tags property
# of the blob object as a list of (word, tag) tuples.
print(blob_object.tags)
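Output (the tokens 'life.It' and 'people.It' appear because the input strings omit the space after those two periods, and the curly apostrophe in “one’s” is split off as its own token):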
[('Sukanya', 'NNP'),
('Rajib', 'NNP'),
('and', 'CC'),
('Naba', 'NNP'),
('are', 'VBP'),
('my', 'PRP$'),
('good', 'JJ'),
('friends', 'NNS'),
('Sukanya', 'NNP'),
('is', 'VBZ'),
('getting', 'VBG'),
('married', 'VBN'),
('next', 'JJ'),
('year', 'NN'),
('Marriage', 'NN'),
('is', 'VBZ'),
('a', 'DT'),
('big', 'JJ'),
('step', 'NN'),
('in', 'IN'),
('one', 'CD'),
('’', 'NN'),
('s', 'NN'),
('life.It', 'NN'),
('is', 'VBZ'),
('both', 'DT'),
('exciting', 'VBG'),
('and', 'CC'),
('frightening', 'NN'),
('But', 'CC'),
('friendship', 'NN'),
('is', 'VBZ'),
('a', 'DT'),
('sacred', 'JJ'),
('bond', 'NN'),
('between', 'IN'),
('people.It', 'NN'),
('is', 'VBZ'),
('a', 'DT'),
('special', 'JJ'),
('kind', 'NN'),
('of', 'IN'),
('love', 'NN'),
('between', 'IN'),
('us', 'PRP'),
('Many', 'JJ'),
('of', 'IN'),
('you', 'PRP'),
('must', 'MD'),
('have', 'VB'),
('tried', 'VBN'),
('searching', 'VBG'),
('for', 'IN'),
('a', 'DT'),
('friend', 'NN'),
('but', 'CC'),
('never', 'RB'),
('found', 'VBD'),
('the', 'DT'),
('right', 'JJ'),
('one', 'NN')]
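Once the (word, tag) pairs are available, they can be filtered or counted like any other Python list. The sketch below is one possible way to do that. The helper words_with_prefix is a hypothetical name, and the input is the first sentence of the output above, copied by hand so that the example runs without TextBlob installed.

```python
from collections import Counter

# First sentence of the output above, copied by hand so the sketch
# runs without TextBlob installed.
tagged = [("Sukanya", "NNP"), ("Rajib", "NNP"), ("and", "CC"),
          ("Naba", "NNP"), ("are", "VBP"), ("my", "PRP$"),
          ("good", "JJ"), ("friends", "NNS")]


def words_with_prefix(tagged_words, prefix):
    """Return the words whose tag starts with the given prefix (e.g. 'NN')."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]


# The prefix 'NN' covers NN, NNS, NNP and NNPS.
nouns = words_with_prefix(tagged, "NN")
tag_counts = Counter(tag for _, tag in tagged)

print(nouns)                      # ['Sukanya', 'Rajib', 'Naba', 'friends']
print(tag_counts.most_common(1))  # [('NNP', 3)]
```

In a real program, the hand-copied list would be replaced by blob_object.tags from the code above.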
The goal of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units. These units are called tokens and, most of the time, correspond to words and symbols (e.g. punctuation).