Unit2 Full
text = "I love this library. It's so simple to use! However, sometimes it can be a bit slow."
blob = TextBlob(text)
# Analyze sentiment
for sentence in blob.sentences:
print(f"Sentence: {sentence}")
print(f"Sentiment: {sentence.sentiment}")
5. Tokenization
Split text into words and sentences.
blob = TextBlob("TextBlob is a great tool. It makes NLP tasks simple.")
words = blob.words
sentences = blob.sentences
print("Words:", words)
print("Sentences:", sentences)
6. Lemmatization
Reduce words to their base or root form.
from textblob import Word
words = ["running", "jumps", "easily", "fairly"]
lemmatized_words = [Word(word).lemmatize() for word in words]
print(lemmatized_words)
7. Removing Non-Alphanumeric Characters
Remove characters that are not letters or numbers.
import re
text = "This is a sample sentence with numbers 123 and symbols #!@."
blob = TextBlob(text)
cleaned_text = re.sub(r'\W+', ' ', blob.raw)
print(cleaned_text)
9. Stemming
Reduce words to their stem or root form (less sophisticated than lemmatization).
from nltk.stem import PorterStemmer
ps = PorterStemmer()
words = ["running", "jumps", "easily", "fairly"]
stemmed_words = [ps.stem(word) for word in words]
print(stemmed_words)
Combining Data Cleaning Steps
Combining multiple data cleaning steps into a single process.
from textblob import TextBlob
from nltk.corpus import stopwords
import string
import re
import nltk
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
def clean_text(text):
    # Lowercasing
    text = text.lower()
    # Removing punctuation
    text = ''.join([char for char in text if char not in string.punctuation])
    # Removing stopwords
    words = TextBlob(text).words
    filtered_words = [word for word in words if word not in stop_words]
    # Removing non-alphanumeric characters
    cleaned_text = re.sub(r'\W+', ' ', ' '.join(filtered_words))
    # Removing extra whitespace
    cleaned_text = ' '.join(cleaned_text.split())
    return cleaned_text
text = "This is a sample TEXT with punctuation, numbers 123, and stopwords!"
cleaned_text = clean_text(text)
TextBlob is a versatile tool for a variety of NLP tasks; a short code sketch covering several of these follows the list below:
Data Cleaning: Correct spelling in multiple sentences.
Tokenization: Tokenize complex text into words and sentences.
POS Tagging: Tag words in a complex sentence with their parts of speech.
Noun Phrase Extraction: Extract noun phrases from a paragraph.
Sentiment Analysis: Analyze the sentiment of multiple sentences.
Translation and Language Detection: Translate and detect the language of multiple texts (note that TextBlob's translation and language-detection helpers are deprecated in newer releases, since they relied on the Google Translate API).
Text Classification: Train and use a classifier with more extensive data.
Basic NLP Tasks: Pluralize and singularize words.
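Several of the tasks in the list above (POS tagging, noun phrase extraction, pluralization and singularization) have not yet appeared in code. The following is a minimal sketch, assuming the TextBlob corpora have already been downloaded with python -m textblob.download_corpora; the example sentence and words are arbitrary.
from textblob import TextBlob, Word
blob = TextBlob("The quick brown fox jumps over the lazy dog near the old wooden fence.")
# Part-of-speech tags as (word, tag) pairs
print(blob.tags)
# Noun phrases detected in the sentence
print(blob.noun_phrases)
# Pluralize and singularize individual words
print(Word("cat").pluralize())      # expected: 'cats'
print(Word("dogs").singularize())   # expected: 'dog'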
Introduction to Transformers
Transformers are a model architecture introduced in the paper "Attention Is All You Need" by Vaswani et al. in 2017. They have revolutionized natural language processing (NLP) by enabling highly efficient training and achieving state-of-the-art results across a wide range of tasks. Transformers rely on a mechanism called self-attention, which lets the model consider the entire input sequence when making a prediction and capture context and dependencies between words more effectively than previous architectures.
Hugging Face Transformers is a popular open-source library that provides easy access to a vast array of pre-trained Transformer models for NLP. It lets you quickly use these models for inference or fine-tune them on your own datasets for specific tasks such as text classification, question answering, machine translation, or text generation; fine-tuning a pre-trained model on a specific dataset is often enough to reach state-of-the-art performance. The library supports many transformer-based models, including BERT, GPT, RoBERTa, and T5, making it a versatile tool for NLP practitioners.
Key Concepts
Self-Attention: The self-attention mechanism enables the model to weigh the importance of different words in a sentence when
encoding a word. This allows the model to capture context and dependencies between words, regardless of their distance in the
sequence (a minimal code sketch appears after this list of concepts).
Positional Encoding: Since Transformers do not have a built-in notion of the order of words (unlike RNNs or LSTMs), positional
encodings are added to the input embeddings to provide information about the position of each word in the sequence.
Multi-Head Attention: Multi-head attention allows the model to focus on different parts of the sentence simultaneously,
improving its ability to capture various aspects of the context.
Feed-Forward Networks: Each position in the sequence is processed independently by a feed-forward neural network, adding
non-linearity and complexity to the model.
Layer Normalization: Normalization layers are used to stabilize and speed up training by normalizing the mean and variance of
the activations.
Encoder-Decoder Architecture: The original Transformer architecture consists of an encoder and a decoder, making it suitable
for sequence-to-sequence tasks such as translation. The encoder processes the input sequence, and the decoder generates the output
sequence, attending to the encoder's output.
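To make the self-attention and positional-encoding descriptions above concrete, here is a minimal PyTorch sketch (an illustration of the formulas from "Attention Is All You Need", not the internals of any particular library; the toy tensor sizes are arbitrary):
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Each output row is a weighted average of the rows of V, with weights
    # derived from how well the corresponding query matches each key.
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / (d_k ** 0.5)
    weights = F.softmax(scores, dim=-1)
    return weights @ V

def sinusoidal_positional_encoding(seq_len, d_model):
    # Fixed sine/cosine encodings added to token embeddings so the model
    # can distinguish positions (the formulation of Vaswani et al., 2017).
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32) * (-torch.log(torch.tensor(10000.0)) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

# Toy example: a "sequence" of 4 token embeddings of size 8
x = torch.randn(4, 8) + sinusoidal_positional_encoding(4, 8)
out = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
print(out.shape)   # torch.Size([4, 8])
Multi-head attention simply runs several such attention computations in parallel on learned projections of Q, K, and V and concatenates the results.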
Practical Implementation
The Hugging Face Transformers library provides a comprehensive implementation of various Transformer models, making it easy to use
them for different NLP tasks. Here's an introduction to using the library:
Installation
Install the transformers library (the examples below also require PyTorch):
pip install transformers torch
Loading a Pre-trained Model
Here’s how to load a pre-trained BERT model and tokenizer for a simple text classification task:
from transformers import BertTokenizer, BertForSequenceClassification
import torch
# Load pre-trained model and tokenizer
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
# Example text
text = "Transformers are a groundbreaking innovation in NLP."
# Tokenize the input text
inputs = tokenizer(text, return_tensors='pt', max_length=512, truncation=True, padding='max_length')
# Make predictions (note: the classification head of this model is newly
# initialized, so the predicted class is only meaningful after fine-tuning)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predictions = torch.argmax(logits, dim=-1)
print(f"Predicted class: {predictions.item()}")
Fine-Tuning a Pre-trained Model
Fine-tuning a pre-trained Transformer model on your specific dataset involves training the model on labeled examples. Here's a basic example of how to fine-tune BERT on a text classification dataset:
from transformers import Trainer, TrainingArguments
from datasets import load_dataset
# Load dataset (example: IMDB)
dataset = load_dataset('imdb')
train_dataset = dataset['train'].map(lambda e: tokenizer(e['text'], truncation=True, padding='max_length'), batched=True)
test_dataset = dataset['test'].map(lambda e: tokenizer(e['text'], truncation=True, padding='max_length'), batched=True)
# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)
# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)
# Train the model
trainer.train()
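After training, the same Trainer object can report metrics on the held-out split. A minimal follow-up (only the evaluation loss is reported unless a compute_metrics function is passed to the Trainer):
# Evaluate on the test split and print the resulting metrics
eval_results = trainer.evaluate()
print(eval_results)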
Common Transformer Models
BERT (Bidirectional Encoder Representations from Transformers): Designed for pre-training deep bidirectional
representations by jointly conditioning on both left and right context in all layers.
GPT (Generative Pre-trained Transformer): An autoregressive model designed for generating text and fine-tuning on
various downstream tasks.
T5 (Text-To-Text Transfer Transformer): Converts all NLP tasks into a text-to-text format, simplifying the input-
output interface.
RoBERTa (Robustly Optimized BERT Approach): An optimized version of BERT with improved training strategies.
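All of these architectures can be loaded through the same Auto* classes. The sketch below is illustrative only: it downloads each tokenizer and base encoder by its standard Hugging Face Hub identifier, which requires an internet connection on first use.
from transformers import AutoTokenizer, AutoModel
# The same two lines work for any architecture hosted on the Hub
for checkpoint in ["bert-base-uncased", "gpt2", "roberta-base", "t5-small"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    print(checkpoint, "->", model.config.model_type)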
Applications of Transformers
Text Classification: Sentiment analysis, spam detection, etc.
Named Entity Recognition (NER): Identifying entities like names, dates, and locations in text.
Machine Translation: Translating text from one language to another.
Text Generation: Generating coherent and contextually relevant text.
Question Answering: Answering questions based on context from a given passage.
Summarization: Creating concise summaries of long documents.
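The highest-level entry point for most of these applications is the pipeline API. A brief sketch, assuming the default checkpoint for each task is acceptable (the models are downloaded automatically on first use):
from transformers import pipeline
# Sentiment analysis / text classification
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make NLP tasks much easier."))
# Question answering over a short context
qa = pipeline("question-answering")
print(qa(question="What mechanism do Transformers rely on?",
         context="Transformers rely on self-attention to model long-range dependencies in text."))
# Summarization of a longer passage
summarizer = pipeline("summarization")
print(summarizer("Transformers rely on self-attention to model long-range dependencies in text. "
                 "They power state-of-the-art systems for translation, question answering, summarization, and more.",
                 max_length=30, min_length=5))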
DistilBERT
DistilBERT, a smaller and faster version of BERT, is widely used for various natural language processing (NLP)
tasks, including text classification and sentiment analysis. Below, I'll provide an overview and examples of how to
use DistilBERT for these tasks using the Hugging Face Transformers library.
Text Classification with DistilBERT
Text classification involves categorizing text into predefined labels. Here's how you can use DistilBERT for this task:
1. Installation
Install the necessary libraries:
pip install transformers datasets
2. Loading the Model and Tokenizer
Load the DistilBERT model and tokenizer from Hugging Face:
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
# Load the tokenizer and model
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)
# Load your dataset (assumed to have 'text' and 'label' columns and
# 'train'/'test' splits, which the Trainer setup below relies on)
dataset = load_dataset('csv', data_files='path/to/your/dataset.csv')
3. Tokenize the Dataset
Tokenize the dataset to prepare it for training:
def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
4. Define Training Arguments and Trainer
Set up the training arguments and the Trainer:
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
)
DistilBERT is a versatile model that can be effectively used for text classification and sentiment analysis tasks. Whether you use pre-trained models directly or fine-tune them on your datasets, the Hugging Face Transformers library provides robust tools to streamline the process.
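The step above stops after constructing the Trainer; as in the BERT example earlier, training is launched with trainer.train(). For sentiment analysis without any fine-tuning, a DistilBERT checkpoint already fine-tuned on SST-2 can be used directly (the checkpoint name below is a commonly used public model on the Hugging Face Hub):
# Launch fine-tuning (mirrors the earlier BERT example)
trainer.train()
# Sentiment analysis with a ready-made fine-tuned DistilBERT checkpoint
from transformers import pipeline
sentiment = pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")
print(sentiment("I really enjoy working with DistilBERT!"))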
Model Hub: The library provides a model hub (https://huggingface.co/models) where you can discover and download pre-trained models
and tokenizer files for your specific task. This makes it easy to access state-of-the-art models and use them in your projects.
Fine-Tuning: One of the key features of Hugging Face Transformers is its support for fine-tuning pre-trained models on custom datasets.
This allows you to adapt a pre-trained model to perform well on your specific task or domain.
Easy-to-Use API: The library provides a simple and intuitive API for working with pre-trained models. You can easily load a model,
tokenize text input, and perform inference using just a few lines of code.
Community and Resources: Hugging Face has a large and active community of developers working on NLP projects. They provide
extensive documentation, tutorials, and example code to help you get started with the library.
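As an illustration of the "few lines of code" point above, here is a minimal hub-to-inference sketch; the checkpoint name is one widely used sentiment model and can be swapped for any other model discovered on the hub:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Any checkpoint name from https://huggingface.co/models can be used here
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
inputs = tokenizer("Hugging Face makes transformer models easy to use.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted])   # e.g. 'POSITIVE'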