VIVEKANANDHA
COLLEGE OF ARTS AND SCIENCES FOR WOMEN
(AUTONOMOUS)
(An ISO 9001:2015 Certified Institution;
Affiliated to Periyar University, Salem; Approved by AICTE;
Re-accredited with "A++" Grade by NAAC; Recognized U/S 12(B) and 2(f) of the UGC Act, 1956)
Elayampalayam, Tiruchengode - 637 205.
MASTER OF SCIENCE IN COMPUTER SCIENCE
PRACTICAL RECORD
NAME :
REG.NO :
NATURAL LANGUAGE PROCESSING LAB
(24P2CSEP01)
SEMESTER – II
2024-2026
VIVEKANANDHA
COLLEGE OF ARTS AND SCIENCES FOR WOMEN
(AUTONOMOUS)
(An ISO 9001:2015 Certified Institution;
Affiliated to Periyar University, Salem; Approved by AICTE;
Re-accredited with "A++" Grade by NAAC; Recognized U/S 12(B) and 2(f) of the UGC Act, 1956)
Elayampalayam, Tiruchengode - 637 205.
MASTER OF SCIENCE IN COMPUTER SCIENCE
Certified that this is a bona fide record of practical work done by
Ms./Mrs. ______________________ Reg. No: ______________________ in the
NATURAL LANGUAGE PROCESSING LAB (24P2CSEP01) at the Vivekanandha College of
Arts and Sciences for Women (Autonomous), Elayampalayam, Tiruchengode.
Staff In-Charge Head of the Department
Submitted for the University Practical Examinations held on ______________ at
the PG and Research Department of Computer Science and Applications,
Vivekanandha College of Arts and Sciences for Women (Autonomous),
Elayampalayam, Tiruchengode.
Internal Examiner External Examiner
INDEX

S.NO  DATE  CONTENTS                                                 PAGE NO.  SIGN
 1          Tokenize a given text
 2          Sentences of a text document
 3          Tokenize text with stop words as delimiters
 4          Remove stop words and punctuations in a text
 5          A. Perform Stemming
            B. Lemmatize a given Text
 6          Extract Usernames from Email
 7          Common words in text excluding stop words
 8          Spell correction in a given text
 9          Classify a Text as Positive/Negative Sentiment
10          Root word of any word in a sentence
11          a) Load the iris data from a given CSV file into a dataframe
            b) Extract Noun and Verb phrases from a text
12          Sets of synonyms and antonyms of a given word
13          Print the first 15 random combined labeled male and
            labeled female names from the names corpus
PROGRAM
1. Tokenize a given text
from nltk.tokenize import word_tokenize, sent_tokenize
import nltk
nltk.download('punkt') # Download tokenizer data
# Example text
text = "NLP makes machines understand language. Tokenization is the first step."
# Sentence Tokenization
print("Sentences:", sent_tokenize(text))
# Word Tokenization
print("Words:", word_tokenize(text))
OUTPUT
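Expected output (a sketch, assuming the punkt data downloads successfully; the nltk download log lines are omitted):
Sentences: ['NLP makes machines understand language.', 'Tokenization is the first step.']
Words: ['NLP', 'makes', 'machines', 'understand', 'language', '.', 'Tokenization', 'is', 'the', 'first', 'step', '.']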
PROGRAM
2. Sentences of a text document
from nltk.tokenize import sent_tokenize
import nltk
nltk.download('punkt') # Download tokenizer data
# Read the text from a file
file_path = "example.txt" # Replace with your file path
with open(file_path, 'r') as file:
    text = file.read()
# Sentence Tokenization
sentences = sent_tokenize(text)
# Display the sentences
print("Sentences in the document:")
for i, sentence in enumerate(sentences, 1):
    print(f"{i}: {sentence}")
Note: save a text file named example.txt in the same folder as the notebook before running this program.
OUTPUT
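For instance, if example.txt contained the hypothetical text "NLP is fun. It has many applications.", the program would print:
Sentences in the document:
1: NLP is fun.
2: It has many applications.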
PROGRAM
3. Tokenize text with stop words as delimiters
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import nltk
# Download necessary data
nltk.download('punkt')
nltk.download('stopwords')
# Example text
text = "I enjoy learning Python and coding."
# Define stop words
stop_words = set(stopwords.words('english'))
# Tokenize the text
words = word_tokenize(text)
# Treat stop words as delimiters: keep only the tokens that are not stop words
tokens_without_stopwords = [word for word in words if word.lower() not in stop_words]
# Output the result
print("Original Tokens:", words)
print("Tokens without Stop Words:", tokens_without_stopwords)
OUTPUT
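Expected output (a sketch; note that the token '.' survives because it is not a stop word):
Original Tokens: ['I', 'enjoy', 'learning', 'Python', 'and', 'coding', '.']
Tokens without Stop Words: ['enjoy', 'learning', 'Python', 'coding', '.']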
PROGRAM
4. Remove stop words and punctuations in a text
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import string
import nltk
# Download necessary data
nltk.download('punkt')
nltk.download('stopwords')
# Example text
text = "Python is great! It's simple and powerful."
# Define stop words
stop_words = set(stopwords.words('english'))
# Tokenize the text
words = word_tokenize(text)
# Remove stop words and punctuation
tokens_cleaned = [word for word in words if word.lower() not in stop_words and word not in string.punctuation]
# Output the result
print("Tokens without Stop Words and Punctuation:", tokens_cleaned)
OUTPUT
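Expected output (a sketch, assuming the current NLTK English stop-word list; the clitic "'s" survives because it is neither a stop word nor a single punctuation character):
Tokens without Stop Words and Punctuation: ['Python', 'great', "'s", 'simple', 'powerful']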
PROGRAM
5. A. Perform Stemming
# import these modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
ps = PorterStemmer()
# choose some words to be stemmed
words = ["pythonprogramming", "programs", "programmer", "event", "thankyou"]
for w in words:
    print(w, " : ", ps.stem(w))
OUTPUT
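Expected output (a sketch of PorterStemmer's behaviour; note that stems such as 'programm' need not be dictionary words):
pythonprogramming  :  pythonprogram
programs  :  program
programmer  :  programm
event  :  event
thankyou  :  thankyou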
PROGRAM
5. B. Lemmatize A Given Text
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import nltk
# Download necessary resources
nltk.download('punkt')
nltk.download('wordnet')
def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    tokens = word_tokenize(text)
    lemmatized_text = ' '.join([lemmatizer.lemmatize(word) for word in tokens])
    return lemmatized_text
text = "The cats are chasing mice and playing in the garden"
lemmatized_text = lemmatize_text(text)
print("Original Text:", text)
print("Lemmatized Text:", lemmatized_text)
OUTPUT
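Expected output (a sketch; without POS hints the lemmatizer treats every token as a noun, so 'chasing' and 'playing' are left unchanged):
Original Text: The cats are chasing mice and playing in the garden
Lemmatized Text: The cat are chasing mouse and playing in the garden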
PROGRAM
6. Extract Usernames from Email
# Using regular expressions
import re
# Defining an email string ("alice@example.com" is a placeholder; the original address was redacted)
e = "alice@example.com"
# Using the search function to find a match for the username part of the email
match = re.search(r'([\w.+-]+)@', e)
# If a match is found, extract the username (the part before '@') using the group() method
if match:
    username = match.group(1)
    print(username)
OUTPUT
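With the placeholder address used above, the program prints the username:
alice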
PROGRAM
7. Find the most common words in the text excluding stop words
import nltk
from nltk.corpus import stopwords
from collections import Counter
import string
# Download stopwords (only needed once)
nltk.download("stopwords")
def most_common_words(text, n=10):
    stop_words = set(stopwords.words("english"))  # Load stop words
    # Convert to lowercase & remove punctuation
    words = text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    filtered_words = [word for word in words if word not in stop_words]  # Remove stop words
    word_counts = Counter(filtered_words)  # Count words
    return word_counts.most_common(n)  # Get the most common words
# Example text
text = "This is a simple example text. This text is just for testing the most common words."
# Get the top 5 most common words
result = most_common_words(text, 5)
# Print result
print(result)
OUTPUT
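Expected output (a sketch; ties among the words that occur once are broken by their order of first appearance):
[('text', 2), ('simple', 1), ('example', 1), ('testing', 1), ('common', 1)]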
PROGRAM
8. Spell correction in a given text
import nltk
from nltk.metrics.distance import jaccard_distance
from nltk.util import ngrams
from nltk.corpus import words
# Download the words corpus (only needed once)
nltk.download('words')
# list of correctly spelled words to compare against
correct_words = words.words()
# list of incorrect spellings that need to be corrected
incorrect_words = ['happpy', 'azmaing', 'intelliengt']
# loop for finding correct spellings based on Jaccard distance
# between character-bigram sets, and printing the closest word
for word in incorrect_words:
    temp = [(jaccard_distance(set(ngrams(word, 2)), set(ngrams(w, 2))), w)
            for w in correct_words if w[0] == word[0]]
    print(sorted(temp, key=lambda val: val[0])[0][1])
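OUTPUT
Expected output (a sketch, assuming the NLTK words corpus; each misspelling maps to its nearest neighbour under Jaccard distance):
happy
amazing
intelligent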
PROGRAM
9. Classify A Text as Positive/Negative Sentiment
from textblob import TextBlob
text_1 = "The movie was so awesome."
text_2 = "The food here tastes terrible."
# Determining the polarity
p_1 = TextBlob(text_1).sentiment.polarity
p_2 = TextBlob(text_2).sentiment.polarity
# Determining the subjectivity
s_1 = TextBlob(text_1).sentiment.subjectivity
s_2 = TextBlob(text_2).sentiment.subjectivity
print("Polarity of Text 1 is", p_1)
print("Polarity of Text 2 is", p_2)
print("Subjectivity of Text 1 is", s_1)
print("Subjectivity of Text 2 is", s_2)
PROGRAM
10. Find the ROOT word of any word in a sentence
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
words = ["renting", "renter", "rental", "rents", "apple"]
all_rents = {}
for word in words:
    stem = stemmer.stem(word)
    if stem not in all_rents:
        all_rents[stem] = []
    all_rents[stem].append(word)
print(all_rents)
OUTPUT
{'rent': ['renting', 'rents'], 'renter': ['renter'], 'rental': ['rental'], 'appl': ['apple']}
PROGRAM
11. a) Load the iris data from a given CSV file into a dataframe and print
the shape of the data, type of the data, and first 3 rows.
import pandas as pd
data = pd.read_csv("iris.csv")
print("Shape of the data:")
print(data.shape)
print("\nData Type:")
print(type(data))
print("\nFirst 3 rows:")
print(data.head(3))
OUTPUT
Shape of the data:
(150, 6)
Data Type:
<class 'pandas.core.frame.DataFrame'>
First 3 rows:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
PROGRAM
11.b) Extract Noun and Verb phrases from a text
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag, RegexpParser
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
def chunk_sentence(sentence):
    words = word_tokenize(sentence)  # Tokenize words
    tagged_words = pos_tag(words)  # Perform POS tagging
    # Define grammar for chunking
    grammar = r"""
        NP: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
        PP: {<IN><NP>}               # Chunk prepositions followed by NP
        VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
        CLAUSE: {<NP><VP>}           # Chunk NP, VP pairs
    """
    parser = RegexpParser(grammar)  # Create a chunk parser
    chunked_sentence = parser.parse(tagged_words)  # Apply parsing
    return chunked_sentence
# Example sentence
sentence = "The quick brown fox jumps over the lazy dog"
# Perform chunking
chunked_sentence = chunk_sentence(sentence)
# Print chunked result
print(chunked_sentence)
# Optional: Draw chunk tree (Only works in GUI-supported environments)
chunked_sentence.draw()
OUTPUT
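Expected output (a sketch; the averaged perceptron tagger typically tags 'brown' as NN here, and the cascaded rules then chunk the whole sentence into a CLAUSE):
(S
  (CLAUSE
    (NP The/DT quick/JJ brown/NN fox/NN)
    (VP jumps/VBZ (PP over/IN (NP the/DT lazy/JJ dog/NN)))))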
PROGRAM
12. Write a Python NLTK program to find the sets of synonyms and
antonyms of a given word.
def synonym_antonym_extractor(phrase):
    from nltk.corpus import wordnet
    synonyms = []
    antonyms = []
    for syn in wordnet.synsets(phrase):
        for l in syn.lemmas():
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())
    print(set(synonyms))
    print(set(antonyms))
synonym_antonym_extractor(phrase="word")
OUTPUT
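The exact sets printed depend on the installed WordNet version; the synonym set for "word" contains entries such as 'word', 'news', and 'parole', and the antonym set is usually empty.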
PROGRAM
13. Print the first 15 random combined labeled male and labeled female
names from the names corpus.
import random
import nltk
from nltk.corpus import names
# Download the names corpus (only needed once)
nltk.download('names')
male_names = names.words('male.txt')
female_names = names.words('female.txt')
labeled_male_names = [(str(name), 'male') for name in male_names]
labeled_female_names = [(str(name), 'female') for name in female_names]
# combine labeled male and labeled female names
labeled_all_names = labeled_male_names + labeled_female_names
# shuffle the labeled names array
random.shuffle(labeled_all_names)
print("First 15 random labeled combined names:")
print(labeled_all_names[:15])
OUTPUT
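Because the list is shuffled, the 15 (name, label) pairs printed differ on every run; each entry is a tuple such as ('SomeName', 'male') or ('SomeName', 'female').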