Lab Sheet 1: Introduction to Python and
Text Handling in NLP
Natural Language Processing Lab
School of Engineering and Technology,
K.R. Mangalam University
August 6, 2025
Objectives
• To understand the basics of Python programming for text processing.
• To learn reading, writing, and manipulating text files.
• To perform basic string operations and preprocessing.
• To tokenize text into sentences and words.
• To calculate word frequencies and export results.
Learning Outcomes
By the end of this lab, students will be able to:
1. Read and write text files in Python.
2. Perform basic text cleaning and string operations.
3. Tokenize text using Python and NLTK.
4. Calculate and visualize word frequencies.
5. Export processed results to CSV.
Task 1: Reading and Writing Text Files
Objective: Learn to load a text file and save modified output.
# Task 1: Reading & Writing Text Files

# Read file
with open("sample_text.txt", "r", encoding="utf-8") as f:
    text_data = f.read()

print("Original Text:\n", text_data)

# Write file (uppercase version)
with open("output_text.txt", "w", encoding="utf-8") as f:
    f.write(text_data.upper())

print("\nUppercase version saved to 'output_text.txt'")
Activity: Read a paragraph from a file, convert it to lowercase, and save to a new
file.
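One possible sketch of this activity (the filenames `paragraph.txt` and `paragraph_lower.txt` are placeholders; substitute your own input file):

```python
# Create a small input file so the sketch is self-contained
with open("paragraph.txt", "w", encoding="utf-8") as f:
    f.write("NLP Makes Computers UNDERSTAND Text.")

# Read the paragraph, convert to lowercase, and save to a new file
with open("paragraph.txt", "r", encoding="utf-8") as f:
    paragraph = f.read()

with open("paragraph_lower.txt", "w", encoding="utf-8") as f:
    f.write(paragraph.lower())

print("Lowercase version saved to 'paragraph_lower.txt'")
```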
Task 2: Basic String Operations
Objective: Practice text cleaning.
# Task 2: Basic String Operations

import string

text = "   Natural Language Processing (NLP) is fun!!!   "

# Remove leading/trailing spaces
clean_text = text.strip()

# Lowercase
clean_text = clean_text.lower()

# Remove punctuation
clean_text = clean_text.translate(str.maketrans("", "", string.punctuation))

print("Cleaned Text:", clean_text)
Activity: Remove digits and replace multiple spaces with a single space.
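A minimal sketch of this activity using the standard-library `re` module (the sample string is an assumption, chosen to contain digits and repeated spaces):

```python
import re

text = "NLP 101 is   fun   in 2025!"

# Remove digits
no_digits = re.sub(r"\d+", "", text)

# Collapse runs of whitespace into a single space
clean = re.sub(r"\s+", " ", no_digits).strip()

print(clean)  # NLP is fun in !
```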
Task 3: Tokenization
Objective: Split text into sentences and words using NLTK.
# Task 3: Tokenization
import nltk
nltk.download("punkt")

sample = "NLP is amazing. It helps computers understand human language."

# Sentence tokenization
sentences = nltk.sent_tokenize(sample)
print("Sentences:", sentences)

# Word tokenization
words = nltk.word_tokenize(sample)
print("Words:", words)
Activity: Tokenize your paragraph and count the number of sentences and words.
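For the counting step, `len(nltk.sent_tokenize(...))` and `len(nltk.word_tokenize(...))` from Task 3 are the intended tools. As a rough stand-in that runs without NLTK, a regex split illustrates the idea (the paragraph is an assumption; `nltk`'s tokenizers handle abbreviations and punctuation more robustly):

```python
import re

paragraph = ("NLP is amazing. It helps computers understand "
             "human language. Tokenization splits text.")

# Rough sentence split on ., !, ? (nltk.sent_tokenize is more robust)
sentences = [s for s in re.split(r"[.!?]+\s*", paragraph) if s]

# Rough word split on runs of word characters
words = re.findall(r"\w+", paragraph)

print("Number of sentences:", len(sentences))
print("Number of words:", len(words))
```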
Task 4: Word Frequency Analysis
Objective: Count the most common words.
# Task 4: Word Frequency Count
from collections import Counter

# Lowercase and tokenize (uses `sample` and nltk from Task 3)
tokens = nltk.word_tokenize(sample.lower())

# Keep only alphabetic tokens (drops punctuation and numbers)
tokens = [t for t in tokens if t.isalpha()]

# Frequency count
freq = Counter(tokens)
print("Word Frequency:", freq)
print("\nMost Common:", freq.most_common(5))
Activity: Remove stopwords using:
from nltk.corpus import stopwords
nltk.download("stopwords")
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]
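To see the effect of this filtering step without downloading the NLTK corpus, the same pattern can be illustrated with a small hand-picked stopword set (a stand-in for `stopwords.words("english")`; the token list is the Task 3 sample, already lowercased):

```python
from collections import Counter

tokens = ["nlp", "is", "amazing", "it", "helps", "computers",
          "understand", "human", "language"]

# Tiny stand-in for NLTK's full English stopword list
stop_words = {"is", "it", "a", "an", "the", "and"}

filtered = [t for t in tokens if t not in stop_words]
print(Counter(filtered).most_common(3))
```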
Task 5: Exporting Results
Objective: Save word frequency to CSV.
import pandas as pd

# Convert frequency dictionary to DataFrame
df = pd.DataFrame(freq.items(), columns=["Word", "Frequency"])

# Save to CSV
df.to_csv("word_frequency.csv", index=False)
print("Word frequency saved to 'word_frequency.csv'")
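If pandas is not installed, the standard-library `csv` module produces the same file (the `freq` counter here is a small assumed example standing in for the Task 4 result):

```python
import csv
from collections import Counter

# Assumed example frequencies (stand-in for `freq` from Task 4)
freq = Counter(["nlp", "is", "fun", "nlp"])

with open("word_frequency.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Word", "Frequency"])
    for word, count in freq.most_common():
        writer.writerow([word, count])

print("Word frequency saved to 'word_frequency.csv'")
```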
Expected Deliverables
• A text file containing original text.
• A modified text file (uppercase/lowercase).
• A CSV file containing word frequencies.
• Screenshot of console output for each task.