Lab Sheet 1: Introduction to Python and
Text Handling in NLP
Natural Language Processing Lab
School of Engineering and Technology,
K.R. Mangalam University
August 6, 2025
Objectives
• To understand the basics of Python programming for text processing.
• To learn reading, writing, and manipulating text files.
• To perform basic string operations and preprocessing.
• To tokenize text into sentences and words.
• To calculate word frequencies and export results.
Learning Outcomes
By the end of this lab, students will be able to:
1. Read and write text files in Python.
2. Perform basic text cleaning and string operations.
3. Tokenize text using Python and NLTK.
4. Calculate and visualize word frequencies.
5. Export processed results to CSV.
Task 1: Reading and Writing Text Files
Objective: Learn to load a text file and save modified output.
# Task 1: Reading & Writing Text Files

# Read file
with open("sample_text.txt", "r", encoding="utf-8") as f:
    text_data = f.read()

print("Original Text:\n", text_data)

# Write file (uppercase version)
with open("output_text.txt", "w", encoding="utf-8") as f:
    f.write(text_data.upper())

print("\nUppercase version saved to 'output_text.txt'")
Activity: Read a paragraph from a file, convert it to lowercase, and save to a new
file.
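One possible sketch of this activity (the filenames `paragraph.txt` and `paragraph_lower.txt` are placeholders; substitute your own input file):

```python
# Create a small input file so the sketch is self-contained
with open("paragraph.txt", "w", encoding="utf-8") as f:
    f.write("NLP Makes Computers UNDERSTAND Text.")

# Read the paragraph, convert to lowercase, and save to a new file
with open("paragraph.txt", "r", encoding="utf-8") as f:
    paragraph = f.read()

with open("paragraph_lower.txt", "w", encoding="utf-8") as f:
    f.write(paragraph.lower())

print("Lowercase version saved to 'paragraph_lower.txt'")
```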
Task 2: Basic String Operations
Objective: Practice text cleaning.
# Task 2: Basic String Operations

import string

text = "   Natural Language Processing (NLP) is fun!!!   "

# Remove leading/trailing spaces
clean_text = text.strip()

# Lowercase
clean_text = clean_text.lower()

# Remove punctuation
clean_text = clean_text.translate(str.maketrans("", "", string.punctuation))

print("Cleaned Text:", clean_text)
Activity: Remove digits and replace multiple spaces with a single space.
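A minimal sketch of this activity using the standard-library `re` module (the sample string is an assumption, chosen to contain digits and repeated spaces):

```python
import re

text = "NLP 101 is   fun   in 2025!"

# Remove digits
no_digits = re.sub(r"\d+", "", text)

# Collapse runs of whitespace into a single space
clean = re.sub(r"\s+", " ", no_digits).strip()

print(clean)  # NLP is fun in !
```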
Task 3: Tokenization
Objective: Split text into sentences and words using NLTK.
# Task 3: Tokenization
import nltk
nltk.download("punkt")

sample = "NLP is amazing. It helps computers understand human language."

# Sentence tokenization
sentences = nltk.sent_tokenize(sample)
print("Sentences:", sentences)

# Word tokenization
words = nltk.word_tokenize(sample)
print("Words:", words)
Activity: Tokenize your paragraph and count the number of sentences and words.
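For the counting step, `len(nltk.sent_tokenize(...))` and `len(nltk.word_tokenize(...))` from Task 3 are the intended tools. As a rough stand-in that runs without NLTK, a regex split illustrates the idea (the paragraph is an assumption; `nltk`'s tokenizers handle abbreviations and punctuation more robustly):

```python
import re

paragraph = ("NLP is amazing. It helps computers understand "
             "human language. Tokenization splits text.")

# Rough sentence split on ., !, ? (nltk.sent_tokenize is more robust)
sentences = [s for s in re.split(r"[.!?]+\s*", paragraph) if s]

# Rough word split on runs of word characters
words = re.findall(r"\w+", paragraph)

print("Number of sentences:", len(sentences))
print("Number of words:", len(words))
```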
Task 4: Word Frequency Analysis
Objective: Count the most common words.
# Task 4: Word Frequency Count
from collections import Counter

# Lowercase and tokenize (uses `sample` and nltk from Task 3)
tokens = nltk.word_tokenize(sample.lower())

# Keep only alphabetic tokens (drops punctuation and numbers)
tokens = [t for t in tokens if t.isalpha()]

# Frequency count
freq = Counter(tokens)
print("Word Frequency:", freq)
print("\nMost Common:", freq.most_common(5))
Activity: Remove stopwords using:
from nltk.corpus import stopwords
nltk.download("stopwords")
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]
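To see the effect of this filtering step without downloading the NLTK corpus, the same pattern can be illustrated with a small hand-picked stopword set (a stand-in for `stopwords.words("english")`; the token list is the Task 3 sample, already lowercased):

```python
from collections import Counter

tokens = ["nlp", "is", "amazing", "it", "helps", "computers",
          "understand", "human", "language"]

# Tiny stand-in for NLTK's full English stopword list
stop_words = {"is", "it", "a", "an", "the", "and"}

filtered = [t for t in tokens if t not in stop_words]
print(Counter(filtered).most_common(3))
```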
Task 5: Exporting Results
Objective: Save word frequency to CSV.
import pandas as pd

# Convert frequency dictionary to DataFrame
df = pd.DataFrame(freq.items(), columns=["Word", "Frequency"])

# Save to CSV
df.to_csv("word_frequency.csv", index=False)
print("Word frequency saved to 'word_frequency.csv'")
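If pandas is not installed, the standard-library `csv` module produces the same file (the `freq` counter here is a small assumed example standing in for the Task 4 result):

```python
import csv
from collections import Counter

# Assumed example frequencies (stand-in for `freq` from Task 4)
freq = Counter(["nlp", "is", "fun", "nlp"])

with open("word_frequency.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Word", "Frequency"])
    for word, count in freq.most_common():
        writer.writerow([word, count])

print("Word frequency saved to 'word_frequency.csv'")
```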
Expected Deliverables
• A text file containing original text.
• A modified text file (uppercase/lowercase).
• A CSV file containing word frequencies.
• Screenshot of console output for each task.