A Course Based Project Report on
OCCURRENCE OF WORDS
Submitted to the
Department of Information Technology
in partial fulfillment of the requirements for the completion of course
PYTHON PROGRAMMING LABORATORY (22ES2DS101)
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
Submitted by
B.MANIRAKSHITH 23071A12D6
B.CHAITHRIKA 23071A12D7
B.MAHATHI 23071A12D8
B.CHOHAN 23071A12D9
Under the guidance of
Mrs. S Swathi
(Course Instructor)
Assistant Professor, Department of IT, VNRVJIET
DEPARTMENT OF INFORMATION TECHNOLOGY
VALLURUPALLI NAGESWARA RAO VIGNANA
JYOTHI INSTITUTE OF ENGINEERING &
TECHNOLOGY
An Autonomous Institute, NAAC Accredited with ‘A++’ Grade, NBA
Vignana Jyothi Nagar, Pragathi Nagar, Nizampet (S.O), Hyderabad – 500 090, TS,
India
SEPTEMBER 2023
VALLURUPALLI NAGESWARA RAO VIGNANA JYOTHI
INSTITUTE OF ENGINEERING AND TECHNOLOGY
An Autonomous Institute, NAAC Accredited with ‘A++’ Grade, NBA Accredited for CE, EEE, ME, ECE,
CSE, EIE, IT B. Tech Courses, Approved by AICTE, New Delhi, Affiliated to JNTUH, Recognized as
“College with Potential for Excellence” by UGC, ISO 9001:2015 Certified, QS I-GAUGE Diamond Rated
Vignana Jyothi Nagar, Pragathi Nagar, Nizampet(SO), Hyderabad-500090, TS, India
DEPARTMENT OF INFORMATION TECHNOLOGY
CERTIFICATE
This is to certify that the project report entitled “Occurrence Of Words” is a
bonafide work done under our supervision and is being submitted by
Mr. Manirakshith (23071A12D6), Miss. Chaithrika (23071A12D7), Miss. Mahathi
(23071A12D8), Mr. Chohan (23071A12D9) in partial fulfilment for the award of
the degree of Bachelor of Technology in Information Technology, of the
VNRVJIET, Hyderabad during the academic year 2023-2024.
S SWATHI Dr D Srinivasa Rao
Assistant Professor, IT Associate Professor & HOD, IT
Course based Projects Reviewer
DEPARTMENT OF INFORMATION TECHNOLOGY
DECLARATION
We declare that the course based project work entitled “OCCURRENCE OF
WORDS” submitted in the Department of Information Technology, Vallurupalli
Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad,
in partial fulfilment of the requirement for the award of the degree of Bachelor of
Technology in Information Technology is a bonafide record of our own work
carried out under the supervision of S SWATHI, Assistant Professor, Department
of IT, VNRVJIET. Also, we declare that the matter embodied in this thesis has not
been submitted by us in full or in any part thereof for the award of any
degree/diploma of any other institution or university previously.
Place: Hyderabad.
B. Manirakshith B. Chaithrika B. Mahathi B. Chohan
(23071A12D6) (23071A12D7) (23071A12D8) (23071A12D9)
ACKNOWLEDGEMENT
We express our deep sense of gratitude to our beloved President, Sri. D. Suresh Babu,
VNR Vignana Jyothi Institute of Engineering & Technology for the valuable
guidance and for permitting us to carry out this project.
With immense pleasure, we record our deep sense of gratitude to our beloved
Principal, Dr. C. D. Naidu, for permitting us to carry out this project.
We express our deep sense of gratitude to Dr. SRINIVASA RAO DAMMAVALAM,
Associate Professor and Head, Department of Information Technology, VNR Vignana
Jyothi Institute of Engineering & Technology, Hyderabad-500090, for the valuable
guidance and suggestions, keen interest, and thorough encouragement extended
throughout the period of project work.
We take immense pleasure in expressing our deep sense of gratitude to our beloved
Guide, S Swathi, Assistant Professor in Information Technology, VNR Vignana
Jyothi Institute of Engineering & Technology, Hyderabad, for her valuable
suggestions and rare insights, and for being a constant source of encouragement and
inspiration throughout our project work.
We express our thanks to all those who contributed to the successful completion of
our project work.
Mr. B. Manirakshith (23071A12D6)
Miss. B.Chaithrika (23071A12D7)
Miss. B. Mahathi (23071A12D8)
Mr. B. Chohan (23071A12D9)
ABSTRACT
This project aims to analyze the occurrence of words within a given text corpus using
Python. The primary objective is to develop a comprehensive tool that can process
text data, count the frequency of each word, and visualize the results in an insightful
manner. By leveraging Python's rich ecosystem of libraries, such as collections for
counting, matplotlib and seaborn for visualization, and nltk for text processing, this
project provides a robust solution for textual analysis.
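The project lists matplotlib and seaborn for its charts. As a dependency-free illustration of the same idea, the sketch below prints a horizontal text bar chart of the most frequent words using only the standard library. It is a stand-in for a plotted bar chart, not the project's visualization code, and `text_bar_chart` with its `top_n` and `width` parameters is hypothetical.

```python
from collections import Counter


def text_bar_chart(counts, top_n=5, width=20):
    """Return a horizontal bar chart of the top_n words as a list of strings.

    Hypothetical helper: a standard-library stand-in for a matplotlib bar chart.
    """
    top = counts.most_common(top_n)
    if not top:
        return []
    largest = top[0][1]  # most_common sorts by count, so the first entry is the max
    label_width = max(len(word) for word, _ in top)
    lines = []
    for word, count in top:
        # Scale each bar so the most frequent word gets `width` characters
        bar = "#" * max(1, round(count * width / largest))
        lines.append(f"{word.ljust(label_width)} | {bar} {count}")
    return lines


counts = Counter("this is a test this test is only a test".split())
for line in text_bar_chart(counts, top_n=3, width=9):
    print(line)
```

Scaling against the largest count keeps the chart readable whatever the corpus size; a plotting library would handle the same scaling through its axes.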
Data Preprocessing: The text data is cleaned and prepared for analysis. This involves converting text to lowercase, removing punctuation, and handling stopwords.
Word Counting: The cleaned text is then processed to count the occurrences of each word using the Counter class from Python's collections module.
Data Visualization: The word frequency data is visualized using bar charts and word clouds to provide a clear and intuitive understanding of the most common words in the text corpus.
Advanced Analysis: Further analysis includes n-gram generation, sentiment analysis, and topic modeling to gain deeper insights into the text data.
Scalability: The project is designed to handle large datasets efficiently. By utilizing optimized data structures and algorithms, it ensures scalability for extensive text corpora without compromising performance.
Customization: Users can customize the analysis by selecting specific subsets of text, defining custom stopwords, and setting parameters for visualization, making the tool adaptable to various text analysis needs.
Language Support: The tool supports multiple languages, allowing for word occurrence analysis in diverse linguistic contexts. This is achieved through the integration of language-specific libraries and resources.
User Interface: A simple and intuitive user interface is provided for non-technical users, enabling easy upload of text files, execution of analysis, and viewing of results without requiring programming knowledge.
Integration Capabilities: The project can be integrated with other data processing and visualization tools, such as Pandas for data manipulation and Plotly for interactive visualizations, enhancing its utility in comprehensive data analysis workflows.
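The preprocessing, word counting, n-gram, and custom-stopword steps listed above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: a regular-expression tokenizer stands in for nltk's tokenizers, and `DEFAULT_STOPWORDS`, `preprocess`, and `ngrams` are hypothetical names with a deliberately tiny stopword set.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; a real analysis would use a
# fuller list (for example one supplied by nltk) or a user-defined one.
DEFAULT_STOPWORDS = {"a", "an", "the", "is", "if", "of", "and", "to"}


def preprocess(text, stopwords=DEFAULT_STOPWORDS):
    """Lowercase the text, drop punctuation, and remove stopwords."""
    # \w+(?:'\w+)* keeps contractions such as "don't" as a single token
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    return [token for token in tokens if token not in stopwords]


def ngrams(tokens, n=2):
    """Return consecutive n-word tuples (bigrams when n == 2)."""
    return list(zip(*(tokens[i:] for i in range(n))))


text = "How much wood would a woodchuck chuck, if a woodchuck could chuck wood."
tokens = preprocess(text)
word_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))

print(word_counts.most_common(3))
print(bigram_counts.most_common(2))
```

Passing a different set through the `stopwords` parameter corresponds to the custom-stopword option described above, and changing `n` gives trigrams or longer n-grams.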
This project has broad applications, including text mining, sentiment analysis, and
natural language processing tasks, making it a valuable tool for researchers, data
scientists, and developers working with textual data. Through this project, users can
uncover patterns, trends, and insights from textual datasets, facilitating more
informed decision-making.
TABLE OF CONTENTS
1. INTRODUCTION
2. SOURCE CODE
3. TEST CASES / OUTPUT
4. CONCLUSION
5. REFERENCES
1. INTRODUCTION
1.1 PROBLEM DEFINITION
Write a Python program that prints the number of occurrences of each word in a given text.
1.2 OBJECTIVE
The objective of this Python project is to develop a versatile and efficient tool for
analyzing the occurrence of words within a given text corpus. Its specific goals are:
1. Text Data Preprocessing: Implement robust methods to clean and preprocess text
data, including case normalization, punctuation removal, and stopword filtering.
2. Word Frequency Analysis: Accurately count and record the frequency of each
word in the text corpus using efficient data structures and algorithms.
3. Data Visualization: Create clear and insightful visualizations, such as bar charts
and word clouds, to represent word frequencies and patterns in the text data.
4. Scalability: Ensure the tool can handle large text datasets efficiently, maintaining
performance and accuracy as the size of the data increases.
5. Educational Resource: Provide clear documentation and examples to serve as an
educational resource for users interested in learning about text analysis and
Python programming.
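For the scalability objective, one common approach is to stream the input rather than load it into memory at once: read a file line by line and merge each line's words into a single Counter with Counter.update(). The sketch below assumes a UTF-8 text file and a regular-expression notion of a word; `count_words_in_file` is a hypothetical name, and this is one possible design, not the project's implementation.

```python
import os
import re
import tempfile
from collections import Counter

# Assumed definition of a word: runs of word characters, allowing inner apostrophes
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def count_words_in_file(path, encoding="utf-8"):
    """Count words in a text file, reading one line at a time."""
    counts = Counter()
    with open(path, encoding=encoding) as handle:
        for line in handle:  # file objects yield lines lazily
            counts.update(WORD_RE.findall(line.lower()))
    return counts


# Usage: write a small sample file, count it, then clean up
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("This is a test.\nThis test is only a test.\n")
demo_counts = count_words_in_file(tmp.name)
os.remove(tmp.name)
print(demo_counts.most_common(2))
```

Memory use then depends mainly on the number of distinct words rather than on the file size, which is what makes this pattern suitable for large corpora.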
2. SOURCE CODE
import re


def word_occurrences(text):
    # Normalize the text to lower case and extract the words with a regular
    # expression, which drops punctuation so that "test." and "test" are one word
    words = re.findall(r"\w+(?:'\w+)*", text.lower())
    # Collect the unique words in order of first appearance
    # (a plain set also works, but its iteration order is arbitrary)
    unique_words = dict.fromkeys(words)
    # Create a dictionary to store word counts, starting each word at zero
    word_count = {word: 0 for word in unique_words}
    # Count occurrences of each word
    for word in words:
        word_count[word] += 1
    # Convert the dictionary to a list of (word, count) tuples
    word_count_tuples = [(word, count) for word, count in word_count.items()]
    return word_count_tuples


# Sample text
text = "This is a test. This test is only a test."
# Get word occurrences
occurrences = word_occurrences(text)
# Print the result
print("Word occurrences:")
for word, count in occurrences:
    print(f"{word}: {count}")
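The counting loop above can also be written with collections.Counter, which the abstract names for this step. The version below is a hedged alternative rather than the submitted program: `word_occurrences_counter` is a hypothetical name, and it assumes the same lowercase, punctuation-ignoring notion of a word. Because most_common() sorts by count, the pairs come back from most to least frequent, with ties kept in order of first appearance.

```python
import re
from collections import Counter


def word_occurrences_counter(text):
    """Return (word, count) pairs, most frequent first."""
    # Lowercase, then keep only word characters (with inner apostrophes)
    words = re.findall(r"\w+(?:'\w+)*", text.lower())
    return Counter(words).most_common()


for word, count in word_occurrences_counter("This is a test. This test is only a test."):
    print(f"{word}: {count}")
```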
3. TEST CASES / OUTPUT
3.1 Test case 1:
Input: text = "This is a test. This test is only a test."
Output:
3.2 Test case 2:
Input: text = "How much wood would a woodchuck chuck, if a woodchuck could chuck wood."
Output:
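The two test cases above can also be checked automatically. In this sketch, `count_words` and `TEST_CASES` are hypothetical names, words are assumed to be compared case-insensitively with punctuation ignored, and the expected counts were worked out by hand from each input sentence rather than copied from the program's output.

```python
import re
from collections import Counter


def count_words(text):
    """Case-insensitive word counts with punctuation ignored (hypothetical helper)."""
    return Counter(re.findall(r"\w+(?:'\w+)*", text.lower()))


# Expected counts worked out by hand from each input sentence
TEST_CASES = [
    (
        "This is a test. This test is only a test.",
        {"this": 2, "is": 2, "a": 2, "test": 3, "only": 1},
    ),
    (
        "How much wood would a woodchuck chuck, if a woodchuck could chuck wood.",
        {"how": 1, "much": 1, "wood": 2, "would": 1, "a": 2,
         "woodchuck": 2, "chuck": 2, "if": 1, "could": 1},
    ),
]

for number, (text, expected) in enumerate(TEST_CASES, start=1):
    status = "passed" if count_words(text) == Counter(expected) else "FAILED"
    print(f"Test case {number}: {status}")
```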
4. CONCLUSION
The word occurrence counter project demonstrates how Python can be used for text
preprocessing and analysis. The program normalizes the text to lower case, splits it
into words, and tallies each word in a dictionary, the same counting idea that the
collections.Counter class provides in a single call. The resulting word frequencies
give insight into a text's structure and content and provide a foundation for various
natural language processing (NLP) tasks. The project highlights Python's strength in
data manipulation and its suitability for NLP, and with practical applications in
fields such as linguistics, content analysis, and SEO, it can serve as a foundational
tool for more advanced text processing and analysis.
5. REFERENCES
[1] W3Schools: https://www.w3schools.com/python/
[2] Coursera: https://www.coursera.org/courses?query=python
[3] edX: https://www.edx.org/learn/python
[4] Codecademy: https://www.codecademy.com/learn/learn-python-3