Python 2 CBP

Uploaded by

Lohith Bommana

A Course Based Project Report on

OCCURRENCE OF WORDS
Submitted to the

Department of Information Technology

in partial fulfillment of the requirements for the completion of course

PYTHON PROGRAMMING LABORATORY (22ES2DS101)

BACHELOR OF TECHNOLOGY

INFORMATION TECHNOLOGY

Submitted by

B.MANIRAKSHITH 23071A12D6
B.CHAITHRIKA 23071A12D7
B.MAHATHI 23071A12D8
B.CHOHAN 23071A12D9

Under the guidance of

Mrs. S Swathi
(Course Instructor)
Assistant Professor, Department of IT, VNRVJIET

DEPARTMENT OF INFORMATION TECHNOLOGY

VALLURUPALLI NAGESWARA RAO VIGNANA

JYOTHI INSTITUTE OF ENGINEERING &
TECHNOLOGY
An Autonomous Institute, NAAC Accredited with ‘A++’ Grade, NBA
Vignana Jyothi Nagar, Pragathi Nagar, Nizampet (S.O), Hyderabad – 500 090, TS,
India
SEPTEMBER 2023
VALLURUPALLI NAGESWARA RAO VIGNANA JYOTHI
INSTITUTE OF ENGINEERING AND TECHNOLOGY
An Autonomous Institute, NAAC Accredited with ‘A++’ Grade, NBA Accredited for CE, EEE, ME, ECE,
CSE, EIE, IT B. Tech Courses, Approved by AICTE, New Delhi, Affiliated to JNTUH, Recognized as
“College with Potential for Excellence” by UGC, ISO 9001:2015 Certified, QS I GUAGE Diamond Rated
Vignana Jyothi Nagar, Pragathi Nagar, Nizampet(SO), Hyderabad-500090, TS, India

DEPARTMENT OF INFORMATION TECHNOLOGY

CERTIFICATE

This is to certify that the project report entitled “Occurrence Of Words” is a
bonafide work done under our supervision and is being submitted by
Mr. Manirakshith (23071A12D6), Miss. Chaithrika (23071A12D7), Miss. Mahathi
(23071A12D8), and Mr. Chohan (23071A12D9) in partial fulfilment for the award of
the degree of Bachelor of Technology in Information Technology, of the
VNRVJIET, Hyderabad during the academic year 2023-2024.

S SWATHI, Assistant Professor, IT
Dr D Srinivasa Rao, Associate Professor & HOD, IT

Course based Projects Reviewer


DECLARATION

We declare that the course based project work entitled “OCCURRENCE OF
WORDS” submitted in the Department of Information Technology, Vallurupalli
Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad,
in partial fulfilment of the requirement for the award of the degree of Bachelor of
Technology in Information Technology is a bonafide record of our own work
carried out under the supervision of S SWATHI, Assistant Professor, Department
of IT, VNRVJIET. Also, we declare that the matter embodied in this thesis has not
been submitted by us in full or in any part thereof for the award of any
degree/diploma of any other institution or university previously.

Place: Hyderabad.

B. Manirakshith (23071A12D6)
B. Chaithrika (23071A12D7)
B. Mahathi (23071A12D8)
B. Chohan (23071A12D9)

ACKNOWLEDGEMENT

We express our deep sense of gratitude to our beloved President, Sri. D. Suresh Babu,
VNR Vignana Jyothi Institute of Engineering & Technology for the valuable
guidance and for permitting us to carry out this project.

With immense pleasure, we record our deep sense of gratitude to our beloved
Principal, Dr. C.D Naidu, for permitting us to carry out this project.

We express our deep sense of gratitude to our beloved Professor Dr. SRINIVASA
RAO DAMMAVALAM, Associate Professor and Head, Department of Information
Technology, VNR Vignana Jyothi Institute of Engineering & Technology,
Hyderabad-500090, for the valuable guidance and suggestions, keen interest, and
thorough encouragement extended throughout the period of project work.

We take immense pleasure in expressing our deep sense of gratitude to our beloved
Guide, S Swathi, Assistant Professor in Information Technology, VNR Vignana
Jyothi Institute of Engineering & Technology, Hyderabad, for her valuable
suggestions and rare insights, and for being a constant source of encouragement
and inspiration throughout our project work.

We express our thanks to all those who contributed to the successful completion of
our project work.

Mr. B. Manirakshith (23071A12D6)

Miss. B.Chaithrika (23071A12D7)
Miss. B. Mahathi (23071A12D8)
Mr. B. Chohan (23071A12D9)
ABSTRACT

This project aims to analyze the occurrence of words within a given text corpus using
Python. The primary objective is to develop a tool that can process text data, count
the frequency of each word, and visualize the results. By leveraging Python's
ecosystem of libraries, such as collections for counting, matplotlib and seaborn for
visualization, and nltk for text processing, this project provides a solution for
textual analysis.

Data Preprocessing: The text data is cleaned and prepared for analysis. This involves
converting text to lowercase, removing punctuation, and handling stopwords.

Word Counting: The cleaned text is then processed to count the occurrences of each
word using Python's Counter from the collections module.

Data Visualization: The word frequency data is visualized using bar charts and word
clouds to show the most common words in the text corpus.

Advanced Analysis: Further analysis includes n-gram generation, sentiment analysis,
and topic modeling to gain deeper insights into the text data.

Scalability: The project is designed to handle large datasets efficiently. By using
suitable data structures and algorithms, it scales to extensive text corpora without
compromising performance.

Customization: Users can customize the analysis by selecting specific subsets of
text, defining custom stopwords, and setting parameters for visualization, making the
tool adaptable to various text analysis needs.

Language Support: The tool supports multiple languages, allowing for word occurrence
analysis in diverse linguistic contexts. This is achieved through the integration of
language-specific libraries and resources.

User Interface: A simple user interface is provided for non-technical users, enabling
upload of text files, execution of analysis, and viewing of results without requiring
programming knowledge.

Integration Capabilities: The project can be integrated with other data processing
and visualization tools, such as Pandas for data manipulation and Plotly for
interactive visualizations, extending its use in larger data analysis workflows.

This project has broad applications, including text mining, sentiment analysis, and
natural language processing tasks, making it a useful tool for researchers, data
scientists, and developers working with textual data. Through this project, users can
uncover patterns and trends in textual datasets, supporting more informed
decision-making.
TABLE OF CONTENTS

1. INTRODUCTION
2. SOURCE CODE
3. OUTPUT
4. CONCLUSION
5. REFERENCES
1. INTRODUCTION
1.1 PROBLEM DEFINITION

Write a Python program that prints the number of occurrences of each word in a
given text.

1.2 OBJECTIVE

The objective of this Python project is to develop a versatile and efficient tool for
analyzing the occurrence of words within a given text corpus.

1. Text Data Preprocessing: Implement methods to clean and preprocess text data,
including case normalization, punctuation removal, and stopword filtering.

2. Word Frequency Analysis: Accurately count and record the frequency of each
word in the text corpus using efficient data structures and algorithms.

3. Data Visualization: Create clear visualizations, such as bar charts and word
clouds, to represent word frequencies and patterns in the text data.

4. Scalability: Ensure the tool can handle large text datasets efficiently, maintaining
performance and accuracy as the size of the data increases.

5. Educational Resource: Provide clear documentation and examples to serve as an
educational resource for users interested in learning about text analysis and
Python programming.
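The preprocessing, counting, and visualization objectives above can be sketched with the standard library alone. The names `preprocess` and `text_bar_chart`, the small `STOPWORDS` set, and the text-mode chart (a stand-in for a graphical bar chart) are illustrative assumptions and not part of the project code.

```python
import string
from collections import Counter

# Hypothetical stopword list for illustration; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "of", "if"}

def preprocess(text, stopwords=STOPWORDS):
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return [w for w in words if w not in stopwords]

def text_bar_chart(counts, width=20):
    """Return one line per word, with a bar scaled to the highest count."""
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for word, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{word:<12}{bar} {n}")
    return lines

tokens = preprocess("This is a test. This test is only a test.")
counts = Counter(tokens)
print("\n".join(text_bar_chart(counts)))
```

Passing a different set as `stopwords` is one simple way to support the custom stopword lists that the objectives call for.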
2. SOURCE CODE
import re

def word_occurrences(text):
    # Normalize the text to lower case and extract the words, dropping
    # punctuation so that "test." and "test" count as the same word
    words = re.findall(r"[\w']+", text.lower())

    # Use a set to store unique words
    unique_words = set(words)

    # Create a dictionary to store word counts
    word_count = {word: 0 for word in unique_words}

    # Count occurrences of each word
    for word in words:
        word_count[word] += 1

    # Convert the dictionary to a list of (word, count) tuples
    word_count_tuples = list(word_count.items())

    return word_count_tuples

# Sample text
text = "This is a test. This test is only a test."

# Get word occurrences
occurrences = word_occurrences(text)

# Print the result
print("Word occurrences:")
for word, count in occurrences:
    print(f"{word}: {count}")
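Because `word_occurrences` builds its result from a set, the pairs come back in no particular order, which can make the printed list hard to scan. A minimal sketch of ranking the pairs by count and then alphabetically follows; the helper name `rank_occurrences` and the sample list are assumptions made for illustration.

```python
def rank_occurrences(pairs):
    """Sort (word, count) pairs by descending count, breaking ties alphabetically."""
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

# Sample pairs in the shape returned by word_occurrences (illustrative values)
sample = [("only", 1), ("this", 2), ("a", 2), ("test", 3), ("is", 2)]

ranked = rank_occurrences(sample)
for word, count in ranked:
    print(f"{word}: {count}")
```

Sorting on a `(-count, word)` key keeps the output deterministic from run to run, which also makes it easier to compare against expected results.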
3. TEST CASES / OUTPUT
3.1 Test case 1:

Input: text = "This is a test. This test is only a test."

Output:

3.2 Test case 2:

Input: text = "How much wood would a woodchuck chuck, if a woodchuck could chuck wood."

Output:
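The two inputs above can also be checked programmatically. The sketch below assumes punctuation is stripped before counting, so that "test." and "test" count as one word. The helper `count_words` is a hypothetical stand-in, and the expected counts were worked out by hand from the input sentences rather than taken from the report's output.

```python
import re
from collections import Counter

def count_words(text):
    """Count lowercase words, ignoring punctuation (illustrative helper)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

case_1 = count_words("This is a test. This test is only a test.")
case_2 = count_words(
    "How much wood would a woodchuck chuck, if a woodchuck could chuck wood."
)

# Expected counts derived by hand from the two input sentences
assert dict(case_1) == {"this": 2, "is": 2, "a": 2, "test": 3, "only": 1}
assert dict(case_2) == {
    "how": 1, "much": 1, "wood": 2, "would": 1, "a": 2,
    "woodchuck": 2, "chuck": 2, "if": 1, "could": 1,
}
print("Both test cases passed")
```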
4. CONCLUSION

The word occurrence counter project demonstrates text preprocessing and analysis
using Python. It normalizes the input text, splits it into words, and counts how often
each word appears, with regular expressions and the collections.Counter class
available for more robust preprocessing and counting. The resulting word
frequencies give insight into a text's structure and content and provide a
foundation for natural language processing (NLP) tasks. The project highlights
Python's strength in manipulating textual data and, with practical applications in
fields such as linguistics, content analysis, and SEO, it serves as a starting point for
more advanced text processing and analysis.
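The regular-expression and Counter approach mentioned above can be sketched as follows. This is a minimal illustration, not the project's code: the token pattern `WORD_RE` and the helper `count_lines` are assumptions. Counts are accumulated line by line, so the whole text never has to be held in memory at once.

```python
import re
from collections import Counter

# Assumed token pattern: runs of lowercase letters and apostrophes
WORD_RE = re.compile(r"[a-z']+")

def count_lines(lines):
    """Accumulate word counts one line at a time."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts

sample_lines = [
    "This is a test.",
    "This test is only a test.",
]
counts = count_lines(sample_lines)

# most_common(n) returns the n most frequent (word, count) pairs
print(counts.most_common(1))  # [('test', 3)]
```

Since `count_lines` accepts any iterable of strings, an open file object could be passed in place of the sample list.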
5. REFERENCES

[1] W3Schools: https://www.w3schools.com/python/

[2] Coursera: https://www.coursera.org/courses?query=python

[3] edX: https://www.edx.org/learn/python

[4] Codecademy: https://www.codecademy.com/learn/learn-python-3
