Group 12 - Report
Submitted by Group - 12
Amrutha Varshini MBAA22006
Arushi Golia MBAA22016
Nihali Sawant MBAA22040
Parul Saraswat MBAA22046
Shambhavi Gupta MBAA22063
Table of Contents
01. Introduction
02. Background
03. Reading and Preprocessing of Text
04. Word Frequency and Probability
05. N-Gram Analysis
06. Word Cloud
07. Sentiment Analysis
08. Topic Modeling
09. Sentiment Analysis
10. Named Entity Recognition (NER)
11. Geospatial Analysis
12. Summary
01
Introduction
Autocorrect is a valuable tool in modern communication, leveraging machine learning
and natural language processing (NLP) to enhance writing tasks. It predicts and
rectifies misspellings, streamlining the creation of paragraphs, reports, and articles.
Numerous websites and social media platforms integrate autocorrect to enhance user
experiences.
Python, a versatile programming language, is a popular choice for developing
autocorrection systems. The project begins with the Natural Language Toolkit (NLTK).
Extensive datasets enable the system to identify and correct errors with high accuracy.
NLP techniques come into play for context-driven corrections and for handling complex
language nuances.
The autocorrect tool not only improves spelling but also enhances grammar,
punctuation, and overall writing quality. It has become an indispensable feature in our
digital age, aiding effective communication and reducing errors in written content.
In summary, autocorrect, driven by machine learning and NLP, is a vital tool for
improving the quality of text-based communication across various platforms, making it
easier for individuals to compose error-free paragraphs, reports, and articles. Its
continued development and integration into digital environments demonstrate the
importance of technology in enhancing our writing abilities.
03 Switching Letter
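The report's original code for this step is not reproduced here. As a minimal sketch, assuming "switching" means swapping two adjacent letters, the step could look like the following. The function name `switch_letter` is an illustrative choice, not necessarily the one used in the project.

```python
def switch_letter(word):
    """Return every word formed by swapping two adjacent letters of `word`."""
    results = []
    for i in range(len(word) - 1):
        # Swap the letters at positions i and i + 1, keep the rest unchanged.
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        results.append(swapped)
    return results


print(switch_letter("eta"))
```

A word of length n yields n - 1 candidates, so a single-letter word yields none.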
Deletion of Letter
This function removes one letter from a given word.
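A minimal sketch of such a deletion function is shown below. It assumes that every position in the word is tried once; the name `delete_letter` is illustrative rather than taken from the project code.

```python
def delete_letter(word):
    """Return every word formed by removing exactly one letter from `word`."""
    # Drop the letter at position i and join what is left on both sides.
    return [word[:i] + word[i + 1:] for i in range(len(word))]


print(delete_letter("cans"))
```

A word of length n yields n candidates, some of which may be identical when the word contains repeated letters.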
Replace Letter
This function changes one letter of the word to another letter.
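The replacement step can be sketched as follows, assuming lowercase English words and that each position is replaced by every other letter of the alphabet. The name `replace_letter` and the `alphabet` parameter are illustrative assumptions.

```python
import string


def replace_letter(word, alphabet=string.ascii_lowercase):
    """Return every word formed by changing exactly one letter of `word`."""
    results = set()
    for i in range(len(word)):
        for letter in alphabet:
            # Skip the original letter so the input word itself is not returned.
            if letter != word[i]:
                results.add(word[:i] + letter + word[i + 1:])
    return sorted(results)


candidates = replace_letter("can")
print(len(candidates), "cat" in candidates)
```

For a lowercase word of length n this produces 25 * n candidates.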
We have now implemented all five steps. The next task is to merge the candidate words
produced by each of those functions, collecting them in a set so that no word is
repeated.
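The merging step can be sketched as below. This is an illustration under assumptions: it covers only the three edit functions named in this section (switch, delete, replace), with hypothetical function names, and any further edit functions from the project would be added to the set in the same way.

```python
import string


def delete_letter(word):
    # Remove one letter at each position.
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def switch_letter(word):
    # Swap each pair of adjacent letters.
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)]


def replace_letter(word):
    # Replace the letter at each position with every other lowercase letter.
    return [word[:i] + c + word[i + 1:]
            for i in range(len(word))
            for c in string.ascii_lowercase if c != word[i]]


def edit_candidates(word):
    """Collect the output of all edit functions in one set (no duplicates)."""
    candidates = set()
    candidates.update(delete_letter(word))
    candidates.update(switch_letter(word))
    candidates.update(replace_letter(word))
    return candidates


print("the" in edit_candidates("teh"))
```

Using a set means that a word produced by more than one edit function, or more than once by the same function, is stored only once.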
With the code ready, we can test it on any user input.
Let's print the top 3 suggestions made by the autocorrect.
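The test code itself is not reproduced in this report. The sketch below shows one way such a test could work, assuming suggestions are ranked by word probability estimated from a corpus. The tiny corpus, the function names and the tie-breaking rule (alphabetical order) are all hypothetical; the project used the text file it had read in earlier.

```python
import re
import string
from collections import Counter


def one_edit_candidates(word):
    """All words one delete, switch or replace away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    switches = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r
                for c in string.ascii_lowercase if c != r[0]}
    return deletes | switches | replaces


def suggest(word, word_counts, n=3):
    """Return up to `n` (word, probability) pairs, most probable first."""
    total = sum(word_counts.values())
    word = word.lower()
    if word in word_counts:
        # A known word needs no correction.
        return [(word, word_counts[word] / total)]
    known = [w for w in one_edit_candidates(word) if w in word_counts]
    # Sort by descending count; break ties alphabetically for stable output.
    ranked = sorted(known, key=lambda w: (-word_counts[w], w))
    return [(w, word_counts[w] / total) for w in ranked[:n]]


# Hypothetical toy corpus standing in for the project's text file.
corpus = "the man can run and the van can stop but the man ran"
counts = Counter(re.findall(r"[a-z]+", corpus.lower()))

for candidate, probability in suggest("tan", counts):
    print(candidate, round(probability, 3))
```

With a real corpus the vocabulary is much larger, so the probabilities separate the candidates far more sharply than in this toy example.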
The initial implementation involves a basic auto-corrector using Python and NLTK. To enhance it,
the next step is to develop a high-level auto-corrector system that leverages extensive datasets for
improved efficiency and accuracy in correcting spelling and grammar errors in text, making it more
robust and capable.
Using “analyze_sentiment”, we can see that the tone/sentiment of the text is Positive.
Further, to see the most used words other than stop words (commonly
used words such as pronouns, conjunctions, and prepositions), the stopwords
package was used and the following 10 most used words were listed.
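The counting step can be sketched as follows. To keep the example self-contained it uses a small hand-written stop-word set, which is an assumption; the report used a stopwords package, and NLTK's `nltk.corpus.stopwords.words("english")` list could be passed in instead. The function name `top_words` is illustrative.

```python
import re
from collections import Counter

# Small illustrative stop-word set; a full list from a stopwords package
# would normally be used in its place.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
    "by", "for", "with", "as", "is", "was", "were", "be", "been", "he",
    "she", "it", "his", "her", "i", "you", "we", "they", "that", "this",
    "which", "had", "have", "has",
}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the `n` most frequent words in `text` that are not stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


print(top_words("the crime and the clues of the crime", n=2))
```

Lowercasing before counting ensures that "The" and "the" are treated as the same word.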
The top words under each of the topics are found to be as follows
In the context of the book, this section would extract and categorize
entities like "Sherlock Holmes," "221B Baker Street," and other character
names and locations mentioned in the stories.
LINGUISTIC ANALYSIS
Number of sentences: 7
Passive voice sentences:
He was still, as ever, deeply attracted by the study of crime, and occupied his immense
faculties and extraordinary powers of observation in following out those clues, and clearing
up those mysteries which had been abandoned as hopeless by the official police.
He was still, as ever, deeply attracted by the study of crime, and occupied his immense
faculties and extraordinary powers of observation in following out those clues, and clearing
up those mysteries which had been abandoned as hopeless by the official police.
Average sentence complexity: 9.571428571428571
This step can help visualize the various places where the adventures take
place in the book.
SUMMARY
In summary, the code is designed to perform a wide range of text analysis tasks on
the text file of "The Adventures of Sherlock Holmes" ('final.txt'). It extracts valuable
information about the content, structure, and sentiment of the book, making it a
versatile tool for gaining insights into the text and its themes.