
Analysis of women safety in Indian cities using machine learning on tweets

ABSTRACT
Women and girls experience a great deal of violence and harassment in public
places in various cities, ranging from stalking to sexual abuse and assault.
This paper focuses on the role of social media in promoting the safety of
women in Indian cities, with special reference to social media websites and
applications including Twitter, Facebook, and Instagram. It also examines how
a sense of responsibility can be developed among common Indian people so that
society focuses on the safety of the women around them. Tweets, which usually
contain text, images, and quoted messages about the safety of women in Indian
cities, can be used to spread a message among Indian youth and to educate
people to take strict action against, and punish, those who harass women.
Twitter handles and hashtag campaigns that spread across the globe serve as a
platform for women to express how they feel when they go out to work or
travel in public transport, what their state of mind is when they are
surrounded by unknown men, and whether they feel safe or not.
INTRODUCTION
Twitter has emerged as a major micro-blogging website, with over 100 million users
generating over 500 million tweets every day. With such a large audience, Twitter has
consistently attracted users who convey their opinions and perspectives on any issue,
brand, company, or other topic of interest. For this reason, Twitter is used as an
informative source by many organizations, institutions, and companies.

On Twitter, users share their opinions in the form of tweets, using only 140
characters. This leads people to compact their statements by using slang, abbreviations,
emoticons, short forms, and so on. Along with this, people convey their opinions using
sarcasm and polysemy.

Hence it is justified to term the Twitter language as unstructured.


In order to extract sentiment from tweets, sentiment analysis is used. Its results can be
used in many areas, such as analyzing and monitoring changes of sentiment around an event,
sentiment regarding a particular brand or the release of a particular product, and
analyzing public views of government policies.

A lot of research has been done on Twitter data in order to classify tweets and analyze
the results. In this paper we review some of the research in this domain and study how to
perform sentiment analysis on Twitter data using Python. The scope of this paper is
limited to machine learning models, and we compare the efficiencies of these models
with one another.

There are certain types of harassment and violence, such as staring and passing comments,
that are aggressive yet are usually seen as a normal part of urban life. Several studies
conducted in cities across India report that women face similar types of sexual harassment
and comments from unknown people. In a study conducted across the most popular
metropolitan cities of India, including Delhi, Mumbai, and Pune, 60% of the women reported
feeling unsafe while going out to work or while travelling in public transport.

Women have the right to the city, which means that they can move freely wherever they
want, whether to an educational institute or any other place. But women feel unsafe in
places like shopping malls and on their way to work because of unknown eyes body-shaming
and harassing them. This lack of safety, and the lack of concrete consequences for
offenders, is the main reason harassment persists. There are instances where girls were
harassed by their neighbors while on the way to school; such a lack of safety creates a
sense of fear in the minds of young girls, who may suffer throughout their lifetime
because of one instance in which they were forced to do something unacceptable or were
sexually harassed by one of their own neighbors or an unknown person.

The safest-cities approach treats women's safety as a matter of women's right to inhabit
the city without fear of violence or sexual harassment. Rather than imposing restrictions
on women, as society usually does, it is the duty of society to emphasize the need to
protect women and to recognize that women and girls have the same right as men to be safe
in the city.

Analysis of the collected Twitter texts also includes the names of people, and of women,
who stand up against sexual harassment and the unethical behavior of men in Indian cities
that makes women uncomfortable walking freely. The dataset obtained from Twitter about
the status of women's safety in Indian society was then processed with machine learning
techniques: zero counts were smoothed away with Laplace (add-one) smoothing, words were
reduced to their stems with the Porter stemmer, and retweets and other redundant entries
were removed, so that a clear and original view of the safety status of women in Indian
society is obtained.
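
A minimal sketch of this preprocessing, assuming the tweets are held in a simple list of
dicts (the sample records and field names are illustrative, not the paper's actual
dataset):

# Sketch of the preprocessing described above: drop retweets and duplicates,
# stem tokens with the Porter stemmer, and apply Laplace (add-one) smoothing
# when estimating word probabilities per class.
from collections import Counter
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

tweets = [
    {"text": "RT @user Women safety in Delhi is a concern", "label": "negative"},
    {"text": "Women safety campaign in Mumbai gives hope", "label": "positive"},
]

# Remove retweets and exact duplicates so each opinion is counted once.
seen, cleaned = set(), []
for t in tweets:
    if t["text"].startswith("RT ") or t["text"] in seen:
        continue
    seen.add(t["text"])
    t["tokens"] = [stemmer.stem(w) for w in t["text"].lower().split()]
    cleaned.append(t)

# Laplace (add-one) smoothed estimate of P(word | positive), so words
# unseen in the training data never receive a zero probability.
counts = Counter(w for t in cleaned if t["label"] == "positive" for w in t["tokens"])
vocab = {w for t in cleaned for w in t["tokens"]}
total = sum(counts.values())

def p_word_given_positive(word):
    return (counts[word] + 1) / (total + len(vocab))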

People often express their views freely on social media about how they feel about Indian
society and about the politicians who claim that Indian cities are safe for women. On
social media websites people can freely express their point of view, and women can share
experiences in which they faced sexual harassment or fought back against it.

Tweets about the safety of women and stories of standing up against sexual harassment
further motivate other women on the same social media websites and applications, such as
Twitter. Other women share these messages and tweets, which in turn encourages more women
to stand up and raise their voice against the people who have made Indian cities an
unsafe place for women. In recent years a large number of people have been drawn to
social media platforms like Facebook, Twitter, and Instagram, and most of them use these
platforms to express their emotions and opinions about Indian cities and Indian society.
Sentiment analysis methods can be categorized as machine learning, hybrid, and
lexicon-based approaches.

Another categorization distinguishes statistical, knowledge-based, and hybrid approaches.
It is a common practice to extract information from the data available on social networks
through procedures of data extraction, data analysis, and data interpretation. The
accuracy of Twitter analysis and prediction can be improved by behavioral analysis of the
underlying social networks.

ABOUT SENTIMENT ANALYSIS

Sentiment analysis is the process of deriving the sentiment of a particular statement or
sentence. It is a classification technique that derives opinion from tweets and formulates
a sentiment, on the basis of which sentiment classification is performed.

Sentiments are subjective to the topic of interest. We must decide what kind of features
will determine the sentiment a statement embodies.

In the programming model, the sentiment we refer to is the class of entities that the
person performing sentiment analysis wants to find in the tweets. The number of sentiment
classes is a crucial factor in deciding the efficiency of the model.

For example, we can have two-class tweet sentiment classification (positive and negative)
or three-class classification (positive, negative, and neutral).
Sentiment analysis approaches can be broadly categorized into two classes: lexicon-based
and machine learning based.

The lexicon-based approach is unsupervised: it performs analysis using lexicons and a
scoring method to evaluate opinions. The machine learning approach, in contrast, involves
feature extraction and training a model on a labelled dataset. A minimal lexicon-based
sketch follows.
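
As an illustration of the lexicon-based route, the sketch below uses VADER, a lexicon
shipped with NLTK; this is one possible lexicon, not the one the paper prescribes (run
nltk.download('vader_lexicon') once before use):

# No training data is needed: each word carries a prior sentiment score and
# the analyzer combines them into a compound score in [-1, 1].
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("Women felt safe at the metro station tonight")

# The +/-0.05 thresholds are VADER's conventional cut-offs for neutrality.
if scores["compound"] >= 0.05:
    label = "positive"
elif scores["compound"] <= -0.05:
    label = "negative"
else:
    label = "neutral"
print(label, scores)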

The basic steps for performing sentiment analysis include data collection, pre-processing
of the data, feature extraction, selection of baseline features, sentiment detection, and
classification using either simple computation or machine learning approaches.

Process Involved in Analyzing Sentiment Data

The process of measuring the sentiments of tweets involves the following major steps
(illustrative Python sketches for each step are given after the list):
1) Data Extraction: this step consists of gathering data from the social network site
Twitter, which provides not only the tweet text but also extra information about the
tweets, such as likes and retweets.

2) Text Cleaning: after the tweets for a topic are extracted and before they are passed
to the classifier, we need to clean the dataset by removing emojis and stop words, so
that non-textual content not pertinent to the analysis is identified and removed.

3) Sentiment Analysis: once the data is cleansed, it is ready for classification into
positive, negative, and neutral tweets. There are various approaches to sentiment
analysis, such as machine learning, lexicon-based, and hybrid approaches, along with
others such as Natural Language Processing and Neuro-Linguistic Programming. Machine
learning involves a training dataset and a testing dataset: the training data is used to
train the classifier with one of many algorithms, such as Bayesian networks, Naive Bayes
classification, maximum entropy, or support vector machines, and the testing dataset is
used to test the classifier's accuracy on tweets. The lexicon approach does not use a
training dataset; it makes use of a built-in dictionary in which words are associated
with human sentiments. The hybrid approach combines the machine learning and lexicon
approaches to improve the performance of the sentiment classifier.

4) Presentation of Output: the main aim of sentiment analysis is to generate meaningful
information from raw data. After the analysis is complete, we can create visualizations
such as bar graphs, time series, and pie charts. Bar graphs can show the number of tweets
classified as positive, negative, and neutral; time series can track likes, retweets, and
average tweet length over a period; pie charts can show the sources of the tweets.
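
Step 1 could look like the following Tweepy sketch; the credentials are placeholders and
the query term is an assumption, since the paper does not give its exact collection
script:

# Collect recent English tweets along with the extra metadata mentioned
# above (likes and retweets). Requires Twitter API v1.1 access.
import tweepy

auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET"
)
api = tweepy.API(auth)

records = []
for status in api.search_tweets(q="women safety", lang="en",
                                count=100, tweet_mode="extended"):
    records.append({
        "text": status.full_text,
        "likes": status.favorite_count,
        "retweets": status.retweet_count,
    })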
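For step 2, a minimal cleaning routine might look like this; the exact rules are
illustrative, and stop-word removal uses NLTK's English list (run
nltk.download('stopwords') once):

# Strip URLs, mentions, emojis/symbols, and stop words before classification.
import re
from nltk.corpus import stopwords

STOP = set(stopwords.words("english"))

def clean_tweet(text):
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)               # user mentions
    text = re.sub(r"[^a-zA-Z\s]", " ", text)        # emojis, digits, symbols
    tokens = [w for w in text.lower().split() if w not in STOP]
    return " ".join(tokens)

print(clean_tweet("Felt unsafe at the bus stop again :( @citypolice http://t.co/x"))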
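For step 3, a Naive Bayes classifier, one of the algorithms named above, can be trained
on bag-of-words features with scikit-learn; the tiny labelled sample here is purely
illustrative:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["felt safe walking home", "harassed on the bus",
               "metro station well lit", "stalked near the market"]
train_labels = ["positive", "negative", "positive", "negative"]

# Turn the cleaned texts into word-count feature vectors.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# MultinomialNB applies Laplace smoothing by default (alpha=1.0).
clf = MultinomialNB()
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["streets feel safe tonight"])
print(clf.predict(X_test))  # e.g. ['positive']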
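For step 4, a bar graph of the class distribution can be drawn with Matplotlib; the
counts are placeholder values, not results from the paper:

# Visualize how many tweets fell into each sentiment class.
import matplotlib.pyplot as plt

sentiments = ["positive", "negative", "neutral"]
counts = [420, 310, 270]  # illustrative counts

plt.bar(sentiments, counts, color=["green", "red", "grey"])
plt.title("Tweet sentiment about women safety")
plt.ylabel("Number of tweets")
plt.show()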

Approaches to sentiment classification


Sentiment classification labels a feature as favorable (positive) or unfavorable
(negative). It can be performed at three levels:

1) Document-Level Classification: document-level classification classifies an entire
opinion document as expressing a positive or negative sentiment. When reviews are
provided for products or services, the entire review expresses the positive or negative
sentiment of the consumer towards the product.

2) Sentence-Level Classification: sentence-level classification involves breaking a
review down into sentences, calculating the polarity of each sentence, and then
calculating the sentiment of the review accordingly (a minimal sketch is given after
this list).

3) Aspect-Level Classification: aspect-level classification judges the various aspects of
an entity and assigns a different opinion to each aspect. It does not focus on language
constructions but on the opinion itself, breaking an opinion into the sentiment of the
opinion and the target of the opinion.
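
A minimal sentence-level sketch, using NLTK's sentence tokenizer and the VADER scorer as
stand-ins for the per-sentence polarity calculation (both are assumptions; the text does
not fix a particular scorer; run nltk.download('punkt') once):

# Split a review into sentences, score each, then aggregate the polarities.
from nltk.tokenize import sent_tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
review = "The street lighting is excellent. But the last bus leaves too early."

scores = [analyzer.polarity_scores(s)["compound"] for s in sent_tokenize(review)]
overall = sum(scores) / len(scores)
print(scores, "->", "positive" if overall > 0 else "negative")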

Twitter Sentiment Analysis


Twitter is a social networking and microblogging service that lets its users post
real-time messages, called tweets. Tweets have many unique characteristics, which pose
new challenges and shape the means of carrying out sentiment analysis on them compared
to other domains.
The aim when performing sentiment analysis on tweets is to classify the tweets into
different sentiment classes accurately. In this field of research, various approaches
have evolved that propose methods to train a model and then test it to check its
efficiency.
Following are some key characteristics of tweets:
Message Length: The maximum length of a Twitter message is 140 characters. This is different
from previous sentiment classification research that focused on classifying longer texts, such as
product and movie reviews.
Writing technique: incorrect spellings and cyber-slang occur more frequently in tweets
than in other domains. Because the messages are quick and short, people use acronyms,
misspellings, emoticons, and other characters that convey special meanings.
Availability: The amount of data available is immense. More people tweet in the public domain
as compared to Facebook (as Facebook has many privacy settings) thus making data more
readily available. The Twitter API facilitates collection of tweets for training.
Topics: Twitter users post messages about a range of topics unlike other sites which are designed
for a specific topic. This differs from a large fraction of past research, which focused on specific
domains such as movie reviews.
Real time: blogs are updated at longer intervals, as they are characteristically longer
and take time to write. Tweets, being limited to 140 characters, are posted far more
often. This gives a more real-time feel and captures first reactions to events.
As mentioned earlier, performing sentiment analysis on Twitter data is challenging. Here
we define the reasons for this:
Limited tweet size: with just 140 characters in hand, compact statements are generated,
which results in a sparse set of features.
Use of slang: slang words differ from standard English words, and their continual
evolution can make an approach outdated.
Twitter features: hashtags, user references, and URLs are allowed, and these require
different processing than other words.
User variety: users express their opinions in a variety of ways, some mixing languages,
others repeating words or symbols to convey an emotion.
All these problems must be addressed in the preprocessing stage (a sketch of the
Twitter-specific handling follows). Apart from these, we face problems in feature
extraction with few features in hand, and in reducing the dimensionality of features.
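
A sketch of this Twitter-specific handling; the replacement tokens and rules are
illustrative choices, not prescribed by the paper:

# Hashtags become plain words, user references and URLs become generic
# tokens, and repeated letters ("soooo") are squashed to at most two.
import re

def normalize_tweet(text):
    text = re.sub(r"http\S+", "<url>", text)      # URLs handled separately
    text = re.sub(r"@\w+", "<user>", text)        # user references
    text = re.sub(r"#(\w+)", r"\1", text)         # hashtag -> plain word
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # soooo -> soo
    return text.lower()

print(normalize_tweet("Sooooo unsafe near #DelhiMetro @user http://t.co/abc"))
# -> 'soo unsafe near delhimetro <user> <url>'
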
Literature Survey

In paper [1], the authors present a classifier to predict the contextual polarity of
subjective phrases in a sentence. Their approach features lexical scoring derived from
the Dictionary of Affect in Language (DAL) and extended through WordNet, allowing them to
automatically score the vast majority of words in the input and avoiding the need for
manual labeling. They augment lexical scoring with n-gram analysis to capture the effect
of context, combine DAL scores with syntactic constituents, and then extract n-grams of
constituents from all sentences. They also use the polarity of all syntactic constituents
within a sentence as features. Their results show significant improvement over a
majority-class baseline as well as a more difficult baseline consisting of lexical
n-grams.

In paper [2], the authors propose an approach to automatically detect sentiment in
Twitter messages (tweets) that exploits characteristics of how tweets are written and
meta-information about the words that compose them. Moreover, they leverage sources of
noisy labels as training data, provided by a few sentiment detection websites over
Twitter data. Their experiments show that, since their features capture a more abstract
representation of tweets, their solution is more effective than previous ones and also
more robust to the biased and noisy data these sources provide.

In paper [3], the authors state that microblogs, as a new textual domain, offer a unique
proposition for sentiment analysis. The short document length suggests any sentiment they
contain is compact and explicit; however, this short length, coupled with their noisy
nature, can pose difficulties for standard machine learning document representations.
They examine the hypothesis that it is easier to classify the sentiment in these
short-form documents than in longer-form documents and, surprisingly, find classifying
sentiment in microblogs easier than in blogs, making a number of observations on the
challenge of supervised learning for sentiment analysis in microblogs.
In paper [4], the authors demonstrate that it is possible to perform automatic sentiment
classification in the very noisy domain of customer feedback data. They show that by
using large feature vectors in combination with feature reduction, one can train linear
support vector machines that achieve high classification accuracy on data that present
classification challenges even for a human annotator. They also show that, surprisingly,
adding deep linguistic analysis features to a set of surface-level word n-gram features
contributes consistently to classification accuracy in this domain.

According to [5], identifying sentiments (the affective parts of opinions) is a
challenging problem. The authors present a system that, given a topic, automatically
finds the people who hold opinions about that topic and the sentiment of each opinion.
The system contains a module for determining word sentiment and another for combining
sentiments within a sentence. They experiment with various models for classifying and
combining sentiment at the word and sentence levels, with promising results.
System Analysis
Existing System
People often express their views freely on social media about how they feel
about Indian society and about the politicians who claim that Indian cities
are safe for women. On social media websites people can freely express their
point of view, and women can share experiences in which they faced harassment
or fought back against it. Tweets about the safety of women and stories of
standing up against harassment further motivate other women on the same
social media websites and applications, such as Twitter. Other women share
these messages and tweets, which in turn encourages more women to stand up
and raise their voice against the people who have made Indian cities an
unsafe place for women. In recent years a large number of people have been
drawn to social media platforms such as Facebook. It is a common practice to
extract information from the data available on social networks through
procedures of data extraction, data analysis, and data interpretation. The
accuracy of Twitter analysis and prediction can be improved by behavioral
analysis of the underlying social networks.

DISADVANTAGES:
1. Opinions on platforms such as Twitter and Instagram are informal
expressions of emotion about Indian cities and Indian society, which makes
them hard to analyze directly.
2. Sentiment methods are spread across several categories, such as machine
learning, hybrid, and lexicon-based learning.
3. Further categorizations, such as statistical and knowledge-based
approaches, add additional variation.

PROPOSED SYSTEM:
Women have the right to the city, which means that they can move freely
wherever they want, whether to an educational institute or any other place.
But women feel unsafe in places like shopping malls and on their way to work
because of unknown eyes body-shaming and harassing them. This lack of safety,
and the lack of concrete consequences for offenders, is the main reason
harassment persists. There are instances where girls were harassed by their
neighbors while on the way to school; such a lack of safety creates a sense
of fear in the minds of young girls, who may suffer throughout their lifetime
because of one instance in which they were forced to do something
unacceptable or were harassed by one of their own neighbors or an unknown
person. The safest-cities approach treats women's safety as a matter of
women's right to inhabit the city without fear of violence or harassment.
Rather than imposing restrictions on women, as society usually does, it is
the duty of society to emphasize the need to protect women and to recognize
that women and girls have the same right as men to be safe in the city.
ADVANTAGES:
1. Analysis of the collected Twitter texts also includes the names of people,
and of women, who stand up against harassment and the unethical behavior of
men in Indian cities that makes women uncomfortable walking freely.
2. The dataset obtained from Twitter reflects the actual status of women's
safety in Indian society.

Non-Functional Requirements:

Maintainability

Maintainability is the ease with which a product can be maintained in order to:

 Correct defects or their cause,
 Repair or replace faulty or worn-out components without having to replace still-working
parts,
 Prevent unexpected working conditions,
 Maximize a product's useful life,
 Maximize efficiency, reliability, and safety,
 Meet new requirements,
 Make future maintenance easier, or
 Cope with a changed environment.
Portability

Software portability may involve:

 Transferring installed program files to another computer of basically the same
architecture.
 Reinstalling a program from distribution files on another computer of basically the same
architecture.
 Building executable programs for different platforms from source code; this is what is
usually understood by "porting".

Usability
The primary notion of usability is that an object designed with a generalized users' psychology
and physiology in mind is, for example:
 More efficient to use—takes less time to accomplish a particular task
 Easier to learn—operation can be learned by observing the object
 More satisfying to use

Reliability

The objectives of reliability engineering, in decreasing order of priority, are:

 To apply engineering knowledge and specialist techniques to prevent or reduce the
likelihood or frequency of failures.
 To identify and correct the causes of failures that do occur despite the efforts to
prevent them.
 To determine ways of coping with failures that do occur, if their causes have not been
corrected.
 To apply methods for estimating the likely reliability of new designs, and for analysing
reliability data.

Consistent uptime
The new system will be able to stay up and running at least 98% of the time. Any downtime
would be due to maintenance or upgrades. This downtime also includes any potential
failures/crashes.

Load and concurrency


The system must be able to serve at least two thousand users concurrently without crashing.

Dealing with large quantities of data


The developed system will have to deal with large quantities of data and a large number of users
accessing the data at once. The large quantity of data includes timetable information and data
retrieved from the database by many users at the same time.

Familiar Interface
The new system will have an interface that shares some of the feel of the old system so that users
who are familiar with the old system will not have trouble adjusting to the new system.
Real-time Feedback
The new registration system should display the student’s timetable and show the changes made
to it in real-time as the student adds and drops courses.

Focused Layout
The new system will reduce the potential for confusion by having a focused layout. This means
that it will display information that is relevant to the current task and conversely, leave out
irrelevant information.

Web Accessibility
The new system will be compatible with screen readers to assist the visually impaired. This
means that screen readers should interpret the displayed text into speech and should not output
anything that does not correspond to displayed text. It is also important that the colours are
designed so that colour-blind people can still distinguish changes in content.

Effective Recovery
The system must effectively recover from a crash within ten minutes. Effective recovery
means that when the system returns, its data is in a consistent state accurate to within
one minute before the crash.

SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:

 System : Pentium Dual Core
 Hard Disk : 120 GB
 Input Devices : Keyboard, Mouse
 RAM : 1 GB
SOFTWARE REQUIREMENTS:

 Operating System : Windows 7
 Coding Language : JAVA/J2EE
 Tools : Tomcat server, Java JDK, Notepad++
 Database : MySQL

Feasibility Study
What is Feasibility Study?
A feasibility study is the process of determining whether or not a project is worth
doing. Feasibility studies are undertaken within tight time constraints and normally
culminate in a written and oral feasibility report. The contents and recommendations of
this feasibility study served as a sound basis for deciding how to proceed with the
project. It helped in taking decisions such as which software to use and which hardware
combinations to choose. The following is the process diagram for feasibility analysis.
In the diagram, the feasibility analysis starts with the user's set of requirements, and
the existing system is also observed. The next step is to check for deficiencies in the
existing system. By evaluating these points, a fresh idea is conceived to define and
quantify the required goals. User consent is very important for the new plan, and the
organization's ability to implement the new system is also checked. Besides that, a set
of alternatives and their feasibility are considered in case of any failure of the
proposed system. Thus, the feasibility study is an important part of software
development.
Figure: Process diagram for feasibility analysis (user-stated requirements and the
working current system are analyzed to find deficiencies; goals are defined and
quantified with user consensus; broad alternative solutions are found and their
feasibility evaluated under resource constraints; the proposed alternatives are then
revised based on feasibility).

1. Technical Feasibility: -

Technical feasibility determines whether the work for the project can be done with the
existing equipment, software technology, and available personnel. Technical feasibility
is concerned with specifying equipment and software that will satisfy the user
requirements.

This project is also feasible on technical grounds: the proposed system can run on any
machine supporting Windows and Internet services, and it uses the software and hardware
that were employed while designing the system, so it is feasible in all technical terms.

Technical Feasibility Addresses Three Major Issues: -


Is the proposed Technology or Solution Practical?

The technologies used are mature enough to be applied to our problems. The practicality
of the solution we have developed is proved by the technologies we have chosen: JAVA
(JSP, Servlet), JavaScript, and compatible hardware are so familiar in today's
knowledge-based industry that anyone can easily adapt to the proposed environment.
Do we currently possess the necessary technology?

We first make sure whether the required technologies are available to us or not. If they
are available, we must ask if we have the capacity. For instance: will our current
printer be able to handle the new reports and forms required by the new system?

Do we possess the necessary Technical Expertise and is the Schedule reasonable?

This consideration of technical feasibility is often forgotten during feasibility analysis. We may
have the technology, but that doesn’t mean we have the skills required to properly apply that
technology. As far as our project is concerned we have the necessary expertise so that the
proposed solution can be made feasible.

2. Economical Feasibility: -

Economic feasibility determines whether the benefits of creating the system are
sufficient to make its cost acceptable, or whether the cost of the system is too high;
it amounts to cost-benefit analysis and savings. On the basis of the cost-benefit
analysis, the proposed system is feasible and economical with respect to its pre-assumed
cost. During the economic feasibility test we maintained a balance between operational
and economic feasibility, as the two can conflict: for example, the solution that
provides the best operational impact for the end-users may also be the most expensive
and, therefore, the least economically feasible. We classified the costs of the system
according to the phase in which they occur. System development costs are usually
one-time costs that will not recur after the project has been completed. For calculating
the development costs we evaluated certain cost categories, viz.

• Personnel costs
• Computer usage
• Training
• Supply and equipment costs
• Cost of any new computer equipment and software.

In order to test whether the proposed system is cost-effective, we evaluated it using the
following techniques:
 Return on Investment

 Net Present Value

3. Operational Feasibility: -

Operational feasibility criteria measure the urgency of the problem (survey and study phases) or
the acceptability of a solution (selection, acquisition and design phases). Operational feasibility
is the measure of how well a proposed system solves the problems, and takes advantage of the
opportunities identified during scope definition and how it satisfies the requirements identified in
the requirements analysis phase of system development.

The operational feasibility assessment focuses on the degree to which the proposed development
project fits in with the existing business environment and objectives with regard to development
schedule, delivery date, corporate culture and existing business processes.

To ensure success, desired operational outcomes must be imparted during design and
development.
SYSTEM DESIGN
System Architecture:
DATA FLOW DIAGRAM:

1. The DFD is also called a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the
system, the various processing carried out on this data, and the output data
generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools.
It is used to model the system components: the system processes, the data
used by the processes, the external entities that interact with the system,
and the information flows in the system.
3. The DFD shows how information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that
depicts information flow and the transformations applied as data moves from
input to output.
4. A DFD may be used to represent a system at any level of abstraction and
may be partitioned into levels that represent increasing information flow
and functional detail.
Flow Chart: User

Flow Chart: Admin


UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized
general-purpose modeling language in the field of object-oriented software
engineering. The standard is managed, and was created by, the Object
Management Group.
The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form UML comprises two
major components: a meta-model and a notation. In the future, some form of
method or process may also be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying,
visualizing, constructing, and documenting the artifacts of software systems,
as well as for business modeling and other non-software systems.
The UML represents a collection of best engineering practices that have
proven successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software
and the software development process. The UML uses mostly graphical notations
to express the design of software projects.

GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling Language so that
they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development
process.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations,
frameworks, patterns and components.
7. Integrate best practices.

USE CASE DIAGRAM:


A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a Use-case analysis. Its purpose is
to present a graphical overview of the functionality provided by a system in terms
of actors, their goals (represented as use cases), and any dependencies between
those use cases. The main purpose of a use case diagram is to show what system
functions are performed for which actor. Roles of the actors in the system can be
depicted.
CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a
system by showing the system's classes, their attributes, operations (or
methods), and the relationships among the classes. It explains which class
contains what information.
SEQUENCE DIAGRAM:

A sequence diagram in the Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in
what order. It is a construct of a Message Sequence Chart. Sequence diagrams
are sometimes called event diagrams, event scenarios, or timing diagrams.
IMPLEMENTATION
Admin Server

In this module, the admin has to log in using a valid user name and password.
After a successful login, the admin can perform operations such as: view all
friend requests and responses, view all user tweet blogs, add filter details,
view negative sentiment, view positive sentiment, view women safety results,
and view tweet score results.

User

In this module there are n users. A user should register before performing
any operations. Once a user registers, their details are stored in the
database. After successful registration, the user has to log in using the
authorized user name and password. Once login is successful, the user can
perform operations such as: view your profile, search for and find friend
requests, view all my friends, create a tweet blog, view all my tweet blogs,
and view all my friends' tweet blogs.
Testing

There are different methods that can be used for software testing. This chapter briefly describes
the methods available.

Black-Box Testing
The technique of testing without having any knowledge of the interior workings of the
application is called black-box testing. The tester is oblivious to the system architecture and
does not have access to the source code. Typically, while performing a black-box test, a tester
will interact with the system's user interface by providing inputs and examining outputs without
knowing how and where the inputs are worked upon.

The following lists the advantages and disadvantages of black-box testing.

Advantages:
 Well suited and efficient for large code segments.
 Code access is not required.
 Clearly separates the user's perspective from the developer's perspective through
visibly defined roles.
 Large numbers of moderately skilled testers can test the application with no knowledge
of implementation, programming language, or operating systems.

Disadvantages:
 Limited coverage, since only a selected number of test scenarios is actually performed.
 Inefficient testing, due to the fact that the tester has only limited knowledge about
the application.
 Blind coverage, since the tester cannot target specific code segments or error-prone
areas.
 The test cases are difficult to design.
White-Box Testing
White-box testing is the detailed investigation of internal logic and structure of the code. White-
box testing is also called glass testing or open-box testing. In order to perform white-
box testing on an application, a tester needs to know the internal workings of the code.

The tester needs to have a look inside the source code and find out which unit/chunk of the code
is behaving inappropriately.

The following lists the advantages and disadvantages of white-box testing.

Advantages:
 As the tester has knowledge of the source code, it becomes very easy to find out which
type of data can help in testing the application effectively.
 It helps in optimizing the code.
 Extra lines of code that could bring in hidden defects can be removed.
 Due to the tester's knowledge of the code, maximum coverage is attained during test
scenario writing.

Disadvantages:
 Because a skilled tester is needed to perform white-box testing, the costs are
increased.
 It is sometimes impossible to look into every nook and corner to find hidden errors
that may create problems, as many paths will go untested.
 White-box testing is difficult to maintain, as it requires specialized tools like code
analyzers and debugging tools.
Grey-Box Testing
Grey-box testing is a technique to test the application with having a limited knowledge of the
internal workings of an application. In software testing, the phrase the more you know, the
better carries a lot of weight while testing an application.

Mastering the domain of a system always gives the tester an edge over someone with limited
domain knowledge. Unlike black-box testing, where the tester only tests the application's user
interface; in grey-box testing, the tester has access to design documents and the database.
Having this knowledge, a tester can prepare better test data and test scenarios while making a
test plan.

The following lists the advantages and disadvantages of grey-box testing.

Advantages:
 Offers the combined benefits of black-box and white-box testing wherever possible.
 Grey-box testers don't rely on the source code; instead they rely on interface
definitions and functional specifications.
 Based on the limited information available, a grey-box tester can design excellent test
scenarios, especially around communication protocols and data-type handling.
 The test is done from the point of view of the user, not the designer.

Disadvantages:
 Since access to the source code is not available, the ability to go over the code and
test coverage is limited.
 The tests can be redundant if the software designer has already run a test case.
 Testing every possible input stream is unrealistic because it would take an
unreasonable amount of time; therefore, many program paths will go untested.

A Comparison of Testing Methods


The following points differentiate black-box, grey-box, and white-box testing.

 Knowledge of internals: in black-box testing, the internal workings of the application
need not be known; in grey-box testing, the tester has limited knowledge of the internal
workings; in white-box testing, the tester has full knowledge of the internal workings.
 Other names: black-box testing is also known as closed-box, data-driven, or functional
testing; grey-box testing is also known as translucent testing, as the tester has
limited knowledge of the insides of the application; white-box testing is also known as
clear-box, structural, or code-based testing.
 Who performs it: black-box and grey-box testing are performed by end-users and also by
testers and developers; white-box testing is normally done by testers and developers.
 Basis of testing: black-box testing is based on external expectations, with the
internal behavior of the application unknown; grey-box testing is done on the basis of
high-level database diagrams and data flow diagrams; in white-box testing the internal
workings are fully known, and the tester can design test data accordingly.
 Exhaustiveness: black-box testing is the least exhaustive and time-consuming; grey-box
testing is partly exhaustive and time-consuming; white-box testing is the most
exhaustive and time-consuming type of testing.
 Algorithm testing: neither black-box nor grey-box testing is suited for algorithm
testing; white-box testing is suited for algorithm testing.
 Data domains: in black-box testing, data domains can only be tested by trial and error;
in grey-box testing, data domains and internal boundaries can be tested if known; in
white-box testing, data domains and internal boundaries can be better tested.
Levels of Testing

There are different levels during the process of testing. In this chapter, a brief description is
provided about these levels.

Levels of testing include different methodologies that can be used while conducting software
testing. The main levels of software testing are −

 Functional Testing
 Non-functional Testing

Functional Testing
This is a type of black-box testing that is based on the specifications of the software that is to be
tested. The application is tested by providing input and then the results are examined that need
to conform to the functionality it was intended for. Functional testing of a software is conducted
on a complete, integrated system to evaluate the system's compliance with its specified
requirements.

There are five steps involved in testing an application for functionality:

I. The determination of the functionality that the intended application is meant to
perform.
II. The creation of test data based on the specifications of the application.
III. The output based on the test data and the specifications of the application.
IV. The writing of test scenarios and the execution of test cases.
V. The comparison of actual and expected results based on the executed test cases.

An effective testing practice will see the above steps applied to the testing policies of every
organization and hence it will make sure that the organization maintains the strictest of
standards when it comes to software quality.

Unit Testing
This type of testing is performed by developers before the setup is handed over to the
testing team to formally execute the test cases. Unit testing is performed by the
respective developers on the individual units of source code in their assigned areas.
The developers use test data that is different from the test data of the quality
assurance team.

The goal of unit testing is to isolate each part of the program and show that individual parts are
correct in terms of requirements and functionality.
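
As a concrete illustration, a unit test for the text-cleaning step might look like the
following, using Python's unittest; clean_tweet is the same illustrative helper sketched
earlier, inlined here so the test is self-contained:

import re
import unittest

def clean_tweet(text):  # same illustrative helper as in the cleaning sketch
    text = re.sub(r"http\S+|@\w+|[^a-zA-Z\s]", " ", text)
    return " ".join(text.lower().split())

class TestCleanTweet(unittest.TestCase):
    def test_removes_urls_and_mentions(self):
        out = clean_tweet("stay safe @friend http://t.co/xyz")
        self.assertNotIn("http", out)
        self.assertNotIn("@", out)

    def test_is_lowercase(self):
        self.assertEqual(clean_tweet("SAFE Streets"), clean_tweet("safe streets"))

if __name__ == "__main__":
    unittest.main()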

Limitations of Unit Testing


Testing cannot catch each and every bug in an application. It is impossible to evaluate every
execution path in every software application. The same is the case with unit testing.

There is a limit to the number of scenarios and test data that a developer can use to verify a
source code. After having exhausted all the options, there is no choice but to stop unit testing
and merge the code segment with other units.

Integration Testing
Integration testing is defined as the testing of combined parts of an application to determine if
they function correctly. Integration testing can be done in two ways: Bottom-up integration
testing and Top-down integration testing.

1. Bottom-up integration: this testing begins with unit testing, followed by tests of
progressively higher-level combinations of units called modules or builds.

2. Top-down integration: in this testing, the highest-level modules are tested first,
and progressively lower-level modules are tested thereafter.

In a comprehensive software development environment, bottom-up testing is usually done
first, followed by top-down testing. The process concludes with multiple tests of the
complete application, preferably in scenarios designed to mimic actual situations.

System Testing
System testing tests the system as a whole. Once all the components are integrated, the
application as a whole is tested rigorously to see that it meets the specified Quality Standards.
This type of testing is performed by a specialized testing team.

System testing is important because of the following reasons −

 System testing is the first step in the Software Development Life Cycle, where the
application is tested as a whole.
 The application is tested thoroughly to verify that it meets the functional and technical
specifications.
 The application is tested in an environment that is very close to the production
environment where the application will be deployed.
 System testing enables us to test, verify, and validate both the business requirements as
well as the application architecture.

Regression Testing
Whenever a change in a software application is made, it is quite possible that other areas within
the application have been affected by this change. Regression testing is performed to verify that
a fixed bug hasn't resulted in another functionality or business rule violation. The intent of
regression testing is to ensure that a change, such as a bug fix should not result in another fault
being uncovered in the application.

Regression testing is important for the following reasons −

 It minimizes gaps in testing when an application with changes has to be tested.
 It verifies that the changes made did not affect any other area of the application.
 It mitigates risks when changes are made to the application.
 It increases test coverage without compromising timelines.
 It increases the speed to market of the product.

Acceptance Testing
This is arguably the most important type of testing, as it is conducted by the Quality Assurance
Team who will gauge whether the application meets the intended specifications and satisfies the
client’s requirement. The QA team will have a set of pre-written scenarios and test cases that
will be used to test the application.

More ideas will be shared about the application and more tests can be performed on it to gauge
its accuracy and the reasons why the project was initiated. Acceptance tests are not only
intended to point out simple spelling mistakes, cosmetic errors, or interface gaps, but also to
point out any bugs in the application that will result in system crashes or major errors in the
application.
CONCLUSION

Throughout the project we have discussed machine learning algorithms that can
help to organize and analyze the huge amount of Twitter data obtained,
including the millions of tweets and text messages shared every day. These
machine learning algorithms are very effective and useful when it comes to
analyzing large amounts of data, including the SPC algorithm and
linear-algebraic factor model approaches, which help to further categorize
the data into meaningful groups. Support vector machines are yet another form
of machine learning algorithm that is very popular for extracting useful
information from Twitter and getting an idea of the status of women's safety
in Indian cities.
REFERENCES
[1] Agarwal, Apoorv, Fadi Biadsy, and Kathleen R. McKeown. "Contextual phrase-level
polarity analysis using lexical affect scoring and syntactic n-grams." Proceedings of
the 12th Conference of the European Chapter of the Association for Computational
Linguistics. Association for Computational Linguistics, 2009.

[2] Barbosa, Luciano, and Junlan Feng. "Robust sentiment detection on twitter from biased and
noisy data." Proceedings of the 23rd international conference on computational linguistics:
posters. Association for Computational Linguistics, 2010.
[3] Bermingham, Adam, and Alan F. Smeaton. "Classifying sentiment in microblogs: is brevity
an advantage?." Proceedings of the 19th ACM international conference on Information and
knowledge management. ACM, 2010.

[4] Gamon, Michael. "Sentiment classification on customer feedback data: noisy data, large
feature vectors, and the role of linguistic analysis." Proceedings of the 20th international
conference on Computational Linguistics. Association for Computational Linguistics, 2004.

[5] Kim, Soo-Min, and Eduard Hovy. "Determining the sentiment of opinions." Proceedings of
the 20th international conference on Computational Linguistics. Association for Computational
Linguistics, 2004.

[6] Klein, Dan, and Christopher D. Manning. "Accurate unlexicalized parsing."
Proceedings of the 41st Annual Meeting on Association for Computational
Linguistics - Volume 1. Association for Computational Linguistics, 2003.

[7] Charniak, Eugene, and Mark Johnson. "Coarse-to-fine n-best parsing and MaxEnt
discriminative reranking." Proceedings of the 43rd annual meeting on association for
computational linguistics. Association for Computational Linguistics, 2005.

[8] Gupta, B., Negi, M., Vishwakarma, K., Rawat, G., & Badhani, P. (2017). Study of Twitter
sentiment analysis using machine learning algorithms on Python. International Journal of
Computer Applications, 165(9), 0975-8887.

[9] Sahayak, V., Shete, V., & Pathan, A. (2015). Sentiment analysis on twitter data. International
Journal of Innovative Research in Advanced Engineering (IJIRAE), 2(1), 178-183.

[10] Mamgain, N., Mehta, E., Mittal, A., & Bhatt, G. (2016, March). Sentiment analysis of top
colleges in India using Twitter data. In Computational Techniques in Information and
Communication Technology.
