
Understanding Rumor Detection on Social Media

This document discusses research into detecting rumors from social media. It aims to develop a rumor classification system with four components: rumor detection, rumor tracking, rumor stance classification, and rumor veracity classification. The methodology involves collecting tweets from Twitter using an information harvester, preprocessing the data, extracting features, and using machine learning techniques like sentiment analysis to detect trends and characteristics of rumors. The workflow involves collecting tweets from Twitter, storing them in a MongoDB database, preprocessing the data, and performing analysis on a development platform. The research also discusses enhancing the existing workflow and testing models on additional datasets.

Uploaded by

chirag

RUMOUR DETECTION FROM SOCIAL MEDIA

BY:
SOHAM NANDY
SAHIL MAHAJAN
VARDAAN BAJAJ
RUMOURS?
What is a Rumour?

• A rumour is a story or piece of information that may or may not be true, but that people are talking about.

• Two types of rumours:-
1) Long-standing rumours
2) Newly emerging rumours
Problem Definition

• Rumours, fortunately or unfortunately, affect us all, and in more ways than we care to remember.

• Despite the increasing use of social media platforms for information and news gathering, their unmoderated nature often leads to the emergence and spread of rumours.

• At the same time, the openness of social media platforms provides opportunities to study how users share and discuss rumours, and to explore how to automatically assess their veracity using natural language processing and data mining techniques.
Problem Definition

• We provide an overview of research into social media rumours with the ultimate goal of developing a rumour classification system that consists of four components:
1. Rumour detection,
2. Rumour tracking,
3. Rumour stance classification, and
4. Rumour veracity classification.

• We delve into the approaches presented in the scientific literature for the development of each of these four components.
Aim
• This project aims to investigate the characteristics of rumours found on online social networks. These characteristics could include the size and frequency of messages, message propagation through the social network, and the sentence structure of the messages.

• This study seeks to identify the key traits of rumours on online social networks such as Twitter. Automating the identification of rumours is increasingly important, given the rise of the internet's popularity as a source of news and the ever-growing amount of information on the internet.
Methodology
Project Planning:-
The project plan must account for every task the project requires, taking into account the research direction of the project and the dependencies between tasks.

Information Harvester Development:-
The information harvester must be able to collect tweets from Twitter in an automated and consistent manner.

Literature Review:-
A literature review will be undertaken across both Computer Science and the Social Sciences, to gain a mix of insights into how rumours are detected.
Methodology
Feature Selection & Engineering:-
This project explores datasets of tweets collected through the Twitter API. Investigative work will be performed to engineer additional features from existing tweet data, such as tweet type and tweet text. This stage also includes manually labelling tweets to indicate each tweet's sentiment (e.g. is news, is rumour); these labels will be used as the target for classification.
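
As a rough illustration of engineering features from tweet text, the sketch below derives a few simple counts from a single tweet. The `text` field name and the specific features chosen here are assumptions made for illustration; the slides do not list the project's actual feature set.

```python
import re


def extract_features(tweet: dict) -> dict:
    """Derive simple, illustrative features from a tweet's text.

    Assumes the tweet is a dict with a "text" key (hypothetical schema).
    """
    text = tweet.get("text", "")
    return {
        "char_count": len(text),
        "word_count": len(text.split()),
        "hashtag_count": len(re.findall(r"#\w+", text)),
        "mention_count": len(re.findall(r"@\w+", text)),
        "url_count": len(re.findall(r"https?://\S+", text)),
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
    }


# Made-up example tweet, not taken from the project's dataset.
example = {"text": "Is it true that the bridge collapsed? #breaking @newsdesk"}
features = extract_features(example)
```

Each tweet becomes a flat dictionary of numbers, which can be placed next to the manually assigned label (e.g. is news, is rumour) when building a training table.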

Sentiment Analysis using Machine Learning Techniques:-
More features will be engineered using sentiment libraries. Lastly, Machine Learning classifiers will be used to detect key trends in the dataset.
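
To make the idea of a sentiment-derived feature concrete, here is a minimal lexicon-based scoring sketch. The word lists are tiny and hand-made purely for illustration, and the function name `sentiment_score` is hypothetical; the slides do not say which sentiment library or classifier the project used.

```python
import re

# Tiny hand-made lexicon, for illustration only. A real sentiment
# library would ship a much larger, validated word list.
POSITIVE = {"good", "great", "safe", "confirmed", "happy"}
NEGATIVE = {"bad", "fake", "dead", "panic", "terrible"}


def sentiment_score(text: str) -> float:
    """Return (positive hits - negative hits) / token count, in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)
```

A score like this could be added as one more column in the feature table before a classifier is trained on the labelled tweets.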

Testing, Results, and Discussion:-
The testing phase will report the characteristics of the datasets collected and elaborate on the impact of the findings.

Full System Integration:-
The full system integration seeks to provide an easy-to-use web interface through which the user can discover insights from the datasets and the experiment results.
Workflow
The general data workflow consists of the following four elements:-

1. Twitter
Twitter is a social network platform where participants can make posts and interact with fellow participants using hashtags, quote retweets, retweets, and comments. The datasets used in this project are based on tweets collected from Twitter.

2. Information Harvester
The Information Harvester collects tweets from Twitter based on search queries by the user.

3. MongoDB Database
The MongoDB Database stores tweets from the information harvester. Tweets are put through a data cleaning process and are imported into the MongoDB Database.

4. Analysis & Development Platform
The Analysis & Development Platform is where all further in-depth analysis and work are performed.
TWITTER
» 2nd largest social networking site
» 1,300,000,000 Twitter accounts
» 5,000,000 tweets per day
INFORMATION HARVESTER
» Automated 24/7 tweet collection
» Network optimizations
» Duplicate tweet reduction
» Gzipped archives for 90% space savings
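
Duplicate reduction and gzipped archiving could look roughly like the sketch below, which appends only unseen tweets to a gzip-compressed JSON Lines file. The file layout, the `id` field, and the function name `archive_tweets` are assumptions for illustration; the slides do not describe the harvester's actual storage format, and the compression ratio achieved will depend on the data.

```python
import gzip
import json
import os
import tempfile


def archive_tweets(tweets, path, seen_ids=None):
    """Append unseen tweets to a gzipped JSON Lines file.

    `tweets` is an iterable of dicts with an "id" key (assumed schema).
    Returns the updated set of ids already archived.
    """
    seen_ids = set() if seen_ids is None else seen_ids
    with gzip.open(path, "at", encoding="utf-8") as fh:
        for tweet in tweets:
            if tweet["id"] in seen_ids:
                continue  # duplicate: already archived
            seen_ids.add(tweet["id"])
            fh.write(json.dumps(tweet) + "\n")
    return seen_ids


# Usage with made-up tweets: two batches that overlap on id 1.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets.jsonl.gz")
    seen = archive_tweets([{"id": 1, "text": "a"}, {"id": 1, "text": "a"}], path)
    seen = archive_tweets([{"id": 2, "text": "b"}, {"id": 1, "text": "a"}], path, seen)
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        stored = [json.loads(line) for line in fh]
```

Keeping the set of seen ids between batches lets a long-running collector skip tweets that the same search query returns repeatedly.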
DATA PREPROCESSING
1. Decompress archives
2. Remove tweet duplicates
3. Label tweets with tweet types
4. Generate tweet relationship data
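
The four steps above can be sketched as small functions. This is only an illustration under stated assumptions: the archive is taken to be a gzipped JSON Lines file, and the field names `retweeted_status`, `quoted_status`, and `in_reply_to_status_id` are assumed to follow the Twitter API v1.1 tweet object; the slides do not confirm which API version or schema was used.

```python
import gzip
import json


def load_archive(path):
    """Step 1: decompress a gzipped JSON Lines archive into dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def deduplicate(tweets):
    """Step 2: keep the first occurrence of each tweet id."""
    unique = {}
    for tweet in tweets:
        unique.setdefault(tweet["id"], tweet)
    return list(unique.values())


def tweet_type(tweet):
    """Step 3: label a tweet by type (field names are assumptions)."""
    if "retweeted_status" in tweet:
        return "retweet"
    if "quoted_status" in tweet:
        return "quote"
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    return "original"


def relationships(tweets):
    """Step 4: emit (source_id, relation, target_id) edges."""
    edges = []
    for t in tweets:
        kind = tweet_type(t)
        if kind == "retweet":
            edges.append((t["id"], "retweet_of", t["retweeted_status"]["id"]))
        elif kind == "quote":
            edges.append((t["id"], "quote_of", t["quoted_status"]["id"]))
        elif kind == "reply":
            edges.append((t["id"], "reply_to", t["in_reply_to_status_id"]))
    return edges


# Made-up in-memory sample standing in for a decompressed archive.
sample = [
    {"id": 1, "text": "original claim"},
    {"id": 2, "text": "RT", "retweeted_status": {"id": 1}},
    {"id": 2, "text": "RT", "retweeted_status": {"id": 1}},  # duplicate
    {"id": 3, "text": "really?", "in_reply_to_status_id": 1},
]
clean = deduplicate(sample)
labels = {t["id"]: tweet_type(t) for t in clean}
edges = relationships(clean)
```

The resulting edge list is one simple way to represent how a message propagates through the network, with each edge pointing from a retweet, quote, or reply back to the tweet it refers to.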
Future Scope
As this is only a preliminary and broad study of rumours on online social networks, improvements can be made in the following ways:-

1) The existing workflow can be enhanced by:
- Leveraging GPU acceleration to speed up calculations
- Utilizing a distributed database for greater scale-up capability
- Real-time importing and visualization of data

2) Testing can be extended by:
- Evaluation of existing models on public datasets (e.g. news datasets)
- Evaluation of existing models on other types of texts (e.g. articles)
Some Snapshots of the App
Questions?
Thank You
