A PROJECT REPORT ON

FAKE NEWS CLASSIFICATION USING NLP

Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, KAKINADA
For Partial Fulfilment of Award of the Degree of

BACHELOR OF TECHNOLOGY

Submitted By

G. Nikhitha (20X41A4217)
N. S. Praneetha Gandikota (20X41A4237)
S. Sri Divijendra Kumar (20X41A4248)
Md. Abdul Naveed (20X41A4235)

Under the esteemed guidance of


Ms. G. Hemasudharani
Assistant Professor, Department of CSE - Artificial Intelligence and Machine Learning

DEPARTMENT OF CSE-ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING


S.R.K INSTITUTE OF TECHNOLOGY
(Approved by AICTE, New Delhi & Affiliated to JNTU, Kakinada)
(An ISO 9001:2015 Certified Institution & Accredited by NAAC with "A" Grade)
Enikepadu, Vijayawada - 521108.

APRIL 2024
S.R.K INSTITUTE OF TECHNOLOGY
(Approved by AICTE, New Delhi & Affiliated to JNTU, Kakinada)
(An ISO 9001:2015 Certified Institution & Accredited by NAAC with "A" Grade)
Enikepadu, Vijayawada - 521108.
DEPARTMENT OF CSE-ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

CERTIFICATE

This is to certify that this project report entitled “FAKE NEWS CLASSIFICATION USING NLP” is the bonafide work of G. Nikhitha (20X41A4217), N. S. Praneetha Gandikota (20X41A4237), S. Sri Divijendra Kumar (20X41A4248), and Md. Abdul Naveed (20X41A4235), in partial fulfillment of the requirements for the award of the degree of BACHELOR OF TECHNOLOGY during the academic year 2023-2024. This work has been carried out under our supervision and guidance.

(Ms. G. Hemasudharani)                                  (Dr. A. Radhika)
Signature of the Guide                                  Signature of the HOD

Signature of the External Examiner


DECLARATION

We, G. Nikhitha, N. S. Praneetha Gandikota, S. Sri Divijendra Kumar, and Md. Abdul Naveed, hereby declare that the project report entitled “FAKE NEWS CLASSIFICATION USING NLP” is an original work done in the Department of CSE (Artificial Intelligence and Machine Learning), SRK Institute of Technology, Enikepadu, Vijayawada, during the academic year 2023-2024, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in CSE (Artificial Intelligence and Machine Learning). We assure that this project has not been submitted to any other college or university.

PROJECT ASSOCIATES SIGNATURES

G. Nikhitha (20X41A4217)
N. S. Praneetha Gandikota (20X41A4237)
S. Sri Divijendra Kumar (20X41A4248)
Md. Abdul Naveed (20X41A4235)
ACKNOWLEDGEMENT
Firstly, we would like to convey our heartfelt thanks to the Almighty for the blessings that allowed us to carry out this project work without any disruption.

We are extremely thankful to Ms. G. Hemasudharani, our guide throughout the project. We also thank her for the independence and freedom given to us during the various phases of the project.

We are also thankful to our project coordinator, Dr. D. Anusha, for her valuable guidance, which helped us to bring out this project successfully.

We are very much grateful to Dr. D. Anusha, Head of the Department of CSE - Artificial Intelligence and Machine Learning, for her valuable guidance, which helped us to bring out this project successfully. Her wise approach made us learn the minute details of the subject, and her mature and patient guidance paved the way for completing our project with a sense of satisfaction and pleasure.

We are greatly thankful to our principal, Dr. M. Ekambaram Naidu, for his kind support and the facilities provided at our campus, which helped us to bring out this project successfully.

Finally, we would like to convey our heartfelt thanks to our technical staff for their guidance and support at every step of this project. We convey our sincere thanks to all the faculty and friends who directly or indirectly helped us in the successful completion of this project.

G. Nikhitha (20X41A4217)
N. S. Praneetha Gandikota (20X41A4237)
S. Sri Divijendra Kumar (20X41A4248)
Md. Abdul Naveed (20X41A4235)
CONTENTS

ABSTRACT
List of Figures
Chapter 1: INTRODUCTION
1.1: Overview
1.2: About the Project
1.3: Purpose
1.4: Scope
Chapter 2: LITERATURE REVIEW
Chapter 3: SYSTEM ANALYSIS
3.1: Existing System
3.1.1: Disadvantages of Existing System
3.2: Proposed System
3.2.1: Advantages of Proposed System
3.2.2: Methodology
3.2.3: System Architecture
3.2.4: Algorithms
3.2.4.1: NLP
3.2.4.2: Flask
3.2.5: Datasets
3.2.6: Modules
3.3: Feasibility Study
3.3.1: Economic Feasibility
3.3.2: Operational Feasibility
3.3.3: Technical Feasibility
Chapter 4: SYSTEM SPECIFICATIONS
4.1: Hardware Requirements
4.2: Software Requirements
Chapter 5: SYSTEM DESIGN
5.1: UML Diagrams
5.1.1: Use Case Diagram
5.1.2: Class Diagram
5.1.3: Sequence Diagram
5.1.4: Data Flow Diagram
5.1.5: State Diagram
5.1.6: Activity Diagram
5.1.7: Collaboration Diagram
Chapter 6: SYSTEM IMPLEMENTATION
6.1: Technology
6.1.1: Python Installation
6.2: Sample Code
6.3: System Tests
6.4: Types of Tests
6.4.1: Unit Testing
6.4.2: Integration Testing
6.4.3: Functional Testing
6.4.4: System Testing
6.4.5: White Box Testing
6.4.6: Black Box Testing
6.4.7: Acceptance Testing
6.4.8: Testing Results
6.5: Testing Methodologies
6.5.1: Unit Testing
6.5.2: Integration Testing
6.5.3: User Acceptance Testing
6.5.4: Output and Validation Testing
Chapter 7: SCREENSHOTS
Chapter 8: CONCLUSION
Chapter 9: FUTURE SCOPE
REFERENCES
List of Figures

FIGURE NO    NAME
3.1    Fake News Detector (Existing System)
3.2    Architecture
3.3    TF-IDF Vectorizer
3.4    Training Dataset
3.5    Testing Dataset
3.6    Implementation
5.1    Representation of Use Case Diagram
5.2    Representation of Class Diagram
5.3    Representation of Sequence Diagram
5.4    Representation of Data Flow Diagram
5.5    Representation of State Diagram
5.6    Representation of Activity Diagram
5.7    Representation of Collaboration Diagram
7.1    Front end page
7.2    Input URL
7.3    Output
ABSTRACT

In today’s world, "fake news" has become a major concern, spreading like wildfire through many platforms. This phenomenon not only undermines the credibility of information but also misleads society. Nowadays, social media is the primary means by which fake news spreads. This can cause many problems, such as the defamation of people and the spreading of news in favour of specific individuals. Fake news often targets the most prominent, powerful, and influential
people in society, aiming to tarnish their reputation. The escalating impact of fake news knows no
bounds. Fake news is often biased, favouring a single person or a section of people in society for
their personal benefits. To mitigate these challenges and promote transparency, there is a need to
reduce the spread of fake news. Introducing a "Fake News Classifier using NLP" offers a promising
solution to combat this issue. By using machine learning algorithms, this classifier can effectively
identify misleading information as fake news, thereby contributing to awareness in society and
reducing losses.

KEYWORDS: Natural Language Processing, TF-IDF, Flask, Classification, MultinomialNB, Accuracy.

Chapter 1
INTRODUCTION

1.1 Overview
This project starts with collecting labeled news data and proceeds to process the text, extract features, and train a classifier. It provides a frontend interface where users can input news links and receive classification results, maintaining a simple end-to-end flow for NLP-based fake news detection.

1.2 ABOUT THE PROJECT


In this application, NLP techniques are utilized to combat fake news. The system collects
labeled news data, processes text, extracts features, and trains a classifier. Through a frontend
interface, users input news links and receive classification results, contributing to the fight against
misinformation. This approach promotes informed decision-making and safeguards against the
spread of false information, enhancing the integrity of digital discourse.

1.3 Purpose:
The main purpose of this application is to determine whether a news article is fake or real by fetching and analyzing its content from the article's unique link.

1.4 Scope:
The scope for this project lies in addressing the growing concern of misinformation by
providing a reliable tool for fake news detection. With the increasing reliance on digital information,
there is a significant demand for robust systems that can accurately classify news articles and combat
the spread of false information.

Chapter 2
LITERATURE REVIEW

1. Fake news detection: a systematic literature review of machine learning algorithms and datasets.
The research delves into the pervasive issue of fake news in contemporary
society, driven by its widespread dissemination facilitated by digital communication
technologies, notably through social media platforms. Detecting and combatting fake news is
of paramount importance to mitigate its detrimental impacts on political, economic, and
social realms. To address this challenge, the study adopts an exploratory, qualitative
approach, employing a systematic review protocol to identify relevant literature. Through a
rigorous search process across multiple databases and applying specific exclusion criteria, 61
articles are selected for analysis. These articles serve as the basis for examining the accuracy
of machine learning algorithms and the datasets utilized in fake news detection. Notable
algorithms identified include the Stacking Method, Bidirectional Recurrent Neural Network
(BiRNN), and Convolutional Neural Network (CNN), achieving impressive accuracies of up
to 99.9%. However, the majority of studies rely on datasets from controlled environments
like Kaggle, indicating a gap in real-time social network settings where fake news
proliferates most significantly. The research underscores the importance of expanding
beyond political news, embracing hybrid detection methods, and utilizing diverse datasets to
enhance accuracy and relevance in identifying fake news. Overall, the study contributes to
understanding the state-of-the-art in fake news detection, shedding light on effective
strategies and avenues for future research in combating misinformation.

2. Fake news detection on social media using K-nearest neighbor classifier, by Kesarwan et al.
The phenomenon of fake news has emerged as a significant threat to
democracy, journalism, and the economy, eroding public trust in institutions and influencing
public opinion. With its roots dating back to pre-internet times, fake news has been
exacerbated by the advent of social media, making it easier for false information to spread
rapidly. The consequences of fake news are profound, impacting democratic processes,
public health, and societal well-being. Addressing this challenge has become imperative,
leading researchers to explore various approaches, including fact-checking, media literacy
programs, and machine learning techniques. Among these, machine learning algorithms
show promise in detecting fake news by analyzing large datasets of news articles. In this
context, the study focuses on utilizing Logistic Regression fused with Natural Language
Processing techniques for fake news detection, achieving high accuracy rates. Additionally,
the research reviews notable contributions in the field, highlighting advancements in fake
news detection methodologies and models, such as Multichannel Deep Neural Networks and
BERT models. Methodologically, the study involves data cleaning, feature extraction using
TF-IDF vectorization, and training of machine learning models. The developed model is
deployed through a web application, enabling users to classify news content as real or fake.
Overall, the research contributes to the ongoing efforts to combat fake news proliferation
through innovative technological solutions and empirical evaluations.

3. Puneeth Surya Anem, Impact and Exposure of Fake News on Social Media.
The paper provides a comprehensive examination of the phenomenon of fake news,
its impact on society, and proposed solutions for its detection. It begins by framing fake
news within historical contexts of propaganda and sensationalist journalism, emphasizing its
manipulative nature and potential to distort public perception. With the advent of social
media, the dissemination of fake news has accelerated, posing significant challenges to the
integrity of information ecosystems and democratic processes. The study delves into the
detrimental effects of fake news, citing examples such as its role in influencing major
political events such as the 2016 US presidential election. It explores how false information
propagated on social media platforms can sway public opinion, undermine trust in traditional
media sources, and even impact economic markets. The paper underscores the urgency of
addressing this issue, given its wide-ranging implications for society.

In response to the challenge of fake news detection, the paper presents a model
leveraging n-gram analysis and machine learning classification techniques. It outlines the
data preprocessing steps, including the removal of noise such as punctuation and stop words,
to prepare the dataset for analysis. Feature extraction methods like Term Frequency-Inverse
Document Frequency (TF-IDF) are employed to convert textual data into numerical
representations, facilitating the training and evaluation of machine learning algorithms. The
experimental section of the paper details the model's performance using various classifiers
and feature extraction techniques. Results demonstrate the effectiveness of the proposed
approach, with the LSVM classifier achieving the highest accuracy of 92%. The study
underscores the importance of feature selection and classifier optimization in enhancing
detection accuracy.

The paper concludes by emphasizing the critical importance of combatting fake news and suggests avenues for future research, including exploring alternative classifiers and
expanding datasets for analysis. It calls for collaborative efforts between researchers,
policymakers, and tech companies to develop robust strategies for identifying and mitigating
the spread of false information in the digital age.

4. Ritik Patel, News Classification using Natural Language Processing.


The paper outlines the importance of effectively categorizing news articles in a social
network to provide users with relevant and trustworthy information. It highlights the
challenges of distinguishing between real and fake news, emphasizing the potential societal
impact of false information. The paper discusses previous research efforts in detecting fake
news using machine learning approaches and acknowledges the evolving nature of news
characteristics, posing new challenges for classification. It also touches upon the rapid
dissemination of news through online platforms and the need for accurate information
sharing. The background section provides an overview of machine learning techniques such
as supervised learning, including algorithms like Random Forest, Naive Bayes, Decision
Trees, and Logistic Regression. It also discusses Natural Language Processing (NLP)
concepts like TF-IDF and NLTK, along with evaluation metrics such as confusion matrices
and classification reports. Finally, the conclusion underscores the ongoing efforts to develop
reliable methods for detecting fake news and the importance of critical thinking in navigating
online information sources.

5. Fake News Detection using NLP, Machine Learning and Deep Learning.
Before writing any code, it is important to survey how research has been done in the field we want to work in. We have analysed quite a few papers on fake news detection. Many types of models were trained, each with its own issues and results, which provided a lot of help for our project. The researchers have applied a wide range of algorithms, from linear regression to deep learning. All the papers first argue that fake news has been troubling the world for a long time, resulting in a great deal of chaos, including deaths in some cases. They discuss the importance of classifying such news and of removing such propaganda to prevent misinformation from being treated as news.

The research papers themselves analysed several earlier papers before proceeding with their projects, to get an idea of what goes wrong and how to add novelty. They discuss how text is converted to numeric values, using vectorization techniques ranging from TF-IDF to Bag of Words (BOW). They also describe the data-cleaning process, showing how raw data is turned into a proper dataset and how NLP is applied. Different types of algorithms are discussed, such as SVM and Random Forest, to name a few.

The use of deep learning algorithms like CNN has also been shown, along with the final accuracies, with importance given to data classification. Coming to the research papers, we can observe that most of them picked the LIAR dataset. Some other datasets are also included, for example the combined corpus by Junaed Younus Khan, Md. Tawkat Islam Khondaker, Anindya Iqbal, and Sadia Afroz. A proper classification has been done for the type of data being used, for example visual-based and user-based; this has been discussed in detail by Syed Ishfaq Manzoor, Dr Jimmy Singla, and Dr Nikita in their research paper. For data cleaning, different methods have been employed to remove unnecessary IP and URL addresses, and stemming has been applied after whitespace removal. TF-IDF has been used extensively as the vectorization technique by most of the papers; the above two steps were carried out by Junaed Younus Khan, while BOW was used in the research paper by Dr Singla. Another important point about the data raised in the research papers was the issue of bias in the data aligning with the models. Next, all three research papers performed feature extraction, where the Empath tool was used to classify the type of news as violent, misleading, etc. Another important method used here is lexical and sentiment feature extraction, where word count and word length were used as lexical features while positive and negative polarity were marked as sentiment features; this work was also done in the research paper by Junaed Younus Khan. Next, traditional models such as SVM, linear regression, decision trees, Naïve Bayes, and K-NN were used by professors at Dhaka University. XGBoost and Random Forest were the newer algorithms implemented by professors at LPU. The paper by Harsh Khatter argued for SVM being used to solve the problem and proposed a model combining a News Aggregator, an Authenticator, and a Suggestion/Recommendation component. Further, deep learning algorithms were implemented for better learning of the data so that better accuracies could be obtained.

The paper by Dr Khatter implemented simple neural networks, while the paper by Anindya Iqbal discussed the CNN model and used several newer deep learning architectures like Hierarchical Attention Networks (HAN) and Convolutional HAN. Three variants of LSTM (Long Short-Term Memory) were also used: LSTM, C-LSTM, and Bi-LSTM. In addition, the Linguistic Inquiry and Word Count (LIWC) dictionary, a word classification and counting tool, was employed. The results by the professors at Dhaka University were divided into two parts, one analysing performance before applying neural networks and the other after. The best accuracy was reported by Naïve Bayes with 94 percent after using n-gram (bigram TF-IDF) features. The paper by Harsh Khatter reported Naïve Bayes to be the best with an accuracy of 93.5 percent, and the paper by professors at LPU argued that XGBoost was the best. In conclusion, all the papers argued that perfect accuracy cannot be obtained and that there is scope for future work.

6. Fake News Detection Using a Logistic Regression Model and Natural Language Processing, Johnson Adeleke Adeyiga.
The abstract encapsulates a study focused on tackling the pervasive issue of fake
news proliferation using machine learning methodologies. Leveraging a comprehensive
dataset of around 20,000 labeled news articles obtained from Kaggle, the research
employed various classifiers including logistic regression, KNN, Passive Aggressive, and
Naïve Bayes. Through meticulous text preprocessing techniques such as TF-IDF
vectorization, the study aimed to extract meaningful features from the news articles to
facilitate accurate classification. Additionally, a user-friendly website was developed
utilizing Flask, HTML, CSS, and JavaScript, enabling users to input news content for real-
time classification.

Evaluation of the classifiers was conducted using key metrics including accuracy,
precision, recall, and F1 score. Notably, the logistic regression model emerged as the most
effective classifier, showcasing superior performance compared to the alternatives. This
highlights the significance of logistic regression in distinguishing between real and fake
news articles, thereby underscoring its potential utility in combating misinformation.

The study's findings contribute to the ongoing efforts to address the challenges
posed by fake news dissemination, emphasizing the pivotal role of machine learning in
information verification. By demonstrating the efficacy of machine learning techniques in
fake news detection, the research underscores the importance of continued exploration and
development in this domain. Ultimately, the study underscores the potential of machine
learning algorithms to serve as valuable tools in promoting media literacy and combating
the spread of misinformation in the digital age.

7. TI-CNN: Convolutional Neural Networks for Fake News Detection, Yang Yang.
In their paper titled "TI-CNN: Convolutional Neural Networks for Fake News
Detection," published in June 2018, Yang et al. propose a novel approach for detecting fake
news using Convolutional Neural Networks (CNNs). CNNs are a type of deep learning
architecture commonly used for image recognition, but here they are applied to text data for
the task of fake news detection. The authors begin by acknowledging the growing concern
over the proliferation of fake news and the need for effective detection methods. They
highlight the challenges in this area, such as the rapid spread of misinformation through
social media platforms and the difficulty in manually verifying the vast amount of
information shared online.

To address these challenges, Yang et al. introduce the TI-CNN framework, which
stands for Textual Information-based Convolutional Neural Networks. This framework
leverages both textual information and external knowledge to improve the accuracy of fake
news detection. The CNN model is trained on a dataset of news articles labeled as either
fake or genuine, allowing it to learn patterns and features indicative of fake news. One key
aspect of the TI-CNN framework is its utilization of external knowledge sources, such as
knowledge graphs and semantic embeddings, to enhance the model's understanding of the
textual information. By incorporating external knowledge, the model can capture deeper
semantic relationships between words and phrases, thus improving its ability to discriminate
between fake and genuine news articles.

Experimental results presented in the paper demonstrate the effectiveness of the proposed TI-CNN framework compared to baseline methods. The CNN model achieves
promising results in terms of accuracy, demonstrating its potential for practical
applications in combating the spread of fake news online. Overall, the paper contributes to
the growing body of research aimed at developing automated solutions for fake news
detection using machine learning techniques.

8. Detecting Fake News in Social Media Networks, Monther Aldwairi, Ali Alwahedi.
The paper proposes a tool designed to detect and eliminate web pages containing
misinformation. Users need to download and install the tool, which is expected to be
compatible with commonly used browsers. The tool analyzes the syntactical structure of
links retrieved from search engine results, flagging those with potentially misleading words
or excessive hyperbole. It also considers the number of words in headlines, with longer titles
indicating potential clickbait. Additionally, it monitors the use of punctuation marks in
headlines and examines bounce rates of individual sites to determine veracity. After
executing the algorithm, search engine results highlight potentially misleading links,
allowing users to block them. The methodology involves collecting URLs from social media
sites likely to host fake news or clickbait articles, computing attributes from web page titles
and content, and extracting features such as keywords, title characteristics, and user behavior.
Through this approach, users can potentially reduce the presence of clickbait in their search
results.

9. Fake news detection using deep learning models: A novel approach, S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar.
The paper titled "Fake News Detection Using Deep Learning Models: A Novel
Approach," authored by S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar,
and published in the Transactions on Emerging Telecommunications Technologies in
2019, presents a pioneering methodology for combating the proliferation of fake news
leveraging deep learning techniques.

The authors address the burgeoning issue of fake news dissemination, recognizing
the critical need for robust detection mechanisms amid the digital information age.
Through meticulous experimentation and analysis, they propose a novel approach
harnessing the power of deep learning models to discern between genuine and fabricated
news articles.Central to their methodology is the utilization of advanced deep learning
architectures, adept at extracting intricate patterns and features from textual data. By
training these models on large datasets comprising both authentic and deceptive articles,
they aim to equip the system with the discriminative prowess necessary for accurate
classification.

The paper delves into the intricacies of model architecture, training procedures,
and performance evaluation metrics employed to gauge the efficacy of the proposed
approach. Results from empirical studies demonstrate promising outcomes, showcasing
the potential of deep learning in bolstering the fight against fake news
dissemination. Moreover, the authors underscore the significance of their findings in real-
world applications, advocating for the integration of their methodology into existing news
verification frameworks to enhance credibility and trustworthiness in digital media
landscapes. This paper represents a significant contribution to the burgeoning field of fake
news detection, offering a robust framework underpinned by deep learning methodologies
to mitigate the adverse effects of misinformation in the digital age.

10. Fake News Detection using Machine Learning, P. Kulkarni, S. Karwande, R. Keskar, P. Kale, and S. Iyer.
The paper titled "Fake News Detection Using Machine Learning," authored by P.
Kulkarni, S. Karwande, R. Keskar, P. Kale, and S. Iyer, and published in the ITM Web of
Conferences in 2021, delves into the realm of fake news detection, leveraging the
capabilities of machine learning algorithms. In response to the escalating challenges posed
by the dissemination of misinformation, the authors propose a methodology grounded in
machine learning techniques to discern between authentic and deceptive news content.
Central to their approach is the utilization of a diverse array of machine learning algorithms,
carefully trained on annotated datasets encompassing genuine and fabricated news articles.
Through meticulous experimentation and analysis, the authors elucidate the efficacy of
various machine learning models in distinguishing between factual and misleading
information. The paper not only explores the intricacies of model selection and training but
also delves into feature engineering and performance evaluation metrics employed to assess
the robustness of the proposed approach. Results from empirical studies showcase promising
outcomes, underscoring the potential of machine learning in fortifying the defenses against
fake news propagation. Furthermore, the authors advocate for the integration of their
methodology into mainstream news verification frameworks, emphasizing the critical role of
machine learning in fostering information integrity and trustworthiness in digital media
landscapes. Overall, this paper represents a significant stride in the ongoing battle against
misinformation, offering a valuable framework rooted in machine learning principles to
safeguard the veracity of online information.

Chapter 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Various models exist for real and fake news detection. The most prevalent system consists of a model that detects fake news based on keywords and headlines simultaneously. The Passive Aggressive classifier detects fake news using keyword and headline analysis, addressing topic-specific tendencies and author behavior, and it incorporates sentiment analysis.

FIG 3.1: Fake News Detector (Existing System)

3.1.1 DISADVANTAGES OF EXISTING SYSTEM

• Limited contextual understanding
• Vulnerability to manipulation
• Challenges in multilingual settings
• Sentiment analysis limitations
• Lack of adaptability
• Keyword dependency

3.2 PROPOSED SYSTEM

In this project we are going to make use of Natural Language Processing techniques to overcome the widespread dissemination of false news on the internet. We use these techniques to determine how the MultinomialNB algorithm performs on a given clip of information supplied as input to the system.

The approach used in this project is to first train the system and then add the news information that needs to be checked as reliable or not reliable, as well as to print the accuracy of the algorithm's performance on the news clip inserted by the respective reader. The basis for the project is to develop a classifier using article links and article content. This helps the admin to get information about a news article.

3.2.1 ADVANTAGES OF PROPOSED SYSTEM

• Improved Accuracy: By leveraging advanced NLP techniques and machine learning algorithms, the system can provide more accurate assessments of news reliability compared to simple keyword-based methods.

• Adaptability: The system can be trained on a diverse dataset of news articles, allowing it to adapt to evolving patterns of misinformation and effectively detect new types of fake news.

• Scalability: With proper implementation, the system can process large volumes of news articles efficiently, making it suitable for real-time monitoring of online news sources.

• User-Friendly Interface: Providing users with a platform to input news clips and receive reliability assessments enhances transparency and usability, fostering trust in the system's output.

• Continuous Improvement: By periodically updating the training dataset and refining the algorithm, the system can continuously improve its performance over time, ensuring its effectiveness in combating fake news.

3.2.2 METHODOLOGY

In this project we make use of Natural Language Processing techniques to counter the widespread dissemination of false news on the internet. Here we use these techniques to examine how the Multinomial Naive Bayes algorithm performs on a given clip of information supplied as input to the system. The approach used in this project is to first train the system and then add the news information that needs to be checked as reliable or not reliable, as well as to print the accuracy of the algorithm's performance on the news clip inserted by the respective reader.

We chose the MultinomialNB classifier because it performs satisfactorily on datasets with high dimensionality and is a particularly suitable classifier for text classification. Multinomial Naive Bayes assumes a feature vector in which each element represents the number of times a feature appears (or, very often, its frequency). The dataset we'll use for this project, which we'll call news.csv, has a shape of 7796×4. The first column identifies the news item, the second and third are the title and text, and the fourth column has labels denoting whether the news is REAL or FAKE.
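As a quick illustration, the following is a minimal sketch of loading and inspecting this dataset with pandas; the column names used here are assumptions based on the description above and may differ in the actual file.

# Minimal sketch: loading and inspecting news.csv with pandas.
# Column names (id, title, text, label) are assumed from the description above.
import pandas as pd

df = pd.read_csv("news.csv")
print(df.shape)                    # expected (7796, 4)
print(df.columns.tolist())         # e.g. ['id', 'title', 'text', 'label']
print(df["label"].value_counts())  # distribution of REAL vs FAKE labels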

3.2.3 SYSTEM ARCHITECTURE

The system architecture describes an application meant to classify whether an article is fake or real using Natural Language Processing techniques and machine learning. We create a user-friendly web interface where users can give the input URL of a news article to check whether it is fake or real. The backend system is built using Flask, a Python web framework. The Flask web server handles requests from the interface, processes them, and returns the classification results to the user.

For the NLP modules we mainly use newspaper3k and pre-processing. Newspaper3k is used for web scraping: it extracts the news article content from the provided URL. The extracted content is then handed to the pre-processing stage, where it undergoes steps like tokenization, stopword removal, and possibly stemming or lemmatization to prepare it for analysis. A sketch of this extraction step is shown below.
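The following is a minimal sketch of the extraction step using the newspaper3k library; the URL shown is a placeholder, not a real input.

# A sketch of the article-extraction step with newspaper3k
# (installed via: pip install newspaper3k). The URL below is a placeholder.
from newspaper import Article

url = "https://example.com/some-news-story"  # hypothetical input URL
article = Article(url)
article.download()  # fetch the raw HTML
article.parse()     # extract the title and main body text

print(article.title)
raw_text = article.text  # handed on to the pre-processing step described above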

After pre-processing, the text data is converted into numerical features that can be understood by the MultinomialNB model; techniques like word embeddings or TF-IDF may be employed for this purpose. The pre-processed features are fed into a machine learning model trained on a labeled dataset of articles. Several machine learning models are available for classification, but we choose the MultinomialNB classifier, and the model predicts whether the article is fake or real based on the extracted features. The system combines NLP techniques and machine learning models to automatically classify news articles, thereby helping users identify misleading or false information. The database stores structured data required by the system, such as user information, news articles, prediction results, and system logs.

FIG 3.2: Architecture

3.2.4 ALGORITHMS

The fake news detection algorithm is a multi-step process designed to identify


misinformation within news articles. Initially, the algorithm collects and preprocesses a
dataset of labeled news articles, preparing the text data for analysis by removing stop words,
tokenizing, and possibly stemming or lemmatizing words. Following this, the algorithm
extracts features from the preprocessed text, typically converting it into numerical
representations using techniques like Bag-of-Words or TF-IDF. These features serve as
input for machine learning models, which are trained on a portion of the dataset while being
validated and tested on another portion. Evaluation metrics such as accuracy, precision,
recall, and F1-score assess the model's performance. When presented with a new news
article, the algorithm preprocesses the text using the same steps as during training, extracts
features, and feeds them into the trained model to predict whether the article is real or fake.
Continuous refinement of the algorithm, incorporating new data and feedback, is essential
for maintaining its effectiveness over time as fake news tactics evolve.
Multinomial Naive Bayes is a machine learning algorithm based on Bayes' theorem. It is a probabilistic classifier that models the probability distribution of the given data, which here is in the form of text; this makes it well suited to data whose features represent discrete frequencies or counts of events, as in many natural language processing (NLP) tasks. The probability mass function (PMF) of the Multinomial distribution is used to model the likelihood of observing a specific set of word counts in a document. It is given by:

P(x_1, \dots, x_k \mid n, p_1, \dots, p_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}

where x_i is the count of word i in the document, n = x_1 + \dots + x_k is the total number of word occurrences, and p_i is the probability of word i under the given class.

3.2.4.1 Natural Language Processing (NLP)


In the context of using Multinomial Naive Bayes (MultinomialNB) for fake news
classification, Natural Language Processing (NLP) techniques are used to process and
analyze the text data (news articles) before feeding it into the classifier. Here's how NLP is
typically used in conjunction with MultinomialNB:

Text Preprocessing:
- Before training the classifier, the text data undergoes preprocessing steps such as tokenization, stopword removal, and lowercasing; a minimal sketch of these steps follows this list.
- Tokenization involves breaking down the text into individual words or tokens.
- Stopword removal eliminates common words that do not carry significant meaning (e.g.,
"the", "is", "and").

Feature Representation:
- After preprocessing, the text data is converted into numerical features that can be
understood by the machine learning algorithm.
- One common approach is to use techniques like TF-IDF (Term Frequency-Inverse
Document Frequency) or count vectorization to represent the frequency of each word or n-
gram in the document.
- TF-IDF assigns weights to words based on their frequency in the document and their
rarity across all documents in the corpus.
- Vectorization is a technique for converting input data from its raw format (i.e. text) into vectors of real numbers. TF-IDF, or Term Frequency-Inverse Document Frequency, is a numerical statistic intended to reflect how important a word is to a document; it is another frequency-based method.
- TF stands for Term Frequency. It can be understood as a normalized frequency score, calculated via the following formula:

TF(t, d) = \frac{\text{number of occurrences of term } t \text{ in document } d}{\text{total number of terms in document } d}

This number always stays ≤ 1, so we can judge how frequent a word is in the context of all the words in a document.
- IDF stands for Inverse Document Frequency, but before we go into IDF, we must make sense of DF, the Document Frequency. It is given by the following formula:

DF(t) = \frac{\text{number of documents containing term } t}{\text{total number of documents } N}

DF tells us the proportion of documents that contain a certain word. So what is IDF? It is based on the reciprocal of the Document Frequency, and the final IDF score comes out of the following formula:

IDF(t) = \log\left(\frac{N}{\text{number of documents containing term } t}\right)

Just as discussed above, the intuition behind it is that the more common a word is across all documents, the less important it is for a particular document. The logarithm is taken to dampen the effect of IDF in the final calculation. The final TF-IDF score comes out to be:

\text{TF-IDF}(t, d) = TF(t, d) \times IDF(t)
FIG 3.3: TF-IDF Vectorizer
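To make the computation concrete, the following sketch applies scikit-learn's TfidfVectorizer to three toy documents invented for illustration; note that scikit-learn's default formula adds smoothing terms, so its weights differ slightly from the textbook formulas above.

# Illustrative TF-IDF sketch with scikit-learn; the documents are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "breaking news about the election",
    "the election results are fake",
    "sports news from the weekend",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)         # sparse matrix: documents x vocabulary

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray().round(2))                # TF-IDF weight of each word per document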

Vocabulary Building:
- A vocabulary is built based on the unique words or n-grams present in the training data.
- Each word or n-gram becomes a feature in the feature vector, and its index in the vector
corresponds to its position in the vocabulary.

Model Training:
- The preprocessed and vectorized text data is used to train the MultinomialNB classifier.
- During training, the classifier learns the probability distribution of each feature (word or
n-gram) given the class label (fake or real) using maximum likelihood estimation. The
classifier calculates the probabilities of each word or n-gram occurring in a document given
its class.

Classification:
- To classify a new news article, the same preprocessing steps are applied to the article's text. The article's text is then converted into a feature vector using the same vocabulary built during training.
- The MultinomialNB classifier calculates the probability of the article belonging to each class (fake or real) based on the observed features. The class with the highest probability is predicted as the final classification for the article. A self-contained sketch of these steps is shown below.
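The following self-contained sketch ties the training and classification steps together; the four toy articles and their labels are invented for illustration and stand in for the real training set.

# Sketch: training MultinomialNB on TF-IDF features and classifying a new
# article. All texts and labels below are made-up toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["aliens endorse candidate", "council approves new budget",
               "miracle cure hidden by doctors", "court publishes ruling"]
train_labels = ["FAKE", "REAL", "FAKE", "REAL"]

vectorizer = TfidfVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(train_texts), train_labels)

# Classification: transform the new article with the SAME fitted vocabulary,
# then pick the class with the highest posterior probability.
new_text = "doctors hide miracle cure"
features = vectorizer.transform([new_text])
print(model.predict(features)[0])                                   # -> 'FAKE' here
print(dict(zip(model.classes_, model.predict_proba(features)[0])))  # class probabilities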

NLP techniques are integral to the process of feature extraction and representation in fake
news classification using MultinomialNB. By processing and vectorizing the text data
appropriately, NLP enables the classifier to effectively learn patterns and make accurate
predictions about the authenticity of news articles.

NLP Libraries
1. Natural Language Toolkit(NLTK)
NLTK is a vital library that supports tasks like classification, stemming, tagging, parsing, semantic reasoning, and tokenization in Python. It is essentially our main tool for language processing and machine learning, and today it is an academic foundation for Python developers who are dipping their toes into this field (and machine learning).
The library was developed by Steven Bird and Edward Loper at the University of Pennsylvania and played a key role in breakthrough NLP research. Many universities around the globe now use NLTK, Python libraries, and other tools in their courses. This library is quite versatile, but we must admit that it is also somewhat difficult to use for language processing with Python.
NLTK is rather slow and does not match the demands of quick-paced production usage. The learning curve is steep, but developers can make use of the available resources to learn more about the concepts behind the language processing tasks this toolkit supports.
2. SpaCy
SpaCy is a relatively young library that was designed for production usage. That is why it is much more accessible than other Python NLP libraries like NLTK. SpaCy offers one of the fastest syntactic parsers available today, and the toolkit is speedy and efficient.
However, no tool is ideal. Compared to the libraries covered so far, spaCy supports the smallest number of languages. However, the growing popularity of machine learning, NLP, and spaCy as a key library implies that the tool may start supporting more languages soon.
3. Scikit-learn
This handy NLP library provides developers with a wide range of algorithms for building machine learning models. It offers many functions for using the bag-of-words method of creating features to tackle text classification problems. The strength of this library is its intuitive classes and methods. Also, scikit-learn has excellent documentation that helps developers make the most of its features.
However, the library does not use neural networks for text pre-processing. So if you would like to carry out more complex pre-processing tasks like POS tagging on your text corpora, it is better to use other NLP libraries and then return to scikit-learn for building your models.
3.2.4.2: FLASK
In the context of the fake news classification system described earlier, Flask is used
as the web framework to build the backend of the application. Here's how Flask is related to
the system:
Web Interface:
- Flask provides the infrastructure for creating a user-friendly web interface where users
can interact with the fake news classification system.
- Users input the URL of a news article through the web interface, and Flask handles the
HTTP request.

Routing:
- Flask defines routes to handle different URLs and HTTP methods. For example, the `'/'`
route renders the main HTML template, while the `'/predict'` route processes the URL input
and makes predictions.
- Routes are defined using decorators like `@app.route('/')` and `@app.route('/predict')`.
Request Handling:
- Flask's request object (`request`) is used to access data submitted in the HTTP request. In
this case, the URL of the news article is extracted from the request data using
`request.get_data(as_text=True)`.
Template Rendering:
- Flask integrates with Jinja2 templating engine to render HTML templates dynamically.
Templates are used to generate the web pages that users interact with.
- The `render_template` function is used to render HTML templates and pass data to them.
Integration with NLP Module and Machine Learning Model:
- Flask integrates with the NLP module and machine learning model responsible for
classifying news articles.
- When a URL is submitted through the web interface, Flask invokes the NLP module to
extract the news content from the URL, preprocess it, and pass it to the machine learning
model for classification.
Response Handling:
- Flask handles the classification result returned by the machine learning model and sends
it back to the web interface for display.
- The classification result is typically rendered within the HTML template using Jinja2
templating syntax.
Flask acts as the backbone of the fake news classification system, providing the
infrastructure for handling HTTP requests, routing, template rendering, and integrating with
the NLP module and machine learning model. It enables the creation of a user-friendly web
interface through which users can input news articles and receive classification results in
real-time.
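Below is a minimal sketch of such a Flask backend under the assumptions described in this section; extract_text() is a hypothetical helper wrapping the newspaper3k step from Section 3.2.3, the template name is illustrative, and the pickled pipeline is assumed to bundle the TF-IDF vectorizer together with the classifier.

# Minimal Flask backend sketch; helper names and templates are illustrative.
import pickle
from flask import Flask, render_template, request
from newspaper import Article

app = Flask(__name__)

with open("model.pickle", "rb") as f:  # pipeline saved during training
    model = pickle.load(f)

def extract_text(url):
    # Hypothetical helper wrapping the newspaper3k step (Section 3.2.3).
    article = Article(url)
    article.download()
    article.parse()
    return article.text

@app.route("/")
def index():
    return render_template("index.html")  # page with the URL input form

@app.route("/predict", methods=["POST"])
def predict():
    url = request.get_data(as_text=True)  # raw URL sent by the frontend
    text = extract_text(url)
    label = model.predict([text])[0]      # pipeline applies TF-IDF internally
    return render_template("index.html", prediction=label)

if __name__ == "__main__":
    app.run(debug=True)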

3.2.5 Datasets:

FIG 3.4: Training Dataset

FIG 3.5: Testing Dataset

3.2.6 Modules

Data is obtained from a CSV file ('news.csv') containing text and corresponding labels indicating whether each article is authentic or fake. The pandas library is used to load the dataset into a DataFrame.

Text data and labels are extracted from the DataFrame and stored in separate variables (X and y,
respectively).
A.) Data Splitting:
The dataset is split into training and testing sets using the train_test_split function from scikit-learn. 80% of the data is used for training, and the remaining 20% is allocated for testing.
B.) Feature Engineering:
Text data is transformed into numerical feature vectors using the TF-IDF (Term Frequency-Inverse Document Frequency) vectorization technique. Stop words (common words with little semantic value) are removed during vectorization to improve model performance.
C.) Model Selection and Training:
A pipeline is created using scikit-learn's Pipeline module, which sequentially applies TF-IDF vectorization and the Multinomial Naive Bayes classifier. The Multinomial Naive Bayes algorithm is chosen due to its effectiveness in text classification tasks and its suitability for handling sparse data. The pipeline is trained on the training data using the fit method.
D.) Model Evaluation:
The trained model is used to make predictions on the test data. Classification performance is evaluated using standard metrics such as accuracy, precision, recall, and F1-score. The scikit-learn classification_report function is employed to generate a comprehensive report of these metrics. Confusion matrices are generated using the confusion_matrix function to visualize the distribution of true positive, true negative, false positive, and false negative predictions.
E.) Model Serialization:
The trained model is serialized using the pickle module and saved to a file ('model.pickle'). Serialization allows the model to be easily stored and reloaded for future use without needing to retrain it.
F.) Performance Assessment:
The accuracy of the model is calculated by comparing the predicted labels with the actual labels of the test data. The overall accuracy score is printed to assess the performance of the model in classifying fake news articles. Finally, we reflect on the effectiveness of the implemented models, suggest future research directions, and emphasize the importance of continued efforts in developing robust fake news detection systems. An end-to-end sketch of steps A to F is shown below.
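The following is an end-to-end sketch of modules A to F; it assumes news.csv has 'text' and 'label' columns as described in Section 3.2.2.

# End-to-end sketch of the module flow; column names are assumptions.
import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

df = pd.read_csv("news.csv")                           # data acquisition
X_train, X_test, y_train, y_test = train_test_split(   # A) 80/20 split
    df["text"], df["label"], test_size=0.2, random_state=42)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # B) feature engineering
    ("clf", MultinomialNB()),                          # C) model training
])
pipeline.fit(X_train, y_train)

pred = pipeline.predict(X_test)                        # D) model evaluation
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))

with open("model.pickle", "wb") as f:                  # E) model serialization
    pickle.dump(pipeline, f)

print("Accuracy:", accuracy_score(y_test, pred))       # F) performance assessment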

FIG 3.6: Implementation

3.3 FEASIBILITY STUDY


Preliminary investigation examines project feasibility: the likelihood that the system will be useful to the organization. The main objective of the feasibility study is to test the technical, operational, and economic feasibility of adding new modules and debugging the old running system. Any system is feasible if given unlimited resources and infinite time. There are three aspects in the feasibility study portion of the preliminary investigation:

• Economic Feasibility
• Operational Feasibility
• Technical Feasibility

3.3.1 ECONOMIC FEASIBILITY

Assessing the economic feasibility of a fake news classification project involves a
comprehensive analysis of costs, benefits, and risks. Development costs encompass data
collection, preprocessing, model development, and personnel expenses. Infrastructure
costs include hardware, software, and ongoing maintenance. On the benefits side, potential
savings from reduced misinformation-related damages, revenue opportunities, and
intangible benefits like societal well-being must be considered. Financial metrics such as
Net Present Value (NPV), Return on Investment (ROI), and Payback Period help quantify
the project's economic viability. Risk assessment identifies potential obstacles and informs
mitigation strategies. Ultimately, a thorough economic feasibility analysis guides decision-
making, ensuring that the project aligns with organizational goals and offers a positive
return on investment.

3.3.2 OPERATIONAL FEASIBILITY


Operational feasibility analysis is a crucial step in evaluating the
practicality of implementing a fake news classification system within an organization's
operational framework. This assessment involves scrutinizing various aspects, including
technical infrastructure, human resources readiness, organizational culture, and legal
considerations. It examines the organization's ability to support the system technically,
ensuring compatibility with existing technologies and scalability to handle increasing data
volumes. Moreover, it assesses the readiness of personnel to adopt and utilize the system
effectively, necessitating adequate training and support mechanisms. Organizational
impacts, such as stakeholder attitudes and potential conflicts with existing workflows, are
also evaluated to ensure smooth integration. Additionally, legal and regulatory compliance
regarding data privacy and intellectual property rights are paramount. By conducting a
thorough operational feasibility analysis, decision-makers can identify potential challenges,
mitigate risks, and make informed decisions to ensure successful implementation and
sustained operation of the fake news classification system.

3.3.3 TECHNICAL FEASIBILITY


Technical feasibility assessment is crucial for determining if a proposed fake news
classification system can be developed and implemented effectively within the
organization's technical framework. This evaluation involves scrutinizing the
compatibility and availability of necessary technologies for tasks such as data
preprocessing, model development, and deployment. Additionally, the quality and
sufficiency of available data sources must be examined to ensure they meet the
requirements for training accurate models. It's imperative to assess various machine
learning algorithms and techniques to identify the most suitable approaches for fake news
classification. Consideration of computational and resource requirements, including
hardware infrastructure and scalability, is necessary to ensure efficient handling of large
volumes of data. Furthermore, evaluating integration with existing organizational systems
and workflows helps identify potential challenges and dependencies. By conducting a
thorough technical feasibility analysis, decision-makers can make informed choices and
address potential technological hurdles to ensure successful implementation of the fake
news classification system.

Chapter 4
SYSTEM SPECIFICATIONS

4.1 HARDWARE REQUIREMENTS

• Hard Disk: 512 GB
• Processor: Intel Core i5
• Mouse: Optical Mouse
• RAM: 8 GB

4.2 SOFTWARE REQUIREMENTS

• Operating System: Windows, Linux
• Coding Language: Python
• Framework: Flask

Chapter 5
SYSTEM DESIGN

5.1 UML DIAGRAMS


UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing, constructing, and documenting the artifacts of a software system, as well as for business modeling and other non-software systems.
The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.

GOALS
The Primary goals in the design of the UML are as follows:

1. Provide users a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns, and components.
7. Integrate best practices.

5.1.1 USE CASE DIAGRAM


A use case diagram in the Unified Modelling Language (UML) is a
type of behavioural diagram defined by and created from a Use-case analysis. Its purpose
is to present a graphical overview of the functionality provided by a system in terms of
actors, their goals (represented as use cases), and any dependencies between those use
cases. The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted.
User: Represents any individual or entity interacting with the system, such as a journalist,
fact-checker, or general user seeking information.
Administrator: Manages the system, including configuration, updates, and monitoring.
Submit Article: Users can submit articles to the system for classification.
Classify Article: The system uses NLP algorithms to analyze the content of submitted articles
and classify them as either "Fake" or "Real".
View Article Details: Users can view detailed information about a classified article, including
the classification result and NLP analysis.
Administer System: Administrators can configure system settings, manage user accounts,
monitor system performance, and update NLP models.
Generate Reports: Generate reports based on the analysis of submitted articles, including
statistics on the number of fake vs. real articles, accuracy of classifications, etc.
Provide Feedback: Users can provide feedback on the classification results, helping to
improve the accuracy of the system over time.
Association: Shows the interaction between actors and use cases. For example, users can
submit articles for classification, and administrators can administer the system.
Include: Indicates that certain use cases include other use cases. For example, the "Submit
Article" and "Classify Article" use cases are included in the broader process of "Article
Processing".
Extend: Represents optional or alternative behavior within a use case. For instance, the
"Provide Feedback" use case can extend the "View Article Details" use case, allowing users to
provide feedback after viewing the details of a classified article.

FIG 5.1: Representation of Use Case Diagram

5.1.2 CLASS DIAGRAM


In the realm of software engineering, the Class Diagram within the Unified
Modeling Language (UML) serves as a foundational tool for illustrating the structure of a
system. This diagram offers a static snapshot, meticulously depicting the various classes
within the system, their inherent attributes, methods or operations they can perform, and
the intricate relationships interlinking these classes. By delineating which class
encapsulates specific information and how they interact, the Class Diagram provides
invaluable insights into the inner workings of the system.
In a practical scenario, let's take the example of a Fake News Detection System.
Within this system, we encounter several key classes, each playing a distinct role. One
such class, the FakeNewsDetector, stands at the forefront, tasked with the critical
responsibility of discerning fraudulent information through the analysis of text using
Natural Language Processing (NLP) techniques. This class holds a pivotal connection to
an NLP model, which supplies its analytical capabilities. That model is represented by the
NLPModel class, the algorithmic framework utilized for fake news detection. With
attributes describing its model type and the corpus of training data, and with methods to
load the model and conduct in-depth text analysis, the NLPModel class forms the bedrock
of the system's analytical capability.
As the FakeNewsDetector class interfaces with the NLPModel class, invoking its
analytical capabilities, it yields a profound AnalysisResult. This class encapsulates the
outcome of the NLP analysis, furnishing critical insights such as confidence scores and
classifications. Through meticulous examination, this AnalysisResult class empowers
stakeholders with actionable intelligence, aiding in the identification and mitigation of
fake news proliferation.
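To make this structure concrete, a minimal Python sketch of the three classes is given below. The attribute and method names (model_type, load_model, analyze and so on) are illustrative assumptions for this sketch rather than definitive signatures, since the diagram itself fixes only the classes and their relationships.

#Illustrative sketch of the three classes in the class diagram.
#Names such as model_type, load_model and analyze are assumptions
#made for this example, not part of the project's code.

class AnalysisResult:
    """Outcome of an NLP analysis: a label plus a confidence score."""
    def __init__(self, classification: str, confidence: float):
        self.classification = classification  #e.g. "FAKE" or "REAL"
        self.confidence = confidence          #e.g. a value between 0.0 and 1.0

class NLPModel:
    """Wraps the trained algorithmic framework used for detection."""
    def __init__(self, model_type: str, trained_data: str):
        self.model_type = model_type
        self.trained_data = trained_data

    def load_model(self) -> None:
        ...  #e.g. unpickle a trained scikit-learn pipeline

    def analyze(self, text: str) -> AnalysisResult:
        ...  #run the model on the text and wrap its output

class FakeNewsDetector:
    """Front-facing class; delegates text analysis to an NLPModel."""
    def __init__(self, model: NLPModel):
        self.model = model

    def detect(self, text: str) -> AnalysisResult:
        return self.model.analyze(text)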

FIG 5.2: Representation of Class Diagram

5.1.3 SEQUENCE DIAGRAM


In software engineering, the Sequence Diagram in Unified Modeling Language
(UML) serves as a dynamic interaction diagram, showcasing how processes collaborate and
the sequence in which these interactions occur. Derived from Message Sequence Charts, it
visually narrates communication flows between system entities over time. Sometimes known
as Event Diagrams or Timing Diagrams, it encapsulates temporal dynamics and event-driven
interactions. Lifelines represent entities, while messages and activations delineate the flow of
control and data. This diagram isn't static; it accommodates loops, branches, and parallel
activations to explore diverse system behaviors. Functioning as a communication catalyst, it
fosters collaboration among stakeholders, facilitating a common understanding of system
behavior from early design to implementation. As a vital artifact, it guides software
development, ensuring clarity and precision in conceptualizing, analyzing, and communicating
complex system interactions.

FIG 5.3: Representation of Sequence Diagram

5.1.4. DATAFLOW DIAGRAM


A data flow diagram (DFD) illustrates the flow of data within a system, showcasing
how information moves between processes, data stores, and external entities. At its core, a
DFD visually represents the processes that transform input data into output data, indicating
the movement of data from one process to another. Typically, it consists of four main
components: processes, data stores, data flows, and external entities. Processes represent
the activities or transformations performed on data, while data stores denote repositories
where data is stored. Data flows depict the movement of data between processes, data
stores, and external entities, illustrating the direction of data flow. External entities are
sources or destinations of data that interact with the system but are not part of it. Overall, a
well-constructed DFD provides a clear and concise overview of how data moves through a
system, facilitating understanding, communication, and analysis of system functionality
and data dependencies.

FIG 5.4: Representation of Data Flow Diagram

5.1.5. STATE DIAGRAM


A state diagram, also recognized as a state machine diagram, serves as a graphical
depiction illustrating the multitude of states inherent in a system or object, along with the
transitions linking them. Each state encapsulates a specific condition or mode, while
transitions signify changes induced by events. This visual representation plays a pivotal
role in comprehending the dynamic behavior of systems, enabling seamless design,
analysis, and communication of intricate operational patterns. State diagrams find
extensive application across various domains, including software engineering and control
systems, where they serve as indispensable tools for modeling and managing system
behavior effectively. By providing a clear and intuitive portrayal of states and transitions,
these diagrams empower engineers and stakeholders to discern complex system dynamics,
identify potential issues, and devise robust strategies for system development and
optimization.

FIG 5.5: Representation of State Diagram

5.1.6 ACTIVITY DIAGRAM


An activity diagram serves as a visual roadmap, delineating the intricate flow of
activities within a system or process, elucidating how actions are sequenced and
orchestrated. Within this graphical representation, each activity serves as a tangible
manifestation of a specific action or task, meticulously arranged to showcase the logical
progression of work. Arrows connecting these activities denote transitions, effectively
illustrating the flow of control from one task to another. Notably, decision points, depicted
by diamond shapes, introduce branching and conditional behavior, enabling the diagram to
capture diverse paths and potential outcomes. Activity diagrams emerge as indispensable
tools for modeling complex processes, offering a holistic view of workflows and
highlighting areas ripe for optimization and automation. Their visual clarity fosters
understanding and communication, empowering stakeholders to navigate the logical
intricacies of system activities with ease, thereby facilitating informed decision-making
and driving efficiency enhancements.

FIG 5.6: Representation of Activity Diagram

5.1.7 COLLABORATION DIAGRAM


The collaboration diagram is used to show the relationships between the objects in
a system. Both the sequence and the collaboration diagrams represent the same
information, but differently. Instead of showing the flow of messages, it depicts the
architecture of the objects residing in the system, as it is based on object-oriented
programming. An object consists of several features, and the multiple objects present in
the system are connected to each other. The collaboration diagram, which is also known
as a communication diagram, is used to portray the objects' architecture in the system.

FIG 5.7: Representation of Collaboration Diagram

Chapter 6
SYSTEM IMPLEMENTATION

6.1 TECHNOLOGY

6.1.1 PYTHON

Python is a general-purpose interpreted, interactive, object-oriented, high-level
programming language. As an interpreted language, Python has a design philosophy
that emphasizes code readability (notably using whitespace indentation to delimit code
blocks rather than curly brackets or keywords), and a syntax that allows programmers to
express concepts in fewer lines of code than might be used in languages such as C++ or
Java. It provides constructs that enable clear programming on both small and large scales.
Python interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open-source software and has a community-based
development model, as do nearly all of its variant implementations. CPython is managed
by the non-profit Python Software Foundation. Python features a dynamic type system
and automatic memory management. It supports multiple programming paradigms,
including object-oriented, imperative, functional and procedural, and has a large and
comprehensive standard library. Python is a popular programming language. It was
created by Guido van Rossum and released in 1991.

It is used for:

 web development (server-side),


 software development,
 mathematics,
 system scripting.

What can Python do?

 Python can be used on a server to create web applications.


 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and modify files.
 Python can be used to handle big data and perform complex mathematics.
 Python can be used for rapid prototyping, or for production-ready software
development.

Why Python?

 Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
 Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
 Python can be treated in a procedural way, an object-oriented way or a functional way.

6.2 SAMPLE CODE

Fake_news_detection.py:-

#Importing the libraries
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
import pickle

#Importing the cleaned file containing the text and label
news = pd.read_csv('news.csv')
X = news['text']
y = news['label']

#Splitting the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

#Creating a pipeline that first builds TF-IDF features (after removing stopwords)
#and then applies a Multinomial Naive Bayes model
pipeline = Pipeline([('tfidf', TfidfVectorizer(stop_words='english')),
                     ('nbmodel', MultinomialNB())])

#Training our model
pipeline.fit(X_train, y_train)

#Predicting the labels for the test data
pred = pipeline.predict(X_test)

#Checking the performance of our model
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))

#Serialising the trained pipeline
with open('model.pickle', 'wb') as handle:
    pickle.dump(pipeline, handle, protocol=pickle.HIGHEST_PROTOCOL)

#Reporting the overall accuracy as a percentage
accuracy = accuracy_score(y_test, pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

App.py:-
#Importing the libraries
import os
import pickle
import urllib.parse

from flask import Flask, request, render_template
from flask_cors import CORS
from newspaper import Article

#Loading Flask and assigning the model variable
app = Flask(__name__, template_folder='templates')
CORS(app)

with open('model.pickle', 'rb') as handle:
    model = pickle.load(handle)

@app.route('/')
def main():
    return render_template('main.html')

#Receiving the input url from the user and using web scraping to extract the news content
@app.route('/predict', methods=['GET', 'POST'])
def predict():
    #The request body has the form "news=<url>"; strip the 5-character prefix
    url = request.get_data(as_text=True)[5:]
    url = urllib.parse.unquote(url)
    article = Article(str(url))
    article.download()
    article.parse()
    article.nlp()
    news = article.summary
    #Passing the news summary to the model and returning whether it is Fake or Real
    pred = model.predict([news])
    return render_template('main.html', prediction_text='The news is "{}"'.format(pred[0]))

if __name__ == "__main__":
    port = int(os.environ.get('PORT', 5000))
    app.run(port=port, debug=True, use_reloader=False)
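With model.pickle present, running python App.py starts the development server on port 5000 (or on the PORT environment variable, if set), and the form in main.html posts the article URL to /predict. The endpoint can also be exercised from the command line; for example (an illustrative invocation with a placeholder URL):

curl -d "news=https://example.com/some-article" http://localhost:5000/predict

Here the news= prefix corresponds to the five characters stripped inside predict().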

Main.html:-

<!DOCTYPE html>
<html >
<!--From https://codepen.io/frytyler/pen/EGdtg-->
<head>
<meta charset="UTF-8">
<title>Fake News Detection</title>
<link href='https://fonts.googleapis.com/css?family=Pacifico' rel='stylesheet'
type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Arimo' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Hind:300' rel='stylesheet'
type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300'
rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">

</head>

<body>
<div class="login">
<h1>Predict Fake News</h1>

<!-- Main Input For Receiving Query to our ML -->


<form action="{{ url_for('predict') }}" method="post">
<input type="text" name="news" placeholder="Enter the news url" required="required" />
<button type="submit" class="btn btn-primary btn-block btn-large">Predict</button>

</form>

<br>
<br>
{{ prediction_text }}

</div>

</body>
</html>

https://fonts.googleapis.com/css?family=Pacifico:-

/* cyrillic-ext */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6K6MmTpA.woff2)
format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6D6MmTpA.woff2)
format('woff2');
unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}

/* vietnamese */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6I6MmTpA.woff2)
format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6J6MmTpA.woff2)
format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6H6Mk.woff2)
format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}

https://fonts.googleapis.com/css?family=Arimo:-

/* cyrillic-ext */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDRrBZ
QI.woff2) format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcBBrBZ
QI.woff2) format('woff2');
unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* greek-ext */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDBrBZ
QI.woff2) format('woff2');
unicode-range: U+1F00-1FFF;
}
/* greek */
@font-face {

font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcAxrBZQ
I.woff2) format('woff2');
unicode-range: U+0370-0377, U+037A-037F, U+0384-038A, U+038C, U+038E-03A1,
U+03A3-03FF;
}
/* hebrew */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcAhrBZQ
I.woff2) format('woff2');
unicode-range: U+0590-05FF, U+200C-2010, U+20AA, U+25CC, U+FB1D-FB4F;
}
/* vietnamese */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDxrBZQ
I.woff2) format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Arimo';

font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDhrBZQ
I.woff2) format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcABrB.w
off2) format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}

https://fonts.googleapis.com/css?family=Hind:300:-

/* devanagari */
@font-face {
font-family: 'Hind';
font-style: normal;
font-weight: 300;
src: url(https://fonts.gstatic.com/s/hind/v16/5aU19_a8oxmIfMJaER2SjQpf.woff2)
format('woff2');
unicode-range: U+0900-097F, U+1CD0-1CF9, U+200C-200D, U+20A8, U+20B9,
U+20F0, U+25CC, U+A830-A839, U+A8E0-A8FF, U+11B00-11B09;
}
/* latin-ext */

@font-face {
font-family: 'Hind';
font-style: normal;
font-weight: 300;
src: url(https://fonts.gstatic.com/s/hind/v16/5aU19_a8oxmIfMJaERKSjQpf.woff2)
format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Hind';
font-style: normal;
font-weight: 300;
src: url(https://fonts.gstatic.com/s/hind/v16/5aU19_a8oxmIfMJaERySjQ.woff2)
format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}

https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300:-

/* cyrillic-ext */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDujMR6WR.woff2) format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}

/* cyrillic */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDuHMR6WR.woff2) format('woff2');
unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* greek-ext */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDunMR6WR.woff2) format('woff2');
unicode-range: U+1F00-1FFF;
}
/* greek */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDubMR6WR.woff2) format('woff2');
unicode-range: U+0370-0377, U+037A-037F, U+0384-038A, U+038C, U+038E-03A1,
U+03A3-03FF;
}
/* vietnamese */
@font-face {
font-family: 'Open Sans Condensed';

font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDurMR6WR.woff2) format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDuvMR6WR.woff2) format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDuXMRw.woff2) format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}

Style.css:-

@import url(https://fonts.googleapis.com/css?family=Open+Sans);
.btn { display: inline-block; *display: inline; *zoom: 1; padding: 4px 10px 4px; margin-
bottom: 0; font-size: 13px; line-height: 18px; color: #333333; text-align: center;text-shadow:
0 1px 1px rgba(255, 255, 255, 0.75); vertical-align: middle; background-color: #f5f5f5;
background-image: -moz-linear-gradient(top, #ffffff, #e6e6e6); background-image: -ms-
linear-gradient(top, #ffffff, #e6e6e6); background-image: -webkit-gradient(linear, 0 0, 0
100%, from(#ffffff), to(#e6e6e6)); background-image: -webkit-linear-gradient(top, #ffffff,
#e6e6e6); background-image: -o-linear-gradient(top, #ffffff, #e6e6e6); background-image:
linear-gradient(top, #ffffff, #e6e6e6); background-repeat: repeat-x; filter:
progid:dximagetransform.microsoft.gradient(startColorstr=#ffffff, endColorstr=#e6e6e6,
GradientType=0); border-color: #e6e6e6 #e6e6e6 #e6e6e6; border-color: rgba(0, 0, 0, 0.1)
rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.25); border: 1px solid #e6e6e6; -webkit-border-radius: 4px; -
moz-border-radius: 4px; border-radius: 4px; -webkit-box-shadow: inset 0 1px 0 rgba(255,
255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05); -moz-box-shadow: inset 0 1px 0 rgba(255, 255,
255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05); box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2),
0 1px 2px rgba(0, 0, 0, 0.05); cursor: pointer; *margin-left: .3em; }
.btn:hover, .btn:active, .btn.active, .btn.disabled, .btn[disabled] { background-color:
#e6e6e6; }
.btn-large { padding: 9px 14px; font-size: 15px; line-height: normal; -webkit-border-radius:
5px; -moz-border-radius: 5px; border-radius: 5px; }
.btn:hover { color: #333333; text-decoration: none; background-color: #e6e6e6; background-
position: 0 -15px; -webkit-transition: background-position 0.1s linear; -moz-transition:
background-position 0.1s linear; -ms-transition: background-position 0.1s linear; -o-
transition: background-position 0.1s linear; transition: background-position 0.1s linear; }
.btn-primary, .btn-primary:hover { text-shadow: 0 -1px 0 rgba(0, 0, 0, 0.25); color: #ffffff; }
.btn-primary.active { color: rgba(255, 255, 255, 0.75); }
.btn-primary { background-color: #4a77d4; background-image: -moz-linear-gradient(top,
#6eb6de, #4a77d4); background-image: -ms-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -webkit-gradient(linear, 0 0, 0 100%, from(#6eb6de), to(#4a77d4));
background-image: -webkit-linear-gradient(top, #6eb6de, #4a77d4); background-image: -o-
linear-gradient(top, #6eb6de, #4a77d4); background-image: linear-gradient(top, #6eb6de,
#4a77d4); background-repeat: repeat-x; filter:
progid:dximagetransform.microsoft.gradient(startColorstr=#6eb6de, endColorstr=#4a77d4,

GradientType=0); border: 1px solid #3762bc; text-shadow: 1px 1px 1px rgba(0,0,0,0.4);
box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.5); }
.btn-primary:hover, .btn-primary:active, .btn-primary.active, .btn-primary.disabled, .btn-
primary[disabled] { filter: none; background-color: #4a77d4; }
.btn-block { width: 100%; display:block; }

* { -webkit-box-sizing:border-box; -moz-box-sizing:border-box; -ms-box-sizing:border-box;


-o-box-sizing:border-box; box-sizing:border-box; }

html { width: 100%; height:100%; overflow:hidden; }

body {
width: 100%;
height:100%;
font-family: 'Open Sans', sans-serif;

background: #092756;
color: #fff;
font-size: 18px;
text-align:center;
letter-spacing:1.2px;
background: -moz-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%),-moz-linear-gradient(top, rgba(57,173,219,.25) 0%,
rgba(42,60,87,.4) 100%), -moz-linear-gradient(-45deg, #670d10 0%, #092756 100%);
background: -webkit-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%), -webkit-linear-gradient(top, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), -webkit-linear-gradient(-45deg, #670d10 0%,#092756 100%);
background: -o-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%), -o-linear-gradient(top, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), -o-linear-gradient(-45deg, #670d10 0%,#092756 100%);
background: -ms-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)

10%,rgba(138,114,76,0) 40%), -ms-linear-gradient(top, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), -ms-linear-gradient(-45deg, #670d10 0%,#092756 100%);
background: -webkit-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%), linear-gradient(to bottom, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), linear-gradient(135deg, #670d10 0%,#092756 100%);
filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#3E1D6D',
endColorstr='#092756',GradientType=1 );

}
.login {
position: absolute;
top: 40%;
left: 50%;
margin: -150px 0 0 -150px;
width:400px; height:400px;
}

.login h1 { color: #fff; text-shadow: 0 0 10px rgba(0,0,0,0.3); letter-spacing:1px; text-


align:center; }

input {
width: 100%;
margin-bottom: 10px;
background: rgba(0,0,0,0.3);
border: none;
outline: none;
padding: 10px;
font-size: 13px;
color: #fff;
text-shadow: 1px 1px 1px rgba(0,0,0,0.3);

border: 1px solid rgba(0,0,0,0.3);

border-radius: 4px;

box-shadow: inset 0 -5px 45px rgba(100,100,100,0.2), 0 1px 1px rgba(255,255,255,0.2);


-webkit-transition: box-shadow .5s ease;
-moz-transition: box-shadow .5s ease;
-o-transition: box-shadow .5s ease;
-ms-transition: box-shadow .5s ease;
transition: box-shadow .5s ease;
}
input:focus { box-shadow: inset 0 -5px 45px rgba(100,100,100,0.4), 0 1px 1px
rgba(255,255,255,0.2); }

https://fonts.googleapis.com/css?family=Open+Sans:-

/* cyrillic-ext */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4taVIGxA.woff2) format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4kaVIGxA.woff2) format('woff2');

unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* greek-ext */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4saVIGxA.woff2) format('woff2');
unicode-range: U+1F00-1FFF;
}
/* greek */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4jaVIGxA.woff2) format('woff2');
unicode-range: U+0370-0377, U+037A-037F, U+0384-038A, U+038C, U+038E-03A1,
U+03A3-03FF;
}
/* hebrew */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4iaVIGxA.woff2) format('woff2');
unicode-range: U+0590-05FF, U+200C-2010, U+20AA, U+25CC, U+FB1D-FB4F;
}
/* math */

@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B5caVIGxA.woff2) format('woff2');
unicode-range: U+0302-0303, U+0305, U+0307-0308, U+0330, U+0391-03A1, U+03A3-
03A9, U+03B1-03C9, U+03D1, U+03D5-03D6, U+03F0-03F1, U+03F4-03F5, U+2034-
2037, U+2057, U+20D0-20DC, U+20E1, U+20E5-20EF, U+2102, U+210A-210E, U+2110-
2112, U+2115, U+2119-211D, U+2124, U+2128, U+212C-212D, U+212F-2131, U+2133-
2138, U+213C-2140, U+2145-2149, U+2190, U+2192, U+2194-21AE, U+21B0-21E5,
U+21F1-21F2, U+21F4-2211, U+2213-2214, U+2216-22FF, U+2308-230B, U+2310,
U+2319, U+231C-2321, U+2336-237A, U+237C, U+2395, U+239B-23B6, U+23D0,
U+23DC-23E1, U+2474-2475, U+25AF, U+25B3, U+25B7, U+25BD, U+25C1, U+25CA,
U+25CC, U+25FB, U+266D-266F, U+27C0-27FF, U+2900-2AFF, U+2B0E-2B11,
U+2B30-2B4C, U+2BFE, U+FF5B, U+FF5D, U+1D400-1D7FF, U+1EE00-1EEFF;
}
/* symbols */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B5OaVIGxA.woff2) format('woff2');
unicode-range: U+0001-000C, U+000E-001F, U+007F-009F, U+20DD-20E0, U+20E2-
20E4, U+2150-218F, U+2190, U+2192, U+2194-2199, U+21AF, U+21E6-21F0, U+21F3,
U+2218-2219, U+2299, U+22C4-22C6, U+2300-243F, U+2440-244A, U+2460-24FF,
U+25A0-27BF, U+2800-28FF, U+2921-2922, U+2981, U+29BF, U+29EB, U+2B00-2BFF,
U+4DC0-4DFF, U+FFF9-FFFB, U+10140-1018E, U+10190-1019C, U+101A0, U+101D0-
101FD, U+102E0-102FB, U+10E60-10E7E, U+1D2C0-1D2D3, U+1D2E0-1D37F,
U+1F000-1F0FF, U+1F100-1F1AD, U+1F1E6-1F1FF, U+1F30D-1F30F, U+1F315,
U+1F31C, U+1F31E, U+1F320-1F32C, U+1F336, U+1F378, U+1F37D, U+1F382,

U+1F393-1F39F, U+1F3A7-1F3A8, U+1F3AC-1F3AF, U+1F3C2, U+1F3C4-1F3C6,
U+1F3CA-1F3CE, U+1F3D4-1F3E0, U+1F3ED, U+1F3F1-1F3F3, U+1F3F5-1F3F7,
U+1F408, U+1F415, U+1F41F, U+1F426, U+1F43F, U+1F441-1F442, U+1F444,
U+1F446-1F449, U+1F44C-1F44E, U+1F453, U+1F46A, U+1F47D, U+1F4A3, U+1F4B0,
U+1F4B3, U+1F4B9, U+1F4BB, U+1F4BF, U+1F4C8-1F4CB, U+1F4D6, U+1F4DA,
U+1F4DF, U+1F4E3-1F4E6, U+1F4EA-1F4ED, U+1F4F7, U+1F4F9-1F4FB, U+1F4FD-
1F4FE, U+1F503, U+1F507-1F50B, U+1F50D, U+1F512-1F513, U+1F53E-1F54A,
U+1F54F-1F5FA, U+1F610, U+1F650-1F67F, U+1F687, U+1F68D, U+1F691, U+1F694,
U+1F698, U+1F6AD, U+1F6B2, U+1F6B9-1F6BA, U+1F6BC, U+1F6C6-1F6CF,
U+1F6D3-1F6D7, U+1F6E0-1F6EA, U+1F6F0-1F6F3, U+1F6F7-1F6FC, U+1F700-1F7FF,
U+1F800-1F80B, U+1F810-1F847, U+1F850-1F859, U+1F860-1F887, U+1F890-1F8AD,
U+1F8B0-1F8B1, U+1F900-1F90B, U+1F93B, U+1F946, U+1F984, U+1F996, U+1F9E9,
U+1FA00-1FA6F, U+1FA70-1FA7C, U+1FA80-1FA88, U+1FA90-1FABD, U+1FABF-
1FAC5, U+1FACE-1FADB, U+1FAE0-1FAE8, U+1FAF0-1FAF8, U+1FB00-1FBFF;
}
/* vietnamese */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4vaVIGxA.woff2) format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-

UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4uaVIGxA.woff2) format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4gaVI.woff2) format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,
U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122,
U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}

6.3 SYSTEM TESTING


The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to check
the functionality of components, sub-assemblies, assemblies and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system meets
its requirements and user expectations and does not fail in an unacceptable manner. There
are various types of tests. Each test type addresses a specific testing requirement.

6.4 TYPES OF TESTING


6.4.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program
logic is functioning properly, and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual software
units of the application; it is done after the completion of an individual unit and before
integration. This is structural testing that relies on knowledge of the unit's construction and is
invasive. Unit tests perform basic tests at component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path of
a business process performs accurately to the documented specifications and contains clearly
defined inputs and expected results.
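As a concrete illustration, a unit test for the classification pipeline of Section 6.2 could look like the sketch below. The use of pytest and the helper name build_pipeline are assumptions made for this example; the report itself does not prescribe a test framework.

#A minimal pytest-style unit test for the text-classification pipeline.
#Assumes news.csv is available, as in Section 6.2.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def build_pipeline():
    return Pipeline([('tfidf', TfidfVectorizer(stop_words='english')),
                     ('nbmodel', MultinomialNB())])

def test_pipeline_predicts_known_labels():
    news = pd.read_csv('news.csv')
    X_train, X_test, y_train, y_test = train_test_split(
        news['text'], news['label'], test_size=0.2, random_state=42)
    model = build_pipeline().fit(X_train, y_train)
    pred = model.predict(X_test)
    #Every prediction must be one of the labels seen in training
    assert set(pred) <= set(y_train.unique())
    #The model should do clearly better than chance on this dataset
    assert (pred == y_test).mean() > 0.5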

6.4.2 INTEGRATION TESTING


Integration tests are designed to test integrated software components to determine if
they run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components
were individually satisfactory, as shown by successful unit testing, the combination of
components is correct and consistent. Integration testing is specifically aimed at exposing
the problems that arise from the combination of components.
6.4.3 FUNCTIONAL TESTING
Functional tests provide systematic demonstrations that functions tested are available
as specified by the business and technical requirements, system documentation, and user
manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions,
or special test cases. In addition, systematic coverage pertaining to identified business
process flows, data fields, predefined processes, and successive processes must be
considered for testing. Before functional testing is complete, additional tests are identified
and the effective value of current tests is determined.
6.4.4 SYSTEM TESTING
System testing ensures that the entire integrated software system meets requirements.
It tests a configuration to ensure known and predictable results. An example of system
testing is the configuration-oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration points.
6.4.5 WHITE BOX TESTING
White Box Testing is testing in which the software tester has knowledge
of the inner workings, structure and language of the software, or at least its purpose. It is
used to test areas that cannot be reached from a black box level.
6.4.6 BLACK BOX TESTING
Black Box Testing is testing the software without any knowledge of the inner
workings, structure or language of the module being tested. Black box tests, like most other
kinds of tests, must be written from a definitive source document, such as a specification or
requirements document. It is a test in which the software under test is treated as a black box:
you cannot "see" into it. The test provides inputs and responds to outputs without
considering how the software works.
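As an example in the context of this project, a black-box check of the /predict endpoint could use Flask's built-in test client, observing only inputs and outputs. The importable module name App and the test URL are assumptions for this sketch, and since the endpoint downloads the posted URL, network access is required when it runs.

#Black-box check: the application is exercised purely through its HTTP
#interface, with no knowledge of its internals.
from App import app   #assumes App.py exposes the Flask object as `app`

def test_predict_returns_classification_page():
    client = app.test_client()
    response = client.post('/predict',
                           data='news=https://en.wikipedia.org/wiki/News')
    assert response.status_code == 200
    assert b'The news is' in response.data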
6.4.7 ACCEPTANCE TESTING
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
6.4.8 TESTING RESULTS
All the test cases mentioned above passed successfully. No defects encountered.

6.5 TESTING METHODOLOGIES


The following are the Testing Methodologies:
 Unit Testing.
 Integration Testing.
 User Acceptance Testing.
 Output Testing.

6.5.1 UNIT TESTING


Unit testing focuses verification effort on the smallest unit of software design, that is,
the module. Unit testing exercises specific paths in a module's control structure to ensure
complete coverage and maximum error detection. This test focuses on each module
individually, ensuring that it functions properly as a unit; hence the name Unit Testing.

During this testing, each module is tested individually, and the module interfaces
are verified for consistency with the design specification. All important processing paths are
tested for the expected results. All error handling paths are also tested.

6.5.2 INTEGRATION TESTING
Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated, a set of high-
order tests is conducted. The main objective in this testing process is to take unit-tested
modules and build a program structure that has been dictated by design.

The following are the types of Integration Testing:

1) Top-Down Integration

This method is an incremental approach to the construction of program structure.
Modules are integrated by moving downward through the control hierarchy, beginning with
the main program module. The modules subordinate to the main program module are
incorporated into the structure in either a depth-first or breadth-first manner.
In this method, the software is tested from the main module, and individual stubs are
replaced as the test proceeds downwards.

2) Bottom-up Integration

This method begins the construction and testing with the modules at the lowest level
in the program structure. Since the modules are integrated from the bottom up, processing
required for modules subordinate to a given level is always available and the need for stubs
is eliminated. The bottom-up integration strategy may be implemented with the following
steps:
 The low-level modules are combined into clusters that perform a specific software sub-function.
 A driver (i.e. the control program for testing) is written to coordinate test case input and output.
 The cluster is tested.
 Drivers are removed and clusters are combined moving upward in the program structure.
The bottom-up approach tests each module individually; each module is then integrated
with a main module and tested for functionality.

6.5.3 USER ACCEPTANCE TESTING
User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch with
the prospective system users at the time of developing and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.

6.5.4 OUTPUT TESTING


After performing validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the
specified format. The outputs generated or displayed by the system under consideration are
tested by asking the users about the format they require. Hence the output format is
considered in two ways: one on screen and the other in printed format.

VALIDATION TESTING
Validation Checking
Validation checks are performed on the following fields.
Text Field:
The text field can contain only a number of characters less than or equal
to its size. The text fields are alphanumeric in some tables and alphabetic in other
tables. An incorrect entry always flashes an error message.

Using Live Test Data:


Live test data are those that are extracted from organization files. After a system is
partially constructed, programmers or analysts often ask users to key in a set of data
from their normal activities. The systems person then uses this data to partially test
the system. In other instances, programmers or analysts extract a set of live data from
the files and have it entered themselves. It is difficult to obtain live data in
sufficient amounts to conduct extensive testing; and although such realistic data
would show how the system performs for the typical processing requirement, if
the live data entered is in fact typical, it generally will not test all combinations
or formats that can enter the system.

Chapter 7
SCREEN SHOTS

Fig 7.1: Front end page

Fig 7.2: Input URL

Fig 7.3: Output

Chapter 8
CONCLUSION

In this study, we presented a machine learning-based approach for the classification of
news articles as authentic or fake, aimed at combating the proliferation of misinformation in online
sources. Leveraging a Multinomial Naive Bayes classifier trained on a dataset of labelled news
articles, we developed a robust classification model capable of accurately discerning between
genuine and fabricated news content.
Through the integration of the model with a user-friendly front-end interface, users can
easily input the URL of a news article and receive immediate classification results, enhancing
accessibility and usability. The model achieved a high accuracy score on the test dataset,
demonstrating its effectiveness in distinguishing between authentic and fake news articles.
The deployment of such classification systems holds significant promise in addressing the
challenges posed by fake news, promoting the dissemination of accurate information, and fostering
trust in online news sources. By providing users with tools to verify the authenticity of news content,
we can empower individuals to make informed decisions and combat the spread of misinformation in
today's digital age.
Ongoing monitoring and evaluation of classification systems in real-world settings will be
crucial to ensure their effectiveness and reliability over time.
In conclusion, the development and deployment of machine learning-based classification systems
represent a significant step towards addressing the challenges of misinformation and promoting the
dissemination of accurate information in online news sources. Through interdisciplinary
collaboration and continued refinement of these systems, we can work towards a more informed and
trustworthy digital information ecosystem.

Chapter 9
FUTURE SCOPE
The future scope for fake news classification is vast and evolving, given the increasing
sophistication of misinformation and the technologies used to propagate it. Here are some
potential avenues for future development in this field:

Advanced Machine Learning Techniques: Continuing advancements in machine learning
algorithms, particularly in deep learning, reinforcement learning, and natural language
processing (NLP), can lead to more accurate and efficient fake news detection models.
These models can better analyze textual, visual, and audio content to identify patterns
indicative of misinformation.

Multimodal Analysis: Fake news often involves various forms of media, including text,
images, videos, and audio. Future research can focus on developing multimodal
classification techniques that integrate information from multiple modalities to enhance
accuracy and robustness.

Adversarial Defense Mechanisms: Adversarial attacks aimed at deceiving fake news
detection systems are a significant challenge. Future research can explore robust defenses
against such attacks, including adversarial training, feature obfuscation, and anomaly
detection techniques.

Domain-Specific Solutions: Fake news manifests differently across various domains, such
as politics, health, and finance. Tailoring classification models to specific domains can
improve their accuracy and effectiveness in detecting domain-specific misinformation.

Real-Time Detection: Developing real-time fake news detection systems capable of
analyzing and flagging potentially misleading content as it spreads across online platforms
is crucial. Such systems can help curb the rapid dissemination of misinformation and
mitigate its impact.

User-Centric Approaches: Empowering users with tools to identify and verify the
credibility of information they encounter online is essential. Future research can focus on
designing user-friendly browser extensions, plugins, or mobile apps that provide real-time
feedback on the trustworthiness of news articles and social media posts.

Collaborative Efforts: Combating fake news requires collaboration among researchers,
industry stakeholders, policymakers, and the public. Future initiatives should prioritize
interdisciplinary collaboration to develop comprehensive strategies for addressing
misinformation effectively.

Education and Media Literacy: Investing in education and media literacy initiatives is
fundamental for empowering individuals to critically evaluate information sources and
recognize misinformation. Future efforts should focus on integrating media literacy
training into school curricula and promoting digital literacy among the general public.

Overall, the future of fake news classification relies on a combination of technological
innovation, interdisciplinary collaboration, ethical considerations, and education initiatives
to combat misinformation effectively in the digital age.

REFERENCES

[1] G. Chowdhury, "Natural language processing," Annual Review of Information Science and Technology, vol. 37, pp. 51-89, 2003. ISSN 0066-4200.
[2] A. N. K. Movanita, "BIN: 60 Persen Konten Media Sosial adalah Informasi Hoaks (BIN: 60 percent of social media content is hoax)," 2018. [Online]. Available: https://nasional.kompas.com/read/2018/03/15/06475551/bin-60-persen-konten-media-socialadalah-informasi-hoaks.
[3] S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar, "Fake news detection using deep learning models: A novel approach," Transactions on Emerging Telecommunications Technologies, 2019.
[4] K.-H. Choi, "A study on the effect of reading side tool with NLP skill on student Chinese reading performance," Master Thesis, Grad. Ins. Edu. Inf. and Meas., National Taichung Univ. of Edu., Taichung, Taiwan, 2015.
[5] P. Kulkarni, S. Karwande, R. Keskar, P. Kale, and S. Iyer, "Fake News Detection using Machine Learning," ITM Web of Conferences, vol. 40, p. 03003, 2021, doi: 10.1051/itmconf/20214003003.
[6] R. Bridgelall, Department of Transportation, Logistics, and Finance, College of Business, North Dakota State University, Fargo, ND 58108, USA; [email protected]
[7] M. Omar, S. Choi, D. Nyang, and D. Mohaisen, 3 Jan. 2022.
[8] I. Juyal, "Fake news detector: NLP project." [Online]. Available: https://levelup.gitconnected.com/fake-news-detector-nlp-project-9d67e0177075
[9] M. Aldwairi and A. Alwahedi, "Detecting Fake News in Social Media Networks."
[10] G. Agudelo, O. Parra, and J. Barón Velandia, "Raising a Model for Fake News Detection Using Machine Learning in Python," pp. 596-604, 2018, doi: 10.1007/978-3-030-02131-3_52.
[11] Y. Yang et al., "TI-CNN: Convolutional Neural Networks for Fake News Detection," Jun. 2018. Accessed: Mar. 24, 2023. [Online]. Available: https://arxiv.org/abs/1806.00749v3.
[12] J. Patel, M. Barreto, U. Sahakari, and S. Patil, "Fake News Detection with Machine Learning," International Journal of Innovative Technology and Exploring Engineering, vol. 10, no. 1, pp. 124-127, Nov. 2020, doi: 10.35940/IJITEE.A8090.1110120.
[13] H. Allcott, M. Gentzkow, and C. Yu, "Trends in the diffusion of misinformation on social media," Research and Politics, vol. 6, no. 2, Apr. 2019, doi: 10.1177/2053168019848554.
[14] S. M. Jones-Jang, T. Mortensen, and J. Liu, "Does Media Literacy Help Identification of Fake News? Information Literacy Helps, but Other Literacies Don't," American Behavioral Scientist, vol. 65, no. 2, pp. 371-388, Feb. 2021, doi: 10.1177/0002764219869406.
[15] P. Machete and M. Turpin, "The Use of Critical Thinking to Identify Fake News: A Systematic Literature Review," Lecture Notes in Computer Science, vol. 12067 LNCS, pp. 235-246, 2020, doi: 10.1007/978-3-030-45002-1_20.
[16] P. Goyal, S. Taterh, and A. Saxena, "Fake News Detection using Machine Learning: A Review," International Journal of Advanced Engineering, Management and Science (IJAEMS), vol. 7, no. 3, 2021, doi: 10.22161/ijaems.
[17] X. Zhou, R. Zafarani, K. Shu, and H. Liu, "Fake News: Fundamental theories, detection strategies and challenges," WSDM 2019 - Proceedings of the 12th ACM International Conference on Web Search and Data Mining, pp. 836-837, Jan. 2019, doi: 10.1145/3289600.3291382.
[18] S. Hakak, W. Z. Khan, S. Bhattacharya, G. T. Reddy, and K. K. R. Choo, "Propagation of Fake News on Social Media: Challenges and Opportunities," Lecture Notes in Computer Science, vol. 12575 LNCS, pp. 345-353, 2020, doi: 10.1007/978-3-030-66046-8_28.
[19] H. Allcott and M. Gentzkow, "Social Media and Fake News in the 2016 Election," Journal of Economic Perspectives, vol. 31, no. 2, pp. 211-236, Mar. 2017, doi: 10.1257/JEP.31.2.211.
[20] P. Kulkarni, S. Karwande, R. Keskar, P. Kale, and S. Iyer, "Fake News Detection using Machine Learning," ITM Web of Conferences, vol. 40, p. 03003, 2021, doi: 10.1051/itmconf/20214003003.

PAPER PUBLICATION REPORT

CERTIFICATES OF AUTHORS
PUBLISHED PAPER DOCUMENT
International Journal of Engineering Science and Advanced Technology (IJESAT)
Vol 24 Issue 05, MAY, 2024

Fake News Classification Using NLP

Ms. G. Hemasudharani1, G. Nikhitha2, N S Praneetha Gandikota3, S. Sri Divijendra Kumar4, Md. Abdul Naveed5
1Assistant Professor, Department of CSE-Artificial Intelligence and Machine Learning, S.R.K Institute of Technology, NTR, Andhra Pradesh, India
2Student, Department of CSE-Artificial Intelligence and Machine Learning, S.R.K Institute of Technology, NTR, Andhra Pradesh, India
3Student, Department of CSE-Artificial Intelligence and Machine Learning, S.R.K Institute of Technology, NTR, Andhra Pradesh, India
4Student, Department of CSE-Artificial Intelligence and Machine Learning, S.R.K Institute of Technology, NTR, Andhra Pradesh, India
5Student, Department of CSE-Artificial Intelligence and Machine Learning, S.R.K Institute of Technology, NTR, Andhra Pradesh, India

Abstract— In today's world, "fake news" has been a major concern, spreading like
wildfire through many platforms. This phenomenon not only undermines the credibility of
information but also misleads society. Nowadays, social media is the greatest means by which
fake news spreads all over the place. This can cause many problems, such as the defamation of
people and the spreading of news in favour of specific individuals. Fake news often targets the most
prominent, powerful, and influential people in society, aiming to tarnish their reputation. The
escalating impact of fake news knows no bounds. Fake news is often biased, favouring a
single person or a section of people in society for their personal benefit. To mitigate these
challenges and promote transparency, there is a need to reduce the spread of fake news.
Introducing a "Fake News Classifier using NLP" offers a promising solution to combat this
issue. By using machine learning algorithms, this classifier can effectively identify misleading
information as fake news, thereby contributing to awareness in society and reducing losses.

Keywords— Natural Language Processing, TF-IDF, Flask, Classification, MultinomialNB, Accuracy.

1. INTRODUCTION
Fake news primarily consists of misleading information spread across society,
creating turmoil. In this era, information is everywhere and the number of people accessing
it is increasing substantially. There should be awareness among users regarding
what type of information they are consuming: is it real or fake? Moreover, most
social media platforms allow users to share their views through stories, statuses and posts,
directly affecting the spread of news, which may often be considered fake. One very famous
social media platform, WhatsApp, serves as a means for consistently sharing fake news
among its users through WhatsApp groups, statuses and personal messages. If this sharing or
spreading of fake news reaches a significant number of people, there is a risk of people believing it,
leading to disorder.
One recent example is the rumour of a ban on the 10 rupee coin in India. There
was widespread news that 10 rupee coins in India had been banned, thanks to social media,
which facilitated the rapid spread of this misinformation. Nobody was accepting 10 rupee
coins, causing concern among people in India about what to do with them. However, the
government did not announce any such ban on the 10 rupee coin; it was simply a baseless
rumour. After confirmation from the Reserve Bank of India (RBI), people calmed down, and
acceptance of the 10 rupee coins resumed.
2. EXISTING SYSTEM
There are various models which exist for real and fake news detection. The most
prevalent system consists of a model that detects fake news based on keywords as well as
headlines, simultaneously. The Passive Aggressive classifier detects fake news using keyword
and headline analysis, addressing topic-specific tendencies and author behavior, and it
includes sentiment analysis.

Fig 1: Fake news detector (Existing)

Disadvantages:
 Limited contextual understanding
 Limitations in sentiment analysis
 The concept can be manipulated easily

3. LITERATURE SURVEY
Yang et al. [11] used the TI-CNN model for identifying fake news in social media,
which performed several methods with an accuracy of 92.20%. The dataset was collected
before the 2016 US presidential election.
Patel et al. [12] introduce a Natural Language Processing technique with different
classifiers to detect whether news is real or fake. Algorithms like SVM and KNN gave
results with accuracies of 88.47% and 86.90%, while K-means gave results with a low
accuracy of 40.37%.
Kulkarni et al. [5] worked on classifiers such as Random Forest, Logistic
Regression, Decision Tree and Gradient Boosting. Logistic Regression accomplished the
highest accuracy of 85.04%, followed by Random Forest with 84.50% accuracy; Decision
Tree achieved 80.20%, while the Gradient Boosting algorithm accomplished the lowest
accuracy of 77.44%.
Agudelo et al. [10] detect false news using machine learning algorithms, natural
language processing, and Python programming. Using the Multinomial Naive Bayes model
with the CountVectorizer and TF-IDF Vectorizer algorithms, they accomplished high
accuracies of 88.1% and 84.8% on a dataset consisting of over 10,000 news items.

4. PROPOSED SYSTEM
In this paper we make use of Natural Language Processing techniques to counter the spread of false news on the internet. We use these techniques to examine how the Multinomial Naive Bayes algorithm performs on a given clip of information supplied as input to the system.
The approach used in this project is to first train the system and then add the news information to be checked as reliable or unreliable, as well as print the accuracy of the algorithm's performance on the news clip inserted by the respective reader.

4.1 : METHODOLOGY:
We choose the MultinomialNB classifier because it performs satisfactorily on data sets with high dimensionality, and it is a particularly suitable classifier when it comes to text classification. Multinomial Naive Bayes assumes a feature vector in which each element represents the number of times a feature appears (or, very often, its frequency).

The dataset we'll use for this project, which we'll call news.csv, has a shape of 7796×4. The first column identifies the news, the second and third are the title and text, and the fourth column has labels denoting whether the news is REAL or FAKE.

Fig 2: Dataset image


The proposed system is an application meant to classify whether an article is fake or real using Natural Language Processing techniques and machine learning.
We create a user-friendly web interface where users can give the input URL of a news article to check whether it is fake or real. The backend is built with Flask, a Python web framework: the Flask web server handles requests from the interface, processes them, and returns the classification results to the user, as sketched below.
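A minimal sketch of this backend, assuming a Flask route and template name (index.html, a form field named url) that are not specified in the paper; predict_url is a hypothetical helper, sketched in the following paragraphs:

    from flask import Flask, render_template, request

    app = Flask(__name__)

    def predict_url(url: str) -> str:
        """Hypothetical helper: extract the article, pre-process it, and run
        the trained classifier (sketched below); returns "REAL" or "FAKE"."""
        raise NotImplementedError

    @app.route("/", methods=["GET", "POST"])
    def index():
        result = None
        if request.method == "POST":
            # The form on the front-end page submits the article URL.
            result = predict_url(request.form.get("url", ""))
        return render_template("index.html", result=result)

    if __name__ == "__main__":
        app.run(debug=True)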
For the NLP modules we mainly use newspaper3k and pre-processing. First, newspaper3k is used for web scraping: it extracts the news article content from the provided URL. Second, the extracted content is handed to pre-processing, where it undergoes steps like tokenization, stopword removal, and possibly stemming or lemmatization to prepare it for analysis; a sketch of this step follows.
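A sketch of this extraction and pre-processing step; newspaper3k's Article API is used as described in the text, while NLTK for tokenization, stopword removal, and stemming is an assumed choice of library:

    import nltk
    from newspaper import Article
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)      # tokenizer model
    nltk.download("stopwords", quiet=True)  # stopword lists

    def extract_and_preprocess(url: str) -> str:
        # Web scraping: fetch the page and pull out the article body.
        article = Article(url)
        article.download()
        article.parse()
        # Pre-processing: tokenize, drop stopwords and punctuation, stem.
        tokens = word_tokenize(article.text.lower())
        stops = set(stopwords.words("english"))
        stemmer = PorterStemmer()
        return " ".join(stemmer.stem(t) for t in tokens
                        if t.isalpha() and t not in stops)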
After pre-processing, the text data is converted into numerical features that can be understood by the MultinomialNB model. Techniques like word embeddings or TF-IDF may be employed for this purpose. The pre-processed features are fed into a machine learning model trained on a labelled dataset of articles. Several machine learning models are available for classification, but we choose the MultinomialNB classifier, and the model predicts whether the article is fake or real based on the extracted features.
Overall, the system combines NLP techniques and machine learning models to automatically classify news articles, thereby helping users identify misleading or false information. A sketch of the end-to-end prediction step is given below.
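Putting these pieces together, a hedged sketch of the prediction step; the saved-pipeline file name and the use of joblib for persistence are assumptions:

    import joblib
    from newspaper import Article

    # Load a previously trained TF-IDF + MultinomialNB pipeline (assumed file name).
    pipeline = joblib.load("fake_news_pipeline.joblib")

    def predict_url(url: str) -> str:
        """Fetch the article at the given URL and classify it as REAL or FAKE."""
        article = Article(url)
        article.download()
        article.parse()
        # The pipeline's TF-IDF step converts raw text into the numerical
        # features the MultinomialNB model expects.
        return pipeline.predict([article.text])[0]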


Fig 3: Architecture of fake news classification using NLP

4.2 : MULTINOMIAL NAIVE BAYES ALGORITHM:

Multinomial Naive Bayes is a machine learning algorithm based on Bayes' theorem. It is a probabilistic classifier that calculates the probability distribution of given data in the form of text, which makes it well suited to data whose features represent discrete frequencies or counts of events in various natural language processing (NLP) tasks. The probability mass function (PMF) of the Multinomial distribution is used to model the likelihood of observing a specific set of word counts in a document. It is given by:
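With n the total number of words in a document, x_i the count of word i, and p_i the probability of word i under a given class, the Multinomial PMF takes the standard form:

    P(x_1, \ldots, x_k \mid n, p_1, \ldots, p_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i}, \qquad \sum_{i=1}^{k} x_i = n

In Multinomial Naive Bayes, the p_i are estimated per class from training word frequencies, and Bayes' theorem combines this likelihood with the class priors to score each class.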

4.3 : IMPLEMENTATION
- Loading Data: The cleaned dataset news.csv contains text and label columns; the dataset has a shape of 7796×4 and contains attributes like title, text, and label.
- Splitting Data: The dataset is split in the ratio 8:2, i.e. 80% of the data is for training and 20% is for testing.
- Creating Pipeline: We create a machine learning pipeline that applies TF-IDF vectorization to the text data to convert it into numerical features and then applies the Multinomial Naive Bayes classifier.
- Training the Model: Train the pipeline on the training data (X_train and y_train).
- Predicting Labels: Using the trained model, we predict the labels for the test data.
- Model Evaluation: Evaluate the performance using a confusion matrix and classification report.
- Deployment: The model is deployed behind the user interface, allowing users to identify whether the news is real or fake.
A code sketch of these steps is given below.
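A sketch of these steps with scikit-learn; the column names and random_state are assumptions consistent with the dataset described above:

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline

    # Loading data: text and label columns from the cleaned dataset.
    df = pd.read_csv("news.csv")

    # Splitting data: 80% training, 20% testing.
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, random_state=42)

    # Creating pipeline: TF-IDF vectorization followed by MultinomialNB.
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("clf", MultinomialNB()),
    ])

    # Training the model and predicting labels for the test data.
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)

    # Model evaluation: confusion matrix and classification report.
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))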


Fig 4: Implementation

5. RESULTS AND DISCUSSION:

In this project we predict whether news is real or fake using the NLP module, and we use the fake and true news datasets to build the system. We performed text pre-processing and vectorization to detect the news. We obtained an accuracy of 85.0%, on the basis of which we declare the news real or fake.

6. SAMPLE SCREENSHOTS:
In the screenshot below we see the front-end page; we simply feed in the news URL link and it tells us whether the news is fake or real.

Fig 5: Front-end page without article link

In the screenshot below we see the link provided; it goes to the NLP module, where newspaper3k extracts the content, the content is applied to the machine learning model, and the result is routed back through Flask.


Fig 6: Front-end page with article link


In the screenshot below we see the news being classified: newspaper3k extracts the content, pre-processing and the machine learning model are applied, and via the Flask web server we get the prediction, which tells whether the article is real or fake.

Fig 7: Result showing whether the news is real or fake

7. CONCLUSION:
Fake news is responsible for creating false and misleading information that greatly affects people and events. This project explains what fake news is and what real news is by using Natural Language Processing and a machine learning model for classification. We use NLP to automatically predict and detect whether news is real or fake. In this project we develop a web application for fake news classification using Natural Language Processing techniques. We use Flask for the backend and allow users to give input news article URLs. The application gets the content using newspaper3k, processes it using a pre-trained ML model, and returns a classification result, which the front end displays.
Overall, the project aims to oppose misinformation by providing a tool that identifies whether news is real or fake.

8. FUTURE SCOPE:
The future of fake news classification using NLP offers several opportunities for exploration and enhancement. This could involve experimenting with different machine learning algorithms or NLP techniques such as feature engineering to improve model accuracy and efficiency. There is also scope to explore different sources, languages, and types of news to make the model more robust and adaptable to various contexts. By focusing on these points, the system can continue to evolve, becoming more effective, reliable, and valuable to users.

9. REFERENCES:
[1] G. Chowdhury, "Natural language processing," Annual Review of Information Science and Technology, vol. 37, pp. 51-89, 2003, ISSN 0066-4200.
[2] A. N. K. Movanita, "BIN: 60 Persen Konten Media Sosial adalah Informasi Hoaks (BIN: 60 percent of social media content is hoax)," 2018. [Online]. Available: https://nasional.kompas.com/read/2018/03/15/06475551/bin-60-persen-konten-media-sosial-adalah-informasi-hoaks
[3] S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar, "Fake news detection using deep learning models: A novel approach," Transactions on Emerging Telecommunications Technologies, 2019.
[4] K.-H. Choi, "A study on the effect of reading side tool with NLP skill on student chinese reading performance," Master Thesis, Grad. Ins. Edu. Inf. and Meas., National Taichung Univ. of Edu., Taichung, Taiwan, 2015.
[5] P. Kulkarni, S. Karwande, R. Keskar, P. Kale, and S. Iyer, "Fake News Detection using Machine Learning," ITM Web of Conferences, vol. 40, p. 03003, 2021, doi: 10.1051/itmconf/20214003003.
[6] R. Bridgelall, Department of Transportation, Logistics, and Finance, College of Business, North Dakota State University, Fargo, ND 58108, USA; [email protected]
[7] M. Omar, S. Choi, D. Nyang, and D. Mohaisen, 3 Jan 2022.
[8] I. Juyal, "Fake news detector: NLP project." [Online]. Available: https://levelup.gitconnected.com/fake-news-detector-nlp-project-9d67e0177075
[9] M. Aldwairi and A. Alwahedi, "Detecting Fake News in Social Media Networks."
[10] G. Agudelo, O. Parra, and J. Barón Velandia, "Raising a Model for Fake News Detection Using Machine Learning in Python," pp. 596-604, 2018, doi: 10.1007/978-3-030-02131-3_52.
[11] Y. Yang et al., "TI-CNN: Convolutional Neural Networks for Fake News Detection," Jun. 2018. Accessed: Mar. 24, 2023. [Online]. Available: https://arxiv.org/abs/1806.00749v3
[12] J. Patel, M. Barreto, U. Sahakari, and S. Patil, "Fake News Detection with Machine Learning," International Journal of Innovative Technology and Exploring Engineering, vol. 10, no. 1, pp. 124-127, Nov. 2020, doi: 10.35940/IJITEE.A8090.1110120.
[13] H. Allcott, M. Gentzkow, and C. Yu, "Trends in the diffusion of misinformation on social media," Research and Politics, vol. 6, no. 2, Apr. 2019, doi: 10.1177/2053168019848554.
[14] S. M. Jones-Jang, T. Mortensen, and J. Liu, "Does Media Literacy Help Identification of Fake News? Information Literacy Helps, but Other Literacies Don't," American Behavioral Scientist, vol. 65, no. 2, pp. 371-388, Feb. 2021, doi: 10.1177/0002764219869406.
[15] P. Machete and M. Turpin, "The Use of Critical Thinking to Identify Fake News: A Systematic Literature Review," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12067 LNCS, pp. 235-246, 2020, doi: 10.1007/978-3-030-45002-1_20.
