Team6 Final FNC
Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, KAKINADA
For Partial Fulfilment of Award of the Degree of
BACHELOR OF TECHNOLOGY
Submitted By
G.Nikhitha (20X41A4217)
N S .Praneetha Gandikota (20X41A4237)
S.Sri Divijendra Kumar (20X41A4248)
Md.Abdul Naveed (20X41A4235)
APRIL 2024
S.R.K INSTITUTE OF TECHNOLOGY
(Approved By AICTE, New Delhi & Affiliated To JNTU, Kakinada)
(An ISO 9001:2015 Certified Institution & Accredited by NAAC with "A" Grade)
Enikepadu, Vijayawada-521108.
DEPARTMENT OF CSE-ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
CERTIFICATE
(Ms.G.Hemasudharani) (Dr.A.Radhika)
Signature of the Guide Signature of the HOD
G.Nikhitha (20X41A4217)
N S .Praneetha Gandikota (20X41A4237)
S.Sri Divijendra Kumar (20X41A4248)
Md.Abdul Naveed (20X41A4235)
ACKNOWLEDGEMENT
Firstly, we would like to convey our heartfelt thanks to the Almighty for the blessings that allowed us to
carry out this project work without any disruption.
We are also thankful to our project coordinator, Dr. D. Anusha, for her valuable guidance,
which helped us to complete this project successfully.
We are greatly thankful to our principal, Dr. M. Ekambaram Naidu, for his kind support
and for the facilities provided at our campus, which helped us to bring out this project successfully.
Finally, we would like to convey our heartfelt thanks to our Technical Staff for their
guidance and support at every step of this project. We convey our sincere thanks to all the faculty
and friends who directly or indirectly helped us with the successful completion of this project.
G.Nikhitha (20X41A4217)
N S .Praneetha Gandikota (20X41A4237)
S.Sri Divijendra Kumar (20X41A4248)
Md.Abdul Naveed (20X41A4235)
CONTENTS
TITLE
ABSTRACT
List of Figures
Chapter 1: INTRODUCTION
1.1 : Overview
1.2 : About the Project
1.3 : Purpose
1.4 : Scope
Chapter 7: SCREENSHOTS
Chapter 8: CONCLUSION
Chapter 9: Future Scope
REFERENCES
List of Figures
ABSTRACT
In today’s world, "fake news" has become a major concern, spreading like wildfire
through many platforms. This phenomenon not only undermines the credibility of information but
also misleads society. Social media is now the main channel through which fake news spreads.
This can cause many problems, such as the defamation of people and the spread of news in
favour of specific individuals. Fake news often targets the most prominent, powerful, and influential
people in society, aiming to tarnish their reputation. The escalating impact of fake news knows no
bounds. Fake news is often biased, favouring a single person or a section of society for
personal benefit. To mitigate these challenges and promote transparency, there is a need to
reduce the spread of fake news. A "Fake News Classifier using NLP" offers a promising
solution to this issue. By using machine learning algorithms, this classifier can
identify misleading information as fake news, thereby contributing to awareness in society and
reducing losses.
Chapter 1
INTRODUCTION
1.1 Overview
This project starts with collecting labeled news data and proceeds to process the text,
extract features, and train a classifier. It generates a frontend interface where users can input news
links and receive classification results. This adaptation maintains the system's flow while
implementing NLP-based fake news detection.
1.3 Purpose:
The main purpose of this application is to determine whether news articles are fake or real
from the links (URLs) that users submit.
1.4 Scope:
The scope for this project lies in addressing the growing concern of misinformation by
providing a reliable tool for fake news detection. With the increasing reliance on digital information,
there is a significant demand for robust systems that can accurately classify news articles and combat
the spread of false information.
Chapter 2
LITERATURE REVIEW
exacerbated by the advent of social media, making it easier for false information to spread
rapidly. The consequences of fake news are profound, impacting democratic processes,
public health, and societal well-being. Addressing this challenge has become imperative,
leading researchers to explore various approaches, including fact-checking, media literacy
programs, and machine learning techniques. Among these, machine learning algorithms
show promise in detecting fake news by analyzing large datasets of news articles. In this
context, the study focuses on utilizing Logistic Regression fused with Natural Language
Processing techniques for fake news detection, achieving high accuracy rates. Additionally,
the research reviews notable contributions in the field, highlighting advancements in fake
news detection methodologies and models, such as Multichannel Deep Neural Networks and
BERT models. Methodologically, the study involves data cleaning, feature extraction using
TF-IDF vectorization, and training of machine learning models. The developed model is
deployed through a web application, enabling users to classify news content as real or fake.
Overall, the research contributes to the ongoing efforts to combat fake news proliferation
through innovative technological solutions and empirical evaluations.
In response to the challenge of fake news detection, the paper presents a model
leveraging n-gram analysis and machine learning classification techniques. It outlines the
data preprocessing steps, including the removal of noise such as punctuation and stop words,
to prepare the dataset for analysis. Feature extraction methods like Term Frequency-Inverse
Document Frequency (TF-IDF) are employed to convert textual data into numerical
representations, facilitating machine learning algorithms' training and evaluation. The
experimental section of the paper details the model's performance using various classifiers
and feature extraction techniques. Results demonstrate the effectiveness of the proposed
approach, with the LSVM classifier achieving the highest accuracy of 92%. The study
underscores the importance of feature selection and classifier optimization in enhancing
detection accuracy.
upon. We analysed several papers that had worked on fake news detection.
Many types of models were trained, and their issues and results
provided a lot of help in our project. The researchers applied a wide range of algorithms,
from linear regression to deep learning algorithms. All the papers first argue
that fake news has been troubling the world for a long time, resulting in a
lot of chaos, including deaths in many cases. They discuss the importance of
classifying such news and of removing such propaganda to
prevent misinformation from being treated as news.
The use of deep learning algorithms such as CNNs has been demonstrated, and the
final accuracies are reported, with importance given to data classification.
Turning to the research papers, we observe that most of them draw their data
from the LIAR dataset. Some other datasets are also used, for example the combined corpus by
Junaed Younus Khan, Md. Tawkat Islam Khondaker, Anindya Iqbal and Sadia Afroz.
The type of data has also been properly categorised,
for example as visual-based or user-based. This is discussed in detail by Syed Ishfaq
Manzoor, Dr Jimmy Singla and Dr Nikita in their research paper. For data cleaning, different
methods were employed to remove unnecessary IP addresses and URLs.
Whitespace was removed and stemming was applied. TF-IDF was used extensively as the
vectorization technique by most of the papers. The above two steps were carried out by
Junaed Younus Khan. BOW was used in the research paper by Dr Singla. Another
important point raised in the research papers was the issue of bias in the data
aligning with the models. Next, all three research papers performed feature extraction,
where the Empath tool was used to classify the type of news as violent, misleading, etc.
Another important method used here is lexical and sentiment feature extraction,
where word count and word length were used as lexical features, while positive and negative polarity were
marked as sentiment features. This work was also done in the research paper by
Junaed Younus Khan. Next, traditional models such as SVM, Linear
Regression, Decision Trees, Naïve Bayes and K-NN were used by professors at Dhaka
University. XGBoost and Random Forest were the newer algorithms
implemented by professors at LPU. The paper by Harsh Khatter argued for using SVM
to solve the problem and proposed a model combining a News Aggregator,
an Authenticator and a Suggestion recommendation component. Further, deep learning algorithms were
implemented to learn the data better so that higher accuracies could be obtained.
The paper by Dr Khatter implemented simple neural networks for this purpose, while the
paper by Anindya Iqbal discussed the CNN model and used several newer deep learning
architectures such as Hierarchical Attention Networks (HAN) and Convolutional HAN. Three
types of LSTM were also used: LSTM, C-LSTM and Bi-LSTM. LIWC, the
Linguistic Inquiry and Word Count dictionary, is a word
classification and counting tool. The results were divided into two parts by the professors at Dhaka
University, where one part analysed the results before the neural networks and the other after
them. The best accuracy was reported for Naïve Bayes, with 94 percent after using n-gram
(bigram TF-IDF) features. The paper by Harsh Khatter reported Naïve Bayes to be the
best, with an accuracy of 93.5 percent, and the paper by professors at LPU argued that
XGBoost was the best. In conclusion, all papers argued that perfect accuracy cannot be
obtained and that there is scope for future work.
Evaluation of the classifiers was conducted using key metrics including accuracy,
precision, recall, and F1 score. Notably, the logistic regression model emerged as the most
effective classifier, showcasing superior performance compared to the alternatives. This
highlights the significance of logistic regression in distinguishing between real and fake
news articles, thereby underscoring its potential utility in combating misinformation.
The study's findings contribute to the ongoing efforts to address the challenges
posed by fake news dissemination, emphasizing the pivotal role of machine learning in
information verification. By demonstrating the efficacy of machine learning techniques in
fake news detection, the research underscores the importance of continued exploration and
development in this domain. Ultimately, the study underscores the potential of machine
learning algorithms to serve as valuable tools in promoting media literacy and combating
the spread of misinformation in the digital age.
To address these challenges, Yang et al. introduce the TI-CNN framework, which
stands for Textual Information-based Convolutional Neural Networks. This framework
leverages both textual information and external knowledge to improve the accuracy of fake
news detection. The CNN model is trained on a dataset of news articles labeled as either
fake or genuine, allowing it to learn patterns and features indicative of fake news. One key
aspect of the TI-CNN framework is its utilization of external knowledge sources, such as
knowledge graphs and semantic embeddings, to enhance the model's understanding of the
textual information. By incorporating external knowledge, the model can capture deeper
semantic relationships between words and phrases, thus improving its ability to discriminate
between fake and genuine news articles.
The authors address the burgeoning issue of fake news dissemination, recognizing
the critical need for robust detection mechanisms amid the digital information age.
Through meticulous experimentation and analysis, they propose a novel approach
harnessing the power of deep learning models to discern between genuine and fabricated
news articles. Central to their methodology is the utilization of advanced deep learning
architectures, adept at extracting intricate patterns and features from textual data. By
training these models on large datasets comprising both authentic and deceptive articles,
they aim to equip the system with the discriminative prowess necessary for accurate
classification.
The paper delves into the intricacies of model architecture, training procedures,
and performance evaluation metrics employed to gauge the efficacy of the proposed
approach. Results from empirical studies demonstrate promising outcomes, showcasing
the potential of deep learning in bolstering the fight against fake news
dissemination. Moreover, the authors underscore the significance of their findings in real-world
applications, advocating for the integration of their methodology into existing news
verification frameworks to enhance credibility and trustworthiness in digital media
landscapes. This paper represents a significant contribution to the burgeoning field of fake
news detection, offering a robust framework underpinned by deep learning methodologies
to mitigate the adverse effects of misinformation in the digital age.
information. The paper not only explores the intricacies of model selection and training but
also delves into feature engineering and performance evaluation metrics employed to assess
the robustness of the proposed approach. Results from empirical studies showcase promising
outcomes, underscoring the potential of machine learning in fortifying the defenses against
fake news propagation. Furthermore, the authors advocate for the integration of their
methodology into mainstream news verification frameworks, emphasizing the critical role of
machine learning in fostering information integrity and trustworthiness in digital media
landscapes. Overall, this paper represents a significant stride in the ongoing battle against
misinformation, offering a valuable framework rooted in machine learning principles to
safeguard the veracity of online information.
Chapter 3
SYSTEM ANALYSIS
There are various existing models for real and fake news detection. The most
prevalent system consists of a model that detects fake news based on keywords and
headlines simultaneously. It uses the Passive Aggressive classifier to detect fake news through keyword and
headline analysis, addressing topic-specific tendencies and author behavior, and it also includes
sentiment analysis.
overcome the widespread circulation of false news on the internet. In this project we use these techniques
to examine how the MultinomialNB algorithm performs on a given clip of news, which
is provided as input to the system.
The approach used in this project is to first train the system and then supply the news
item whose reliability needs to be checked, and to print the
accuracy of the algorithm on the news clip submitted by the reader. The
basis of the project is to develop a classifier using article links and article content. This helps
the admin to get information about a news article.
Scalability: With proper implementation, the system can process large volumes
of news articles efficiently, making it suitable for real-time monitoring of online
news sources.
User-Friendly Interface: Providing users with a platform to input news clips and
receive reliability assessments enhances transparency and usability, fostering
trust in the system's output.
3.2.2 METHODOLOGY
In this project we use Natural Language Processing techniques to counter the
widespread circulation of false news on the internet. We examine how the
Multinomial Naive Bayes algorithm performs on a given clip of news, which is provided as input to
the system. The approach is to first train the system and then supply the news
item whose reliability needs to be checked, and to print the
accuracy of the algorithm on the news clip submitted by the reader.
The system architecture is that of a web application that classifies whether an article is fake
or real using Natural Language Processing techniques and machine learning. We create a user-friendly
web interface where users can enter the URL of a news article to check whether it
is fake or real. The backend is built with Flask, a Python web
framework. The Flask web server handles requests from the interface, processes them, and
returns the classification results to the user.
After pre-processing, the text data is converted into numerical features that can be
understood by the MultinomialNB model. Techniques such as word embeddings or TF-IDF
may be employed for this purpose. The pre-processed features are fed into a machine
learning model trained on a labeled dataset of articles. Several machine learning models
are available for classification, but we chose the MultinomialNB classifier; the model
predicts whether the article is fake or real based on the extracted features. The system combines
NLP techniques and machine learning models to automatically classify news articles, thereby
helping users identify misleading or false information. The database stores structured data required
by the system, such as user information, news articles, prediction results, and system logs.
3.2.4 ALGORITHMS
tasks. The probability mass function (PMF) of the Multinomial distribution is used to
model the likelihood of observing a specific set of word counts in a document. It is given
by:
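In its standard textbook form, the PMF can be written as follows. The symbols are chosen here for illustration and are not taken from the report: n is the total number of word occurrences in the document, x_i is the count of the i-th vocabulary word, p_i is the probability of that word under the class, and k is the vocabulary size.

```latex
P(X_1 = x_1, \dots, X_k = x_k)
  = \frac{n!}{x_1! \, x_2! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i},
\qquad \sum_{i=1}^{k} x_i = n, \quad \sum_{i=1}^{k} p_i = 1
```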
Text Preprocessing:
- Before training the classifier, the text data undergoes preprocessing steps such as
tokenization, stopword removal, and lowercasing.
- Tokenization involves breaking down the text into individual words or tokens.
- Stopword removal eliminates common words that do not carry significant meaning (e.g.,
"the", "is", "and").
Feature Representation:
- After preprocessing, the text data is converted into numerical features that can be
understood by the machine learning algorithm.
- One common approach is to use techniques like TF-IDF (Term Frequency-Inverse
Document Frequency) or count vectorization to represent the frequency of each word or n-
gram in the document.
- TF-IDF assigns weights to words based on their frequency in the document and their
rarity across all documents in the corpus.
- Vectorization is a technique for converting input data from its raw format (i.e. text)
into vectors of real numbers. TF-IDF, or Term Frequency-Inverse Document Frequency,
is a numerical statistic intended to reflect
how important a word is to a document. It is another frequency-based method.
- TF stands for Term Frequency. It can be understood as a normalized frequency score and is
calculated via the
following formula:
TF(t, d) = (number of times term t appears in document d) / (total number of terms in d)
This number always stays ≤ 1, so it measures how frequent a
word is relative to all of the words in a document.
- IDF stands for Inverse Document Frequency, but before we go into IDF, we must
make sense of DF, the Document Frequency. It’s given by the following formula:
DF(t) = (number of documents containing term t) / (total number of documents N)
DF tells us the proportion of documents that contain a certain word. So what’s IDF?
It’s the reciprocal of the Document Frequency, and the final IDF score comes out of the
following formula:
IDF(t) = log(1 / DF(t)) = log(N / number of documents containing term t)
Just as discussed above, the intuition behind it is that the more common a word is across
all documents, the
less important it is for the current document.
A logarithm is taken to dampen the effect of IDF in the final calculation.
The final TF-IDF score comes out to be:
TF-IDF(t, d) = TF(t, d) × IDF(t)
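As a worked illustration of the definitions in this section, the sketch below computes TF, IDF and TF-IDF by hand. It assumes the plain forms TF = term count / document length and IDF = log(N / number of documents containing the term); the function names and the three-document corpus are hypothetical. scikit-learn's TfidfVectorizer uses a smoothed IDF and normalizes each vector by default, so its numbers will differ from these.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Term frequency: occurrences of the term divided by document length."""
    return Counter(doc_tokens)[term] / len(doc_tokens)

def idf(term, corpus):
    """Inverse document frequency: log(N / documents containing the term).

    Assumes the term occurs in at least one document of the corpus.
    """
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)

def tf_idf(term, doc_tokens, corpus):
    """Product of the two scores above."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical, already-tokenized corpus (illustration only).
corpus = [
    ["breaking", "news", "aliens", "landed"],
    ["news", "markets", "rise"],
    ["news", "aliens", "deny", "landing"],
]

# "news" appears in every document, so its IDF (and TF-IDF) is zero.
print(tf_idf("news", corpus[0], corpus))
# "aliens" appears in two of three documents, so it keeps some weight.
print(tf_idf("aliens", corpus[0], corpus))
```

A word that occurs in every document carries no discriminating weight, which is the behaviour the IDF term is meant to produce.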
Vocabulary Building:
- A vocabulary is built based on the unique words or n-grams present in the training data.
- Each word or n-gram becomes a feature in the feature vector, and its index in the vector
corresponds to its position in the vocabulary.
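The preprocessing and vocabulary-building steps described above can be sketched with the standard library alone. The stopword list, function names and sample sentences are hypothetical choices for illustration; the project itself relies on scikit-learn's vectorizer for this step.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]

def build_vocabulary(docs):
    """Map each unique token in the training documents to a vector index."""
    unique_tokens = sorted({tok for doc in docs for tok in doc})
    return {tok: idx for idx, tok in enumerate(unique_tokens)}

def vectorize(tokens, vocab):
    """Count vector for one document; tokens outside the vocabulary are skipped."""
    vector = [0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vector[vocab[tok]] += 1
    return vector

train_texts = ["The aliens landed in the city.", "Markets rise and markets fall."]
train_docs = [preprocess(text) for text in train_texts]
vocab = build_vocabulary(train_docs)

# A new document is encoded with the vocabulary learned from the training data.
new_vector = vectorize(preprocess("The markets crash in the city"), vocab)
print(vocab)
print(new_vector)
```

The word "crash" never appeared in training, so it has no position in the vector and is ignored at prediction time.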
Model Training:
- The preprocessed and vectorized text data is used to train the MultinomialNB classifier.
- During training, the classifier learns the probability distribution of each feature (word or
n-gram) given the class label (fake or real) using maximum likelihood estimation. The
classifier calculates the probabilities of each word or n-gram occurring in a document given
its class.
Classification:
- To classify a new news article, the same preprocessing steps are applied to the article's
text. The article's text is then converted into a feature vector using the same vocabulary built
during training.
- The MultinomialNB classifier calculates the probability of the article belonging to each
class (fake or real) based on the observed features. The class with the highest probability is
predicted as the final classification for the article.
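To make the training and classification steps concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier over token lists. It is not scikit-learn's MultinomialNB: the class name and toy data are hypothetical, and it adds Laplace smoothing (alpha = 1) on top of the raw frequency estimates so that unseen words do not produce zero probabilities, which is an assumption made here rather than something stated in the report.

```python
import math
from collections import Counter

class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes (hypothetical helper class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed, see lead-in)

    def fit(self, docs, labels):
        self.vocab = sorted({tok for doc in docs for tok in doc})
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Prior: fraction of training documents belonging to class c.
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            # Smoothed estimate of P(word | class).
            self.log_likelihood[c] = {
                w: math.log((counts[w] + self.alpha) / total) for w in self.vocab
            }
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # Sum of log-probabilities; out-of-vocabulary words are ignored.
            scores[c] = self.log_prior[c] + sum(
                self.log_likelihood[c][w] for w in doc if w in self.log_likelihood[c]
            )
        return max(scores, key=scores.get)

# Hypothetical tokenized training data (illustration only).
train_docs = [
    ["aliens", "landed", "shocking"],
    ["shocking", "miracle", "cure"],
    ["markets", "rise", "report"],
    ["council", "report", "budget"],
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["shocking", "cure"]))
print(model.predict(["budget", "report"]))
```

Working in log space turns the product of word probabilities into a sum, which avoids numerical underflow on long articles.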
NLP techniques are integral to the process of feature extraction and representation in fake
news classification using MultinomialNB. By processing and vectorizing the text data
appropriately, NLP enables the classifier to effectively learn patterns and make accurate
predictions about the authenticity of news articles.
NLP Libraries
1. Natural Language Toolkit (NLTK)
NLTK is an essential library that supports tasks like classification, stemming, tagging, parsing,
semantic reasoning, and tokenization in Python. It is essentially our main tool for language
processing and machine learning. Today it serves as an educational foundation for Python developers
who are dipping their toes into this field (and machine learning).
The library was developed by Steven Bird and Edward Loper at the University of
Pennsylvania and played a key role in breakthrough NLP research. Many universities around
the globe now use NLTK, Python libraries, and other tools in their courses. This library is
pretty versatile, but we must admit that it is also quite difficult to use for language processing
with Python.
NLTK is rather slow and does not match the demands of fast-paced production usage.
The learning curve is steep, but developers can take advantage of resources such as the available books
to learn more about the concepts behind the language processing tasks this toolkit supports.
2. SpaCy
SpaCy is a relatively young library that was designed for production usage. That’s
why it is much more accessible than other Python NLP libraries like NLTK. SpaCy is reported to offer
one of the fastest syntactic parsers available. Moreover, since the toolkit is largely written
in Cython, it is also really speedy and efficient.
However, no tool is perfect. Compared to the libraries covered so far, spaCy
supports the smallest number of languages (seven). However, the growing popularity of
machine learning, NLP, and spaCy as a key library implies that the tool might start
supporting more languages soon.
3. Scikit-learn
This handy NLP library provides developers with a wide range of algorithms for
building machine learning models. It offers many functions for using the bag-of-words
method of creating features to tackle text classification problems. The strength of this library
is its intuitive class methods. Also, scikit-learn has excellent documentation that
helps developers make the most of its features.
However, the library does not use neural networks for text pre-processing. So if you
would like to carry out more complex pre-processing tasks like POS tagging for your text
corpora, it is better to use other NLP libraries and then return to scikit-learn for building your
models.
3.2.4.2 : FLASK:-
In the context of the fake news classification system described earlier, Flask is used
as the web framework to build the backend of the application. Here's how Flask is related to
the system:
Web Interface:
- Flask provides the infrastructure for creating a user-friendly web interface where users
can interact with the fake news classification system.
- Users input the URL of a news article through the web interface, and Flask handles the
HTTP request.
Routing:
- Flask defines routes to handle different URLs and HTTP methods. For example, the `'/'`
route renders the main HTML template, while the `'/predict'` route processes the URL input
and makes predictions.
- Routes are defined using decorators like `@app.route('/')` and `@app.route('/predict')`.
Request Handling:
- Flask's request object (`request`) is used to access data submitted in the HTTP request. In
this case, the URL of the news article is extracted from the request data using
`request.get_data(as_text=True)`.
Template Rendering:
- Flask integrates with Jinja2 templating engine to render HTML templates dynamically.
Templates are used to generate the web pages that users interact with.
- The `render_template` function is used to render HTML templates and pass data to them.
Integration with NLP Module and Machine Learning Model:
- Flask integrates with the NLP module and machine learning model responsible for
classifying news articles.
- When a URL is submitted through the web interface, Flask invokes the NLP module to
extract the news content from the URL, preprocess it, and pass it to the machine learning
model for classification.
Response Handling:
- Flask handles the classification result returned by the machine learning model and sends
it back to the web interface for display.
- The classification result is typically rendered within the HTML template using Jinja2
templating syntax.
Flask acts as the backbone of the fake news classification system, providing the
infrastructure for handling HTTP requests, routing, template rendering, and integrating with
the NLP module and machine learning model. It enables the creation of a user-friendly web
interface through which users can input news articles and receive classification results in
real-time.
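The routing, request handling and template rendering described above can be sketched in a few lines. This is a simplified stand-in rather than the project's App.py: the template is inlined with render_template_string instead of a separate main.html, the form field name `url` is an assumption, the input is read from request.form rather than the raw request body, and classify_url is a hypothetical stub in place of the scraping and model call.

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Inline stand-in for the main.html template (assumed layout).
PAGE = """
<form action="/predict" method="post">
  <input type="text" name="url" placeholder="News article URL">
  <button type="submit">Predict</button>
</form>
<p>{{ prediction_text }}</p>
"""

def classify_url(url):
    """Hypothetical stub; the real system would fetch the article and call the model."""
    return "FAKE" if "hoax" in url else "REAL"

@app.route("/")
def main():
    # Render the empty form.
    return render_template_string(PAGE, prediction_text="")

@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted URL, classify it, and render the result into the page.
    url = request.form.get("url", "")
    label = classify_url(url)
    return render_template_string(PAGE, prediction_text=f"The news is {label}")
```

The sketch can be exercised without starting a server through `app.test_client()`, or served locally by calling `app.run()`.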
3.2.5 Datasets:
FIG 3.4:- Training and Testing dataset
3.2.6 Modules
Data is obtained from a CSV file ('news.csv') containing text and corresponding labels
indicating whether each article is authentic or fake. The pandas library is used to load the
dataset into a DataFrame.
Text data and labels are extracted from the DataFrame and stored in separate variables (X and y,
respectively).
A.) Data Splitting:
The dataset is split into training and testing sets using the train_test_split function from
scikit-learn. 80% of the data is used for training, and the remaining 20% is allocated for testing.
B.) Feature Engineering:
Text data is transformed into numerical feature vectors using the TF-IDF (Term Frequency-Inverse
Document Frequency) vectorization technique. Stop words (common words with little
semantic value) are removed during vectorization to improve model performance.
C.) Model Selection and Training:
A pipeline is created using scikit-learn's Pipeline module, which sequentially applies TF-IDF
vectorization and the Multinomial Naive Bayes classifier. The Multinomial Naive Bayes
algorithm is chosen due to its effectiveness in text classification tasks and its suitability for
handling sparse data. The pipeline is trained on the training data using the fit method.
D.) Model Evaluation:
The trained model is used to make predictions on the test data. Classification performance is
evaluated using standard metrics such as accuracy, precision, recall, and F1-score. The
scikit-learn classification_report function is employed to generate a comprehensive report of these
metrics. Confusion matrices are generated using the confusion_matrix function to visualize the
distribution of true positive, true negative, false positive, and false negative predictions.
E.) Model Serialization:
The trained model is serialized using the pickle module and saved to a file
('model.pickle'). Serialization allows the model to be easily stored and reloaded for future use
without needing to retrain it.
F.) Performance Assessment:
The accuracy of the model is calculated by comparing the predicted labels with the actual labels
of the test data. The overall accuracy score is printed to assess the performance of the model in
classifying fake news articles.
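The modules above can be strung together in one short script. The following is a sketch under stated assumptions, not the project's Fake_news_detection.py: a ten-sentence invented dataset stands in for the text and label columns of news.csv, the random_state and stratified split are choices made here for repeatability, and the model is round-tripped through pickle in memory instead of being written to model.pickle.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy data standing in for the text and label columns of news.csv.
X = [
    "aliens landed in the city last night",
    "miracle cure discovered by local man",
    "shocking secret the government hides",
    "celebrity replaced by a clone",
    "moon made of cheese says insider",
    "council approves the annual budget",
    "markets rise after quarterly report",
    "university opens a new library",
    "city repairs the main bridge",
    "team wins the regional final",
]
y = ["FAKE"] * 5 + ["REAL"] * 5

# A.) 80/20 split (stratified here so both classes appear in the test set).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# B.) + C.) TF-IDF with English stop words removed, followed by Multinomial NB.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("nb", MultinomialNB()),
])
pipeline.fit(X_train, y_train)

# D.) + F.) Evaluation on the held-out data.
y_pred = pipeline.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
print(confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"]))

# E.) Serialization; the report writes to 'model.pickle', this sketch stays in memory.
restored = pickle.loads(pickle.dumps(pipeline))
```

With so few samples the printed scores are meaningless; the point is only the order of the steps and the fact that the restored pipeline predicts exactly as the original does.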
- Reflect on the effectiveness of the implemented models and suggest future research
directions.
- Emphasize the importance of continued efforts in developing robust fake news detection
systems.
FIG 3.6: Implementation
Economical Feasibility
Operational Feasibility
Technical Feasibility
Assessing the economic feasibility of a fake news classification project involves a
comprehensive analysis of costs, benefits, and risks. Development costs encompass data
collection, preprocessing, model development, and personnel expenses. Infrastructure
costs include hardware, software, and ongoing maintenance. On the benefits side, potential
savings from reduced misinformation-related damages, revenue opportunities, and
intangible benefits like societal well-being must be considered. Financial metrics such as
Net Present Value (NPV), Return on Investment (ROI), and Payback Period help quantify
the project's economic viability. Risk assessment identifies potential obstacles and informs
mitigation strategies. Ultimately, a thorough economic feasibility analysis guides decision-
making, ensuring that the project aligns with organizational goals and offers a positive
return on investment.
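The three financial metrics named above reduce to simple arithmetic. The sketch below uses their standard definitions with cash flows invented purely for illustration; none of the figures come from this project.

```python
def npv(rate, cash_flows):
    """Net Present Value: cash flows discounted back to period 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def roi(total_gain, total_cost):
    """Return on Investment as a fraction of the cost."""
    return (total_gain - total_cost) / total_cost

def payback_period(cash_flows):
    """First period in which the cumulative cash flow is no longer negative."""
    cumulative = 0.0
    for period, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return period
    return None  # the investment is never recovered

# Hypothetical figures: an initial outlay followed by three yearly benefits.
flows = [-1000.0, 400.0, 400.0, 400.0]

print(npv(0.10, flows))      # discounted at an assumed 10% rate
print(roi(1200.0, 1000.0))
print(payback_period(flows))
```

In this invented case the outlay is recovered in the third period, yet the NPV at a 10% discount rate is slightly negative, which shows why the metrics are read together rather than in isolation.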
preprocessing, model development, and deployment. Additionally, the quality and
sufficiency of available data sources must be examined to ensure they meet the
requirements for training accurate models. It's imperative to assess various machine
learning algorithms and techniques to identify the most suitable approaches for fake news
classification. Consideration of computational and resource requirements, including
hardware infrastructure and scalability, is necessary to ensure efficient handling of large
volumes of data. Furthermore, evaluating integration with existing organizational systems
and workflows helps identify potential challenges and dependencies. By conducting a
thorough technical feasibility analysis, decision-makers can make informed choices and
address potential technological hurdles to ensure successful implementation of the fake
news classification system.
Chapter 4
SYSTEM SPECIFICATIONS
4.2 SOFTWARE REQUIREMENTS
Chapter 5
SYSTEM DESIGN
GOALS
The Primary goals in the design of the UML are as follows:
6. Support higher level development concepts such as collaborations,
frameworks, patterns and components.
7. Integrate best practices.
FIG:5.1 Representation of Use Case Diagram
FIG:5.2 Representation of Class Diagram
FIG:5.3 Representation of Sequence Diagram
FIG:5.4 Representation of Data Flow Diagram
FIG:5.5 Representation of State Diagram
FIG:5.6 Representation of Activity Diagram
FIG:5.7 Representation of Collaboration Diagram
Chapter 6
SYSTEM IMPLEMENTATION
6.1 : Technology
6.1.1 : Python
It is used for:
Why Python
Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
Python can be used in a procedural way, an object-oriented way, or a functional way.
Fake_news_detection.py:-
y = news['label']
App.py:-
#Importing the Libraries
import os
import urllib.parse

import joblib
from flask import Flask, request, render_template
from flask_cors import CORS
from newspaper import Article

# Create the Flask application and allow cross-origin requests
app = Flask(__name__)
CORS(app)

# Load the trained model. Assumption: the original listing does not show this
# step, so the loader (joblib) and the file name 'model.pkl' are placeholders.
model = joblib.load('model.pkl')

@app.route('/')
def main():
    return render_template('main.html')

#Receiving the input url from the user and using Web Scraping to extract the news content
@app.route('/predict', methods=['GET', 'POST'])
def predict():
    # Drop the 5-character prefix of the request body, then percent-decode the URL
    url = request.get_data(as_text=True)[5:]
    url = urllib.parse.unquote(url)
    article = Article(str(url))
    article.download()
    article.parse()
    article.nlp()
    news = article.summary
    #Passing the news article to the model and returning whether it is Fake or Real
    pred = model.predict([news])
    return render_template('main.html', prediction_text='The news is "{}"'.format(pred[0]))

if __name__ == "__main__":
    port = int(os.environ.get('PORT', 5000))
    app.run(port=port, debug=True, use_reloader=False)
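The predict route above recovers the article URL by slicing a fixed five characters off the request body and percent-decoding the rest. The sketch below reproduces that step with the standard library only and compares it with parsing the body by field name, which does not depend on the length of the prefix. The field name "news" and the sample URL are assumptions made for illustration; the form field actually used by Main.html is not visible in this listing.

```python
from urllib.parse import parse_qs, unquote


def url_by_slicing(body: str) -> str:
    # Mirrors the listing: drop a fixed 5-character prefix, then percent-decode.
    return unquote(body[5:])


def url_by_field(body: str, field: str = "news") -> str:
    # Parse the form-encoded body and look the value up by name.
    # "news" is a hypothetical field name, not confirmed by the source.
    values = parse_qs(body).get(field, [])
    return values[0] if values else ""


# Hypothetical request body as a browser would send it for a form field "news".
body = "news=https%3A%2F%2Fexample.com%2Fstory%3Fid%3D7"

print(url_by_slicing(body))
print(url_by_field(body))
```

Both calls give the same URL for this body. The field-based version returns an empty string when the field is missing, which gives the route a clear case to reject instead of passing a truncated string to the scraper.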
Main.html:-
<!DOCTYPE html>
<html>
<!--From https://codepen.io/frytyler/pen/EGdtg-->
<head>
<meta charset="UTF-8">
<title>Fake News Detection</title>
<link href='https://fonts.googleapis.com/css?family=Pacifico' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Arimo' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Hind:300' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">
</head>
<body>
<div class="login">
<h1>Predict Fake News</h1>
</form>
<br>
<br>
{{ prediction_text }}
</div>
</body>
</html>
https://fonts.googleapis.com/css?family=Pacifico:-
/* cyrillic-ext */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6K6MmTpA.woff2)
format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6D6MmTpA.woff2)
format('woff2');
unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* vietnamese */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6I6MmTpA.woff2)
format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6J6MmTpA.woff2)
format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Pacifico';
font-style: normal;
font-weight: 400;
src: url(https://fonts.gstatic.com/s/pacifico/v22/FwZY7-Qmy14u9lezJ-6H6Mk.woff2)
format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}
https://fonts.googleapis.com/css?family=Arimo:-
/* cyrillic-ext */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDRrBZ
QI.woff2) format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcBBrBZ
QI.woff2) format('woff2');
unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* greek-ext */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDBrBZ
QI.woff2) format('woff2');
unicode-range: U+1F00-1FFF;
}
/* greek */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcAxrBZQ
I.woff2) format('woff2');
unicode-range: U+0370-0377, U+037A-037F, U+0384-038A, U+038C, U+038E-03A1,
U+03A3-03FF;
}
/* hebrew */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcAhrBZQ
I.woff2) format('woff2');
unicode-range: U+0590-05FF, U+200C-2010, U+20AA, U+25CC, U+FB1D-FB4F;
}
/* vietnamese */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDxrBZQ
I.woff2) format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcDhrBZQ
I.woff2) format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Arimo';
font-style: normal;
font-weight: 400;
src:
url(https://fonts.gstatic.com/s/arimo/v29/P5sfzZCDf9_T_3cV7NCUECyoxNk37cxcABrB.w
off2) format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}
https://fonts.googleapis.com/css?family=Hind:300:-
/* devanagari */
@font-face {
font-family: 'Hind';
font-style: normal;
font-weight: 300;
src: url(https://fonts.gstatic.com/s/hind/v16/5aU19_a8oxmIfMJaER2SjQpf.woff2)
format('woff2');
unicode-range: U+0900-097F, U+1CD0-1CF9, U+200C-200D, U+20A8, U+20B9,
U+20F0, U+25CC, U+A830-A839, U+A8E0-A8FF, U+11B00-11B09;
}
/* latin-ext */
@font-face {
font-family: 'Hind';
font-style: normal;
font-weight: 300;
src: url(https://fonts.gstatic.com/s/hind/v16/5aU19_a8oxmIfMJaERKSjQpf.woff2)
format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Hind';
font-style: normal;
font-weight: 300;
src: url(https://fonts.gstatic.com/s/hind/v16/5aU19_a8oxmIfMJaERySjQ.woff2)
format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}
https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300:-
/* cyrillic-ext */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDujMR6WR.woff2) format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDuHMR6WR.woff2) format('woff2');
unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* greek-ext */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDunMR6WR.woff2) format('woff2');
unicode-range: U+1F00-1FFF;
}
/* greek */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDubMR6WR.woff2) format('woff2');
unicode-range: U+0370-0377, U+037A-037F, U+0384-038A, U+038C, U+038E-03A1,
U+03A3-03FF;
}
/* vietnamese */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDurMR6WR.woff2) format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDuvMR6WR.woff2) format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Open Sans Condensed';
font-style: normal;
font-weight: 300;
src:
url(https://fonts.gstatic.com/s/opensanscondensed/v23/z7NFdQDnbTkabZAIOl9il_O6KJj73
e7Ff1GhDuXMRw.woff2) format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,U+02DA,
U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122, U+2191,
U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}
Style.css:-
@import url(https://fonts.googleapis.com/css?family=Open+Sans);
.btn { display: inline-block; *display: inline; *zoom: 1; padding: 4px 10px 4px; margin-
bottom: 0; font-size: 13px; line-height: 18px; color: #333333; text-align: center;text-shadow:
0 1px 1px rgba(255, 255, 255, 0.75); vertical-align: middle; background-color: #f5f5f5;
background-image: -moz-linear-gradient(top, #ffffff, #e6e6e6); background-image: -ms-
linear-gradient(top, #ffffff, #e6e6e6); background-image: -webkit-gradient(linear, 0 0, 0
100%, from(#ffffff), to(#e6e6e6)); background-image: -webkit-linear-gradient(top, #ffffff,
#e6e6e6); background-image: -o-linear-gradient(top, #ffffff, #e6e6e6); background-image:
linear-gradient(top, #ffffff, #e6e6e6); background-repeat: repeat-x; filter:
progid:dximagetransform.microsoft.gradient(startColorstr=#ffffff, endColorstr=#e6e6e6,
GradientType=0); border-color: #e6e6e6 #e6e6e6 #e6e6e6; border-color: rgba(0, 0, 0, 0.1)
rgba(0, 0, 0, 0.1) rgba(0, 0, 0, 0.25); border: 1px solid #e6e6e6; -webkit-border-radius: 4px; -
moz-border-radius: 4px; border-radius: 4px; -webkit-box-shadow: inset 0 1px 0 rgba(255,
255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05); -moz-box-shadow: inset 0 1px 0 rgba(255, 255,
255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.05); box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2),
0 1px 2px rgba(0, 0, 0, 0.05); cursor: pointer; *margin-left: .3em; }
.btn:hover, .btn:active, .btn.active, .btn.disabled, .btn[disabled] { background-color:
#e6e6e6; }
.btn-large { padding: 9px 14px; font-size: 15px; line-height: normal; -webkit-border-radius:
5px; -moz-border-radius: 5px; border-radius: 5px; }
.btn:hover { color: #333333; text-decoration: none; background-color: #e6e6e6; background-
position: 0 -15px; -webkit-transition: background-position 0.1s linear; -moz-transition:
background-position 0.1s linear; -ms-transition: background-position 0.1s linear; -o-
transition: background-position 0.1s linear; transition: background-position 0.1s linear; }
.btn-primary, .btn-primary:hover { text-shadow: 0 -1px 0 rgba(0, 0, 0, 0.25); color: #ffffff; }
.btn-primary.active { color: rgba(255, 255, 255, 0.75); }
.btn-primary { background-color: #4a77d4; background-image: -moz-linear-gradient(top,
#6eb6de, #4a77d4); background-image: -ms-linear-gradient(top, #6eb6de, #4a77d4);
background-image: -webkit-gradient(linear, 0 0, 0 100%, from(#6eb6de), to(#4a77d4));
background-image: -webkit-linear-gradient(top, #6eb6de, #4a77d4); background-image: -o-
linear-gradient(top, #6eb6de, #4a77d4); background-image: linear-gradient(top, #6eb6de,
#4a77d4); background-repeat: repeat-x; filter:
progid:dximagetransform.microsoft.gradient(startColorstr=#6eb6de, endColorstr=#4a77d4,
GradientType=0); border: 1px solid #3762bc; text-shadow: 1px 1px 1px rgba(0,0,0,0.4);
box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.2), 0 1px 2px rgba(0, 0, 0, 0.5); }
.btn-primary:hover, .btn-primary:active, .btn-primary.active, .btn-primary.disabled, .btn-
primary[disabled] { filter: none; background-color: #4a77d4; }
.btn-block { width: 100%; display:block; }
body {
width: 100%;
height:100%;
font-family: 'Open Sans', sans-serif;
background: #092756;
color: #fff;
font-size: 18px;
text-align:center;
letter-spacing:1.2px;
background: -moz-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%),-moz-linear-gradient(top, rgba(57,173,219,.25) 0%,
rgba(42,60,87,.4) 100%), -moz-linear-gradient(-45deg, #670d10 0%, #092756 100%);
background: -webkit-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%), -webkit-linear-gradient(top, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), -webkit-linear-gradient(-45deg, #670d10 0%,#092756 100%);
background: -o-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%), -o-linear-gradient(top, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), -o-linear-gradient(-45deg, #670d10 0%,#092756 100%);
background: -ms-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%), -ms-linear-gradient(top, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), -ms-linear-gradient(-45deg, #670d10 0%,#092756 100%);
background: -webkit-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4)
10%,rgba(138,114,76,0) 40%), linear-gradient(to bottom, rgba(57,173,219,.25)
0%,rgba(42,60,87,.4) 100%), linear-gradient(135deg, #670d10 0%,#092756 100%);
filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#3E1D6D',
endColorstr='#092756',GradientType=1 );
}
.login {
position: absolute;
top: 40%;
left: 50%;
margin: -150px 0 0 -150px;
width:400px; height:400px;
}
input {
width: 100%;
margin-bottom: 10px;
background: rgba(0,0,0,0.3);
border: none;
outline: none;
padding: 10px;
font-size: 13px;
color: #fff;
text-shadow: 1px 1px 1px rgba(0,0,0,0.3);
border-radius: 4px;
https://fonts.googleapis.com/css?family=Open+Sans:-
/* cyrillic-ext */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4taVIGxA.woff2) format('woff2');
unicode-range: U+0460-052F, U+1C80-1C88, U+20B4, U+2DE0-2DFF, U+A640-A69F,
U+FE2E-FE2F;
}
/* cyrillic */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4kaVIGxA.woff2) format('woff2');
unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
/* greek-ext */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4saVIGxA.woff2) format('woff2');
unicode-range: U+1F00-1FFF;
}
/* greek */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4jaVIGxA.woff2) format('woff2');
unicode-range: U+0370-0377, U+037A-037F, U+0384-038A, U+038C, U+038E-03A1,
U+03A3-03FF;
}
/* hebrew */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4iaVIGxA.woff2) format('woff2');
unicode-range: U+0590-05FF, U+200C-2010, U+20AA, U+25CC, U+FB1D-FB4F;
}
/* math */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B5caVIGxA.woff2) format('woff2');
unicode-range: U+0302-0303, U+0305, U+0307-0308, U+0330, U+0391-03A1, U+03A3-
03A9, U+03B1-03C9, U+03D1, U+03D5-03D6, U+03F0-03F1, U+03F4-03F5, U+2034-
2037, U+2057, U+20D0-20DC, U+20E1, U+20E5-20EF, U+2102, U+210A-210E, U+2110-
2112, U+2115, U+2119-211D, U+2124, U+2128, U+212C-212D, U+212F-2131, U+2133-
2138, U+213C-2140, U+2145-2149, U+2190, U+2192, U+2194-21AE, U+21B0-21E5,
U+21F1-21F2, U+21F4-2211, U+2213-2214, U+2216-22FF, U+2308-230B, U+2310,
U+2319, U+231C-2321, U+2336-237A, U+237C, U+2395, U+239B-23B6, U+23D0,
U+23DC-23E1, U+2474-2475, U+25AF, U+25B3, U+25B7, U+25BD, U+25C1, U+25CA,
U+25CC, U+25FB, U+266D-266F, U+27C0-27FF, U+2900-2AFF, U+2B0E-2B11,
U+2B30-2B4C, U+2BFE, U+FF5B, U+FF5D, U+1D400-1D7FF, U+1EE00-1EEFF;
}
/* symbols */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B5OaVIGxA.woff2) format('woff2');
unicode-range: U+0001-000C, U+000E-001F, U+007F-009F, U+20DD-20E0, U+20E2-
20E4, U+2150-218F, U+2190, U+2192, U+2194-2199, U+21AF, U+21E6-21F0, U+21F3,
U+2218-2219, U+2299, U+22C4-22C6, U+2300-243F, U+2440-244A, U+2460-24FF,
U+25A0-27BF, U+2800-28FF, U+2921-2922, U+2981, U+29BF, U+29EB, U+2B00-2BFF,
U+4DC0-4DFF, U+FFF9-FFFB, U+10140-1018E, U+10190-1019C, U+101A0, U+101D0-
101FD, U+102E0-102FB, U+10E60-10E7E, U+1D2C0-1D2D3, U+1D2E0-1D37F,
U+1F000-1F0FF, U+1F100-1F1AD, U+1F1E6-1F1FF, U+1F30D-1F30F, U+1F315,
U+1F31C, U+1F31E, U+1F320-1F32C, U+1F336, U+1F378, U+1F37D, U+1F382,
U+1F393-1F39F, U+1F3A7-1F3A8, U+1F3AC-1F3AF, U+1F3C2, U+1F3C4-1F3C6,
U+1F3CA-1F3CE, U+1F3D4-1F3E0, U+1F3ED, U+1F3F1-1F3F3, U+1F3F5-1F3F7,
U+1F408, U+1F415, U+1F41F, U+1F426, U+1F43F, U+1F441-1F442, U+1F444,
U+1F446-1F449, U+1F44C-1F44E, U+1F453, U+1F46A, U+1F47D, U+1F4A3, U+1F4B0,
U+1F4B3, U+1F4B9, U+1F4BB, U+1F4BF, U+1F4C8-1F4CB, U+1F4D6, U+1F4DA,
U+1F4DF, U+1F4E3-1F4E6, U+1F4EA-1F4ED, U+1F4F7, U+1F4F9-1F4FB, U+1F4FD-
1F4FE, U+1F503, U+1F507-1F50B, U+1F50D, U+1F512-1F513, U+1F53E-1F54A,
U+1F54F-1F5FA, U+1F610, U+1F650-1F67F, U+1F687, U+1F68D, U+1F691, U+1F694,
U+1F698, U+1F6AD, U+1F6B2, U+1F6B9-1F6BA, U+1F6BC, U+1F6C6-1F6CF,
U+1F6D3-1F6D7, U+1F6E0-1F6EA, U+1F6F0-1F6F3, U+1F6F7-1F6FC, U+1F700-1F7FF,
U+1F800-1F80B, U+1F810-1F847, U+1F850-1F859, U+1F860-1F887, U+1F890-1F8AD,
U+1F8B0-1F8B1, U+1F900-1F90B, U+1F93B, U+1F946, U+1F984, U+1F996, U+1F9E9,
U+1FA00-1FA6F, U+1FA70-1FA7C, U+1FA80-1FA88, U+1FA90-1FABD, U+1FABF-
1FAC5, U+1FACE-1FADB, U+1FAE0-1FAE8, U+1FAF0-1FAF8, U+1FB00-1FBFF;
}
/* vietnamese */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4vaVIGxA.woff2) format('woff2');
unicode-range: U+0102-0103, U+0110-0111, U+0128-0129, U+0168-0169, U+01A0-01A1,
U+01AF-01B0, U+0300-0301, U+0303-0304, U+0308-0309, U+0323, U+0329, U+1EA0-
1EF9, U+20AB;
}
/* latin-ext */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4uaVIGxA.woff2) format('woff2');
unicode-range: U+0100-02AF, U+0304, U+0308, U+0329, U+1E00-1E9F, U+1EF2-1EFF,
U+2020, U+20A0-20AB, U+20AD-20C0, U+2113, U+2C60-2C7F, U+A720-A7FF;
}
/* latin */
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 400;
font-stretch: 100%;
src: url(https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-
UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsjZ0B4gaVI.woff2) format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,
U+02DA, U+02DC, U+0304, U+0308, U+0329, U+2000-206F, U+2074, U+20AC, U+2122,
U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}
process, application, and/or system configuration. Unit tests ensure that each unique path of
a business process performs accurately to the documented specifications and contains clearly
defined inputs and expected results.
of the inner workings, structure and language of the software, or at least its purpose. It is
used to test areas that cannot be reached from a black box level.
6.4.6 BLACK BOX TESTING
Black Box Testing is testing the software without any knowledge of the inner
workings, structure or language of the module being tested. Black box tests, like most other
kinds of tests, must be written from a definitive source document, such as a specification or
requirements document. It is a test in which the software under test is treated as a black
box: you cannot “see” into it. The test provides inputs and responds to outputs without
considering how the software works.
6.4.7 ACCEPTANCE TESTING
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
6.4.8 TESTING RESULTS
All the test cases mentioned above passed successfully. No defects were encountered.
During this testing, each module is tested individually, and the module interfaces
are verified for consistency with the design specification. All important processing paths are
tested for the expected results. All error handling paths are also tested.
6.5.2 INTEGRATION TESTING
Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated, a set of
high-order tests is conducted. The main objective in this testing process is to take unit-tested
modules and build a program structure that has been dictated by design.
1) Top-Down Integration
2) Bottom-up Integration
This method begins the construction and testing with the modules at the lowest level
in the program structure. Since the modules are integrated from the bottom up, processing
required for modules subordinate to a given level is always available and the need for stubs
is eliminated. The bottom-up integration strategy may be implemented with the following
steps:
The low-level modules are combined into clusters that perform a specific
software sub-function.
A driver (i.e., the control program for testing) is written to coordinate test case input and
output.
The cluster is tested.
Drivers are removed and clusters are combined moving upward in the program
structure.
The bottom-up approach tests each module individually, and then each module is
integrated with a main module and tested for functionality.
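The bottom-up steps above can be sketched with Python's built-in unittest module. In this minimal example, two low-level modules are combined into a cluster, and a test class plays the role of the driver that feeds inputs and checks outputs. The functions clean_text, tokenize and preprocess are hypothetical stand-ins and are not the project's actual modules.

```python
import re
import unittest


# Low-level modules (hypothetical stand-ins for the project's helpers).
def clean_text(text):
    # Lower-case the text and replace anything that is not a letter or space.
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    # Split on whitespace; repeated spaces produce no empty tokens.
    return text.split()


# Cluster: the low-level modules combined into one sub-function.
def preprocess(text):
    return tokenize(clean_text(text))


class PreprocessDriver(unittest.TestCase):
    # Driver: coordinates test case input and expected output.
    def test_low_level_modules(self):
        self.assertEqual(tokenize("rbi  no ban "), ["rbi", "no", "ban"])
        self.assertEqual(clean_text("Ban!"), "ban ")

    def test_cluster(self):
        self.assertEqual(preprocess("RBI: No Ban!"), ["rbi", "no", "ban"])


def run_driver():
    # Run the driver and report whether every test passed.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PreprocessDriver)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


print(run_driver())
```

Once the cluster passes, the driver can be dropped and preprocess can be called from the next level up, which is then tested in the same way.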
6.5.3 USER ACCEPTANCE TESTING
User Acceptance of a system is the key factor for the success of any system. The
system under consideration is tested for user acceptance by constantly keeping in touch with
the prospective system users at the time of developing and making changes wherever
required. The system developed provides a friendly user interface that can easily be
understood even by a person who is new to the system.
VALIDATION TESTING
Validation Checking
Validation checks are performed on the following fields.
Text Field:
The text field can contain only a number of characters less than or equal
to its size. The text fields are alphanumeric in some tables and alphabetic in other
tables. An incorrect entry always flashes an error message.
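A minimal sketch of such a text-field check is given below. The function name, the error messages and the decision to allow spaces inside an entry are assumptions made for illustration; the report does not specify them.

```python
def validate_text_field(value, size, alphabetic_only=False):
    """Return None when the entry is valid, otherwise an error message."""
    # Rule 1: the entry must not be longer than the field size.
    if len(value) > size:
        return f"Entry is longer than {size} characters."
    # Rule 2: alphabetic fields accept letters only, others accept letters and digits.
    allowed = str.isalpha if alphabetic_only else str.isalnum
    for ch in value:
        # Assumption: a plain space is accepted in both kinds of field.
        if not (allowed(ch) or ch == " "):
            return f"Invalid character {ch!r} in entry."
    return None


print(validate_text_field("Coin ban 2024", 20))
print(validate_text_field("Coin ban 2024", 20, alphabetic_only=True))
print(validate_text_field("x" * 21, 20))
```

A caller would display the returned message to the user and accept the entry only when the result is None.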
Chapter 7
SCREEN SHOTS
Fig 7.2:- Input url
Chapter 8
CONCLUSION
Chapter 9
FUTURE SCOPE
The future scope for fake news classification is vast and evolving, given the increasing
sophistication of misinformation and the technologies used to propagate it. Here are some
potential avenues for future development in this field:
Multimodal Analysis: Fake news often involves various forms of media, including text,
images, videos, and audio. Future research can focus on developing multimodal
classification techniques that integrate information from multiple modalities to enhance
accuracy and robustness.
Domain-Specific Solutions: Fake news manifests differently across various domains, such
as politics, health, and finance. Tailoring classification models to specific domains can
improve their accuracy and effectiveness in detecting domain-specific misinformation.
User-Centric Approaches: Empowering users with tools to identify and verify the
credibility of information they encounter online is essential. Future research can focus on
designing user-friendly browser extensions, plugins, or mobile apps that provide real-time
feedback on the trustworthiness of news articles and social media posts.
Education and Media Literacy: Investing in education and media literacy initiatives is
fundamental for empowering individuals to critically evaluate information sources and
recognize misinformation. Future efforts should focus on integrating media literacy
training into school curricula and promoting digital literacy among the general public.
REFERENCES
social media,” Research and Politics, vol. 6, no. 2, Apr. 2019, doi:
10.1177/2053168019848554.
[14] S. M. Jones-Jang, T. Mortensen, and J. Liu, “Does Media Literacy Help Identification
of Fake News? Information Literacy Helps, but Other Literacies Don’t,” American
Behavioral Scientist, vol. 65, no. 2, pp. 371–388, Feb. 2021, doi:
10.1177/0002764219869406.
[15] P. Machete and M. Turpin, “The Use of Critical Thinking to Identify Fake News: A
Systematic Literature Review,” Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12067
LNCS, pp. 235–246, 2020, doi: 10.1007/978-3-030-45002-1_20.
[16] P. Goyal, S. Taterh, and A. Saxena, “Fake News Detection using Machine Learning: A
Review,” International Journal of Advanced Engineering, Management and Science
(IJAEMS), vol. 7, no. 3, pp. 2454–1311, 2021, doi: 10.22161/ijaems.
[17] X. Zhou, R. Zafarani, K. Shu, and H. Liu, “Fake News: Fundamental theories, detection
strategies and challenges,” WSDM 2019 - Proceedings of the 12th ACM International
Conference on Web Search and Data Mining, pp. 836–837, Jan. 2019, doi:
10.1145/3289600.3291382.
[18] S. Hakak, W. Z. Khan, S. Bhattacharya, G. T. Reddy, and K. K. R. Choo, “Propagation
of Fake News on Social Media: Challenges and Opportunities,” Lecture Notes in Computer
Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 12575 LNCS, pp. 345–353, 2020, doi: 10.1007/978-3-030-66046-8_28.
[19] H. Allcott and M. Gentzkow, “Social Media and Fake News in the 2016 Election,”
Journal of Economic Perspectives, vol. 31, no. 2, pp. 211–36, Mar. 2017, doi:
10.1257/JEP.31.2.211.
[20] P. Kulkarni, S. Karwande, R. Keskar, P. Kale, and S. Iyer, “Fake News Detection using
Machine Learning,” ITM Web of Conferences, vol. 40, p. 03003, 2021, doi:
10.1051/itmconf/20214003003.
PAPER PUBLICATION REPORT
CERTIFICATES OF AUTHORS
PUBLISHED PAPER DOCUMENT
International Journal of Engineering Science and Advanced Technology (IJESAT)
Vol 24 Issue 05, MAY, 2024
Abstract— In today’s modern world, "fake news" has been a major concern, spreading like
wildfire through many platforms. This phenomenon not only undermines the credibility of
information but also misleads society. Nowadays, social media is the greatest means by which
fake news spreads all over the place. This can cause many problems such as defamation of
people and spreading news in favour of specific individuals. Fake news often targets the most
prominent, powerful, and influential people in society, aiming to tarnish their reputation. The
escalating impact of fake news knows no bounds. Fake news is often biased, favouring a
single person or a section of people in society for their personal benefits. To mitigate these
challenges and promote transparency, there is a need to reduce the spread of fake news.
Introducing a "Fake News Classifier using NLP" offers a promising solution to combat this
issue. By using machine learning algorithms, this classifier can effectively identify misleading
information as fake news, thereby contributing to awareness in society and reducing losses.
1. INTRODUCTION
Fake news primarily consists of misleading information spread across society,
creating turmoil. In this era, information is everywhere, and the number of people accessing
it is increasing substantially. Users should be aware of what type of information they are
consuming: is it real or fake? Moreover, most social media platforms allow users to share
their views through stories, statuses, and posts, which directly affects the spread of news
that may often be fake. One very popular social media platform, WhatsApp, serves as a
means of consistently sharing fake news among its users through WhatsApp groups, statuses,
and personal messages. If this sharing of fake news reaches a significant number of people,
there is a risk of people believing it, leading to disorder.
One such recent example is the rumour of a ban on the 10 rupee coin in India. News
that 10 rupee coins had been banned spread widely, and social media facilitated the rapid
spread of this misinformation. Nobody was accepting 10 rupee coins, causing concern among
people in India about what to do with them. However, the government had not announced
any such ban; it was simply a baseless rumour. After confirmation from the Reserve Bank of
India (RBI), people calmed down, and acceptance of the 10 rupee coins resumed.
2. EXISTING SYSTEM
Various models exist for real and fake news detection. The most prevalent system
consists of a model that detects fake news based on keywords and headlines simultaneously.
The Passive Aggressive model detects fake news using keyword and headline analysis,
addressing topic-specific tendencies and author behavior, and it includes sentiment
analysis.
3. LITERATURE SURVEY:
Yang et al. [11] use the TI-CNN model to identify fake news on social media. The
model was evaluated against several methods and reached an accuracy of 92.20%. The
dataset was collected before the 2016 US presidential election.
Patel et al. [12] introduce a Natural Language Processing technique with different
classifiers to detect whether news is real or fake. SVM and KNN gave accuracies of 88.47%
and 86.90% respectively, while K-means gave a low accuracy of 40.37%.
Kulkarni et al. [5] work with classifiers such as Random Forest, Logistic
Regression, Decision Tree and Gradient Boosting. Logistic Regression achieves the highest
accuracy of 85.04%, followed by Random Forest with 84.50% and Decision Tree with
80.20%, while Gradient Boosting achieves the lowest accuracy of 77.44%.
Agudelo et al. [10] detect false news using machine learning algorithms, natural
language processing, and Python programming. Using a Multinomial Naive Bayes model
with CountVectorizer and TF-IDF Vectorizer features, they report accuracies of 88.1% and
84.8% on a dataset consisting of over 10,000 news items.
4. PROPOSED SYSTEM
In this paper we use Natural Language Processing techniques to counter the widespread
circulation of false news on the internet. We examine how the Multinomial Naive Bayes
algorithm performs on a given clip of information supplied as input to the system.
The approach used in this project is to first train the system and then supply the news
item whose reliability is to be checked. The system reports whether the item is reliable
and also prints the accuracy of the algorithm on the news clip entered by the reader.
4.1 : METHODOLOGY:
We choose the MultinomialNB classifier because it performs satisfactorily on data sets
with high dimensionality and is particularly well suited to text.
The dataset used for this project is called news.csv. It has a shape of 7796×4: the
first column identifies the news item, the second and third are the title and text, and
the fourth column holds labels denoting whether the news is REAL or FAKE.
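As a rough illustration of how a Multinomial Naive Bayes classifier assigns a REAL or FAKE label, the sketch below implements the scoring rule in plain Python using raw word counts and Laplace smoothing. It is not the project's code: the four training sentences, the function names and the whitespace tokenizer are all assumptions made for the example, and the project itself feeds TF-IDF features to the classifier rather than raw counts.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word likelihoods from token counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
            },
        }
    return model, vocab


def predict(model, vocab, text):
    """Return the label with the highest log posterior; unseen words are ignored."""
    tokens = [t for t in text.lower().split() if t in vocab]
    scores = {
        label: params["log_prior"] + sum(params["log_like"][t] for t in tokens)
        for label, params in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy examples, not rows from news.csv.
docs = [
    "government confirms coin is legal tender",
    "central bank issues official statement",
    "shocking rumour says coin banned overnight",
    "viral message claims secret ban on coin",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

model, vocab = train_multinomial_nb(docs, labels)
print(predict(model, vocab, "official statement from the bank"))
print(predict(model, vocab, "viral rumour claims coin banned"))
```

Because each word contributes an independent log-probability term, the cost of scoring grows only linearly with vocabulary size, which is one reason this family of classifiers copes well with high-dimensional text features.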
4.3 : IMPLEMENTATION
- Loading Data: The cleaned dataset news.csv is loaded. It has a shape of 7796×4 and
contains attributes such as title, text and label.
- Splitting Data: The dataset is split in the ratio 8:2, that is, 80% of the data is used
for training and 20% for testing.
- Creating Pipeline: We create a machine learning pipeline that applies TF-IDF
vectorization to the text data to convert it into numerical features and then applies the
Multinomial Naive Bayes classifier.
- Training the Model: The pipeline is trained on the training data (X_train and y_train).
- Predicting Labels: Using the trained model, we predict the labels for the test data.
- Model Evaluation: Performance is evaluated using a confusion matrix and a classification
report.
- Deployment: The model is deployed in a user-interface environment that allows users to
identify whether a news item is real or fake.
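The split, pipeline, training, prediction and evaluation steps listed above could be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: a small synthetic corpus stands in for the text and label columns of news.csv, and the random seed, the stratified split and the English stop-word list are choices made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the text and label columns of news.csv.
real = [f"the government published official report number {i} on the budget"
        for i in range(10)]
fake = [f"shocking secret miracle cure number {i} that doctors hide"
        for i in range(10)]
texts = real + fake
labels = ["REAL"] * 10 + ["FAKE"] * 10

# 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# TF-IDF turns each text into numerical features; Multinomial Naive Bayes classifies them.
model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("nb", MultinomialNB()),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true labels and columns are predicted labels, in the order FAKE, REAL.
cm = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))
```

Wrapping the vectorizer and classifier in one Pipeline means the TF-IDF vocabulary is learned from the training split only, so the test split does not leak into the features.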
6. SAMPLE SCREENSHOTS:
The screenshot below shows the front-end page: we enter a news URL and the page reports
whether the news is fake or real.
The next screenshot shows the link being passed to the NLP module: newspaper3k extracts
the content, the content is fed to the machine learning model, and Flask routes the
request between the modules.
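The flow described above (URL in, article text extracted with newspaper3k, text classified, result returned through Flask) might look roughly like the sketch below. The route name /predict, the form field url, the config keys and the classify placeholder are all hypothetical and chosen only for illustration; the report does not show the project's actual routes or how the model is loaded.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def extract_text(url):
    """Download and parse an article with newspaper3k and return its body text."""
    from newspaper import Article  # imported lazily so the app loads without it

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def classify(text):
    """Placeholder (hypothetical): call the trained pipeline's predict here."""
    raise NotImplementedError("load the trained model and return 'REAL' or 'FAKE'")


# Looked up at request time so the real model, or a stub, can be plugged in.
app.config["EXTRACTOR"] = extract_text
app.config["CLASSIFIER"] = classify


@app.route("/predict", methods=["POST"])
def predict():
    url = request.form.get("url", "").strip()
    if not url:
        return jsonify(error="missing url"), 400
    text = app.config["EXTRACTOR"](url)
    label = app.config["CLASSIFIER"](text)
    return jsonify(url=url, label=label)
```

Keeping the extractor and classifier behind configuration entries lets the endpoint be exercised with Flask's test client and stub functions, without network access or a trained model.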
7. CONCLUSION:
Fake news creates false and misleading information that greatly affects people and
events. This project distinguishes fake news from real news using Natural Language
Processing and a Machine Learning classification model. We use NLP to automatically
predict whether a news item is real or fake. In this project we developed a web
application for fake news classification using Natural Language Processing techniques.
We use Flask for the back end and allow users to enter news article URLs. The
application fetches the content using newspaper3k, processes it with a pre-trained ML
model, and returns a classification result. The front end displays the result.
Overall, the project aims to counter misinformation by providing a tool that identifies
whether a news item is likely real or fake.
8. FUTURE SCOPE:
Fake news classification using NLP offers several opportunities for future exploration
and enhancement. These include experimenting with different machine learning algorithms
or NLP techniques such as feature engineering to improve model accuracy and efficiency.
There is also scope to explore different sources, languages and types of news to make the
model more robust and adaptable to various contexts. By focusing on these points, the
system can continue to evolve and become more effective, reliable and valuable to users.
9. REFERENCES:
[1] Chowdhury, G. (2003) Natural language processing. Annual Review of Information Science and
Technology, 37. pp. 51-89. ISSN 0066-4200
[2] A. N. K. Movanita, "BIN: 60 Persen Konten Media Sosial adalah Informasi Hoaks (BIN: 60
percent of social media content is hoax)," 2018. [Online]. Available:
https://nasional.kompas.com/read/2018/03/15/06475551/bin-60-persen-konten-media-socialadalah-informasi-hoaks.
[3] S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar, "Fake news detection using deep
learning models: A novel approach," Transactions on Emerging Telecommunications Technologies,
2019.
[4] K.-H. Choi, "A study on the effect of reading side tool with NLP skill on student chinese reading
performance," Master Thesis, Grad. Ins. Edu. Inf. and Meas., National Taichung Univ. of Edu.,
Taichung, Taiwan, 2015.
[5] P. Kulkarni, S. Karwande, R. Keskar, P. Kale, and S. Iyer, “Fake News Detection using Machine
Learning,” ITM Web of Conferences, vol. 40, p. 03003, 2021, doi: 10.1051/itmconf/20214003003.
[6] Raj Bridgelall, Department of Transportation, Logistics, and Finance, College of Business, North
Dakota State University, Fargo, ND 58108, USA.
[7] Marwan Omar, Soohyeon Choi, DaeHun Nyang, and David Mohaisen, 3 Jan 2022
[8] Fake news detector: NLP project by Ishant Juyal
(https://levelup.gitconnected.com/fake-news-detector-nlp-project-9d67e0177075)
[9] Aldwairi, M. and A. Alwahedi, Detecting Fake News in Social Media Networks.
[10] G. Agudelo, O. Parra, and J. Barón Velandia, “Raising a Model for Fake News Detection Using
Machine Learning in Python,” pp. 596–604, 2018, doi: 10.1007/978-3-030-02131-3_52.
[11] Y. Yang et al., “TI-CNN: Convolutional Neural Networks for Fake News Detection,” Jun. 2018,
Accessed: Mar. 24, 2023. [Online]. Available: https://arxiv.org/abs/1806.00749v3.
[12] J. Patel, M. Barreto, U. Sahakari, and Dr. S. Patil, “Fake News Detection with Machine Learning,”
International Journal of Innovative Technology and Exploring Engineering, vol. 10, no. 1, pp. 124–
127, Nov. 2020, doi: 10.35940/IJITEE.A8090.1110120
[13] H. Allcott, M. Gentzkow, and C. Yu, “Trends in the diffusion of misinformation on social media,”
Research and Politics, vol. 6, no. 2, Apr. 2019, doi: 10.1177/2053168019848554.
[14] S. M. Jones-Jang, T. Mortensen, and J. Liu, “Does Media Literacy Help Identification of Fake
News? Information Literacy Helps, but Other Literacies Don’t,” American Behavioral Scientist, vol.
65, no. 2, pp. 371–388, Feb. 2021, doi: 10.1177/0002764219869406.
[15] P. Machete and M. Turpin, “The Use of Critical Thinking to Identify Fake News: A Systematic
Literature Review,” Lecture Notes in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12067 LNCS, pp. 235–246, 2020, doi:
10.1007/978-3-030-45002-1_20.