Project Report on
Fake News Detection Using Machine Learning
Batch (2021-2024)
Submitted by
Vedanshi Tomar
Enrollment Number:
2200716
CANDIDATE’S DECLARATION
I hereby certify that the work presented in this project report entitled
“ ” in partial
fulfilment of the requirements for the award of the degree of Bachelor of Computer
Applications is a bonafide work carried out by me during the period of November 2023
to January 2024 under the supervision of , Department of Computer
Application, Graphic Era Deemed to be University, Dehradun, India.
This work has not been submitted elsewhere for the award of a
degree/diploma/certificate.
This is to certify that the above mentioned statement in the candidate’s declaration is
correct to the best of my knowledge.
CERTIFICATE OF ORIGINALITY
This is to certify that the project report entitled
submitted to Graphic Era University, Dehradun in partial fulfilment of the requirement for
the award of the degree of BACHELOR OF COMPUTER APPLICATIONS, is an
authentic and original work carried out by Mr. / Ms. with
enrolment number under my supervision and guidance.
The matter embodied in this project is genuine work done by the student and has not been
submitted whether to this University or to any other University / Institute for the fulfilment of
the requirements of any course of study.
…………………………. ………………………….
Special Note:
ABSTRACT
CONTENTS
ABSTRACT
LIST OF FIGURES
1. INTRODUCTION
1.1 MOTIVATION
1.2 OBJECTIVES
2. LITERATURE SURVEY
3. METHODOLOGY
3.3.1 PYTHON
3.5 MODULES
3.6 ALGORITHMS
REFERENCES
APPENDIX
A. SOURCE CODE
B. SCREENSHOTS
LIST OF FIGURES
3.3 DASHBOARD
CHAPTER 1 INTRODUCTION
These days, fake news is creating different issues, from sarcastic articles to
fabricated news and planted government propaganda in some outlets. Fake news
and lack of trust in the media are growing problems with huge ramifications in
our society. Obviously, a purposely misleading story is "fake news", but lately
the discourse on social media is changing its definition. Some now use the
term to dismiss facts that run counter to their preferred viewpoints.
MOTIVATION
We will be training and testing the data; since we use supervised learning,
the data is labeled. With labeled training and testing data we can apply
different machine learning algorithms, but before computing predictions and
accuracies the data needs to be preprocessed: the null values, which are not
readable, must be removed from the data set, and the data must be converted
into vectors by normalizing and tokenizing it so that it can be understood by
the machine. The next step is to generate visual reports from this data using
Python's Matplotlib and scikit-learn libraries, which present the results as
histograms, pie charts, or bar charts.
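A minimal sketch of this workflow, assuming scikit-learn is installed and a
hypothetical news.csv file with 'text' and 'label' columns (the file name and
column names are placeholders, not the project's actual dataset):

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Hypothetical dataset with 'text' and 'label' columns
df = pd.read_csv("news.csv")

# Preprocessing: drop null values that the algorithms cannot read
df = df.dropna(subset=["text", "label"])

# Labeled training and testing data for supervised learning
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# Visual report with Matplotlib: bar chart of class counts
df["label"].value_counts().plot(kind="bar")
plt.title("Real vs. fake articles")
plt.show()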
OBJECTIVES
OVERVIEW OF PROJECT
In general, the goal is profiting through clickbait. Clickbaits lure users and
entice curiosity with flashy headlines or designs to make them click links that
increase advertisement revenue. This project analyzes the prevalence of fake
news in light of the advances in communication made possible by the
emergence of social networking sites. The purpose of the work is to come up
with a solution that users can employ to detect and filter out sites
containing false and misleading information. We use simple and carefully
selected features of the title and post to accurately identify fake posts. The
experimental results show 99.4% accuracy using a logistic classifier.
CHAPTER 2 LITERATURE SURVEY
Fake news and hoaxes have existed since before the advent of the Internet. A
widely accepted definition of Internet fake news is "fictitious articles
deliberately fabricated to deceive readers". Social media and news outlets
publish fake news to increase readership or as part of psychological warfare.
The proliferation and rapid diffusion of fake news on the Internet highlight the
need for automatic hoax detection systems. In the context of social networks,
machine learning (ML) methods can be used for this purpose. Fake news
detection strategies are traditionally either based on content analysis (i.e.,
analyzing the content of the news) or - more recently - on social context models,
such as mapping the news' diffusion pattern. In this paper, we first propose a
novel ML fake news detection method which, by combining news content and
social context features, outperforms existing methods in the literature,
increasing their already high accuracy by up to 4.8%. Second, we implement
our method within a Facebook Messenger chatbot and validate it with a
real-world application, obtaining a fake news detection accuracy of 81.7%. In recent
years, the reliability of information on the Internet has emerged as a crucial
issue of modern society. Social network sites (SNSs) have revolutionized the
way in which information is spread by allowing users to freely share content. As
a consequence, SNSs are also increasingly used as vectors for the diffusion of
misinformation and hoaxes. The amount of disseminated information and the
rapidity of its diffusion make it practically impossible to assess reliability in a
timely manner, highlighting the need for automatic hoax detection systems. As
a contribution towards this objective, we show that Facebook posts can be
classified with high accuracy as hoaxes or non-hoaxes on the basis of the users
who "liked" them. We present two classification techniques, one based on
logistic regression, the other on a novel adaptation of Boolean crowdsourcing
algorithms. On a dataset consisting of 15,500 Facebook posts and 909,236
users, we obtain classification accuracies exceeding 99% even when the
training set contains less than 1% of the posts. We further show that our
techniques are robust: they work even when we restrict our attention to the
users who like both hoax and non-hoax posts. These results suggest that
mapping the diffusion pattern of information can be a useful component of
automatic hoax detection systems.
THE SPREAD OF FAKE NEWS BY SOCIAL BOTS
The massive spread of fake news has been identified as a major global risk and
has been alleged to influence elections and threaten democracies.
Communication, cognitive, social, and computer scientists are engaged in
efforts to study the complex causes for the viral diffusion of digital
misinformation and to develop solutions, while search and social media
platforms are beginning to deploy countermeasures. However, to date, these
efforts have been mainly informed by anecdotal evidence rather than
systematic data. Here we analyze 14 million messages spreading 400 thousand
claims on Twitter during and following the 2016 U.S. presidential
campaign and election. We find evidence that social bots play a key role in the
spread of fake news. Accounts that actively spread misinformation are
significantly more likely to be bots. Automated accounts are particularly active
in the early spreading phases of viral claims and tend to target influential
users. Humans are vulnerable to this manipulation, retweeting bots who post
false news. Successful sources of false and biased claims are heavily
supported by social bots. These results suggest that curbing social bots may
be an effective strategy for mitigating the spread of online misinformation.
MISLEADING ONLINE CONTENT
Big Data Analytics and Deep Learning are two high-focus of data science. Big
Data has become important as many organizations both public and private have
been collecting massive amounts of domain-specific information, which can
contain useful information about problems such as national intelligence, cyber
security, fraud detection, marketing, and medical informatics. Companies such
as Google and Microsoft are analyzing large volumes of data for business
analysis and decisions, impacting existing and future technology. Deep
Learning algorithms extract high-level, complex abstractions as data
representations through a hierarchical learning process. Complex abstractions
are learnt at a given level based on relatively simpler abstractions formulated
in the preceding level in the hierarchy. A key
benefit of Deep Learning is the analysis and learning of massive amounts of
unsupervised data, making it a valuable tool for Big Data Analytics where raw
data is largely unlabeled and uncategorized. In the present study, we explore
how Deep Learning can be utilized for addressing some important problems in
Big Data Analytics, including extracting complex patterns from massive
volumes of data, semantic indexing, data tagging, fast information retrieval, and
simplifying discriminative tasks. We also investigate some aspects of Deep
Learning research that need further exploration to incorporate specific
challenges introduced by Big Data Analytics, including streaming data,
high-dimensional data, scalability of models, and distributed computing. We
conclude by presenting insights into relevant future works by posing some
questions, including defining data sampling criteria, domain adaptation
modeling, defining criteria for obtaining useful data abstractions, improving
semantic indexing, semi-supervised learning, and active learning.
CHAPTER 3 METHODOLOGY
EXISTING SYSTEM
There exists a large body of research on the topic of machine learning methods
for deception detection; most of it has focused on classifying online
reviews and publicly available social media posts. Particularly since late 2016,
during the American presidential election, the question of determining 'fake
news' has also been the subject of particular attention within the literature.
Conroy, Rubin, and Chen outline several approaches that seem promising
towards the aim of perfectly classifying misleading articles. They note that
simple content-related n-grams and shallow parts-of-speech tagging have
proven insufficient for the classification task, often failing to account for
important context information. Rather, these methods have been shown
useful only in tandem with more complex methods of analysis. Deep syntax
analysis using Probabilistic Context Free Grammars has been shown to be
particularly valuable in combination with n-gram methods. Feng, Banerjee, and
Choi are able to achieve 85%-91% accuracy in deception-related classification
tasks using online review corpora.
PROPOSED SYSTEM
In this paper a model is build based on the count vectorizer or a tfidf matrix (
i.e ) word tallies relatives to how often they are used in other artices in your
dataset ) can help . Since this problem is a kind of text classification,
Implementing a Naive Bayes classifier will be best as this is standard for text-
based processing. The actual goal is in developing a model which was the text
transformation (count vectorizer vs tfidf vectorizer) and choosing which type of
text to use (headlines vs full text). Now the next step is to extract the most
optimal features for countvectorizer or tfidf-vectorizer, this is done by using a
9
n-number of the most used words, and/or phrases, lower casing or not,
mainly removing the stop words which are common words such as “the”,
“when”, and “there” and only using those words that appear at least a given
number of times in a given text dataset.
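As a hedged sketch of this setup, assuming scikit-learn (the toy texts and
labels below are illustrative placeholders, not the project's dataset):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus standing in for the news dataset (0 = fake, 1 = real)
texts = ["Breaking: shocking claim goes viral", "Senate passes the budget bill"]
labels = [0, 1]

# TF-IDF features: lower-cased, English stop words removed, and terms kept
# only if they appear in at least min_df documents
model = Pipeline([
    ('tfidf', TfidfVectorizer(lowercase=True, stop_words='english', min_df=1)),
    ('clf', MultinomialNB()),
])
model.fit(texts, labels)
print(model.predict(["shocking viral claim"]))

Swapping TfidfVectorizer for CountVectorizer switches the text transformation
from TF-IDF weights to raw word tallies, so the two options discussed above can
be compared directly.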
SYSTEM ARCHITECTURE
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS:
➢ System - Pentium IV
➢ Speed - 2.4 GHz
➢ Hard disk - 40 GB
➢ Monitor - 15" VGA color
➢ RAM - 512 MB
SOFTWARE REQUIREMENTS:
SOFTWARE ENVIRONMENT: PYTHON
➢ Python is Interpreted − Python is processed at runtime by the interpreter. You
do not need to compile your program before executing it. This is similar to Perl
and PHP.
➢ Python is Interactive − You can actually sit at a Python prompt and interact
with the interpreter directly to write your programs.
History of Python
Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer
Science in the Netherlands.
Python is copyrighted. Like Perl, Python source code is available under an
open-source license (the GPL-compatible Python Software Foundation License).
PYTHON FEATURES
➢ Easy-to-read − Python code is more clearly defined and visible to the eyes.
➢ Easy-to-maintain − Python's source code is fairly easy-to-maintain.
➢ A broad standard library − Python's bulk of the library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
➢ Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
➢ Portable − Python can run on a wide variety of hardware platforms and has
the same interface on all platforms.
➢ Extendable − You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.
➢ GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows
MFC, Macintosh, and the X Window system of Unix.
➢ Scalable − Python provides a better structure and support for large programs
than shell scripting.
Apart from the above-mentioned features, Python has a big list of good features;
a few are listed below −
• It provides very high-level dynamic data types and supports dynamic type
checking.
Getting Python
The most up-to-date and current source code, binaries, documentation, news,
etc., is available on the official website of Python https://www.python.org.
Windows Installation
[1] Open a Web browser and go to https://www.python.org/downloads/.
[2] Follow the link for the Windows installer python-XYZ.msi file where XYZ is the
version you need to install.
[3] To use this installer python-XYZ.msi, the Windows system must support
Microsoft Installer 2.0. Save the installer file to your local machine and then
run it to find out if your machine supports MSI.
[4] Run the downloaded file. This brings up the Python install wizard, which is
really easy to use. Just accept the default settings, wait until the install is
finished, and you are done.
The Python language has many similarities to Perl, C, and Java. However,
there are some definite differences between the languages.
$ python
Python 2.4.3 (#1, Nov 11 2010, 13:34:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
>>>
Type the following text at the Python prompt and press Enter −
>>> print "Hello, Python!"
If you are running a newer version of Python, you need to use the print
statement with parentheses, as in print("Hello, Python!"). However, in Python
version 2.4.3, this produces the following result −
Hello, Python!
Invoking the interpreter with a script parameter begins execution of the script
and continues until the script is finished. When the script is finished, the
interpreter is no longer active.
Let us write a simple Python program in a script. Python files have extension
.py. Type the following source code in a test.py file −
print"Hello, Python!"
We assume that you have the Python interpreter set in your PATH variable. Now,
try to run this program as follows −
$ python test.py
This produces the following result −
Hello, Python!
FLASK FRAMEWORK
The HTTP protocol supports the following methods for retrieving data from a URL:
1 GET − Sends data in unencrypted form to the server; the most common method.
2 HEAD − Same as GET, but without the response body.
3 POST − Used to send HTML form data to the server. Data received by the POST
method is not cached by the server.
4 PUT − Replaces all current representations of the target resource with the
uploaded content.
In order to demonstrate the use of POST method in URL routing, first let us
create an HTML form and use the POST method to send form data to a URL.
<html>
<body>
<form action="http://localhost:5000/login" method="post">
<p>Enter Name:</p>
<p><input type="text" name="nm" /></p>
<p><input type="submit" value="submit" /></p>
</form>
</body>
</html>
The corresponding Flask application handles the form data as follows −

from flask import Flask, redirect, url_for, request

app = Flask(__name__)

@app.route('/success/<name>')
def success(name):
    return 'welcome %s' % name

@app.route('/login', methods=['POST', 'GET'])
def login():
    if request.method == 'POST':
        user = request.form['nm']
    else:
        user = request.args.get('nm')
    return redirect(url_for('success', name=user))

if __name__ == '__main__':
    app.run(debug=True)
After the development server starts running, open login.html in the browser,
enter a name in the text field, and click Submit.
Fig: 3.2 Login page
user = request.form['nm']
It is passed to the '/success' URL as the variable part. The browser displays a
welcome message in the window.
Fig: 3.3 Dashboard
Change the method parameter to 'GET' in login.html and open it again in the
browser. The data is now received on the server by the GET method. The value of
the 'nm' parameter is now obtained by −
user = request.args.get('nm')
Here, args is a dictionary object containing pairs of form parameters and
their corresponding values. The value corresponding to the 'nm' parameter is
passed on to the '/success' URL as before.
MODULES
A. Data Use
B. Preprocessing
C. Feature Extraction
A. Data Use
So, in this project we are using different packages; to load and read the data
set we are using pandas. Using pandas, we can read the .csv file, display the
shape of the dataset, and display the dataset in the correct form. The labeled
training/testing split, the preprocessing, and the visual reports produced with
Matplotlib and scikit-learn described in the Motivation section are then
applied to this data.
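A brief sketch of this step, mirroring the dataset loading in the appendix (the
file paths are the ones used there):

import pandas as pd

# Read the two CSV files with pandas and label them (1 = real, 0 = fake)
real = pd.read_csv("/content/drive/MyDrive/dataset/True.csv")
fake = pd.read_csv("/content/drive/MyDrive/dataset/Fake.csv")
real["label"] = 1
fake["label"] = 0

# Combine them and display the shape of the dataset
df = pd.concat([real, fake])
print(df.shape)
print(df.head())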
B. Preprocessing
The data set used is split into a training set and a testing set: Dataset I
contains 3256 training records and 814 testing records, and Dataset II contains
1882 training records and 471 testing records. Cleaning the data is always the
first step. In it, words that do not help in mining useful information are
removed from the dataset. Data collected online often contains undesirable
characters such as stop words and digits, which hinder the detection task.
Cleaning removes such language-independent entities and integrates logic that
can improve the accuracy of the identification task.
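A minimal cleaning sketch along these lines, assuming NLTK's English stop-word
list (the regular expression and the sample sentence are illustrative):

import re
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

def clean_text(text):
    # Remove digits and other undesirable non-letter characters
    text = re.sub(r'[^a-z\s]', ' ', text.lower())
    # Drop stop words such as "the" and "and"
    return ' '.join(w for w in text.split() if w not in stop_words)

print(clean_text("Breaking!!! The 7 secrets THEY don't want you to know"))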
C. Feature Extraction
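The code for this module is not included in this excerpt; as an illustrative
sketch, the two feature extraction options discussed in the proposed system
(count vectorizer vs. TF-IDF vectorizer) can be compared on a toy corpus:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["fake news spreads fast", "real news spreads slowly"]  # toy corpus

# Raw word tallies
counts = CountVectorizer().fit_transform(docs)
# The same tallies reweighted by how distinctive each word is in the corpus
tfidf = TfidfVectorizer().fit_transform(docs)

print(counts.toarray())
print(tfidf.toarray().round(2))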
Algorithms
Naive Bayes
• Naive Bayes classifiers are simple probabilistic classifiers based on applying
Bayes' theorem with strong (naive) independence assumptions between features.
SVM
• SVMs are a set of supervised learning methods used for classification and
regression.
Logistic Regression
• Logistic regression estimates the probability of a binary outcome, such as a
news item being fake or real.
• An algorithm's accuracy depends on the type and size of your dataset; the more
data, the better the chance of obtaining correct accuracy.
• Machine learning depends on the variations and relations in the data.
• Understanding what is predictable is as important as trying to predict it.
• When choosing an algorithm, speed should be a consideration factor; a
comparison sketch follows this list.
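A minimal comparison sketch for these considerations, assuming scikit-learn
(the toy corpus is a placeholder, and the tiny cv value is only there to make
the toy data work):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus; real accuracy depends on the type and size of the dataset
texts = ["shocking miracle cure", "senate passes bill", "aliens run the bank",
         "court upholds ruling", "secret celebrity scandal", "rain breaks record"]
labels = [0, 1, 0, 1, 0, 1]  # 0 = fake, 1 = real

# Cross-validated accuracy for each candidate algorithm
for clf in (MultinomialNB(), LinearSVC(), LogisticRegression(max_iter=1000)):
    model = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(model, texts, labels, cv=2)
    print(type(clf).__name__, scores.mean())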
REQUIREMENT ANALYSIS
Requirement analysis, also called requirement engineering, is the
process of determining user expectations for a new or modified product. It
encompasses the tasks that determine the need for analysing, documenting,
validating and managing software or system requirements. The requirements
should be documentable, actionable, measurable, testable and traceable,
related to identified business needs or opportunities, and defined to a level of
detail sufficient for system design.
FUNCTIONAL REQUIREMENTS
It is a technical specification requirement for the software product. It
is the first step in the requirement analysis process, which lists the
requirements of a particular software system including functional, performance
and security requirements. The function of the system depends mainly on the
quality of the hardware used to run the software with the given functionality.
Usability
It specifies how easy the system must be to use. It is easy to ask queries in
any format, whether short or long; the Porter stemming algorithm produces the
desired response for the user.
Robustness
Security
Reliability
Compatibility
It is supported by all recent versions of the major web browsers. Using any web
server, such as localhost, gives the system a real-time experience.
Flexibility
The flexibility of the project is provided in such a way that it has the ability
to run on different environments being executed by different users.
Safety
NON-FUNCTIONAL REQUIREMENTS
Portability
It is the usability of the same software in different environments. The
project can be run in any operating system.
Performance
These requirements determine the resources required, time interval,
throughput and everything that deals with the performance of the system.
Accuracy
The result of the requested query is very accurate, and information is retrieved
at high speed. The degree of security provided by the system is high and
effective.
Maintainability
Maintainability defines how easy it is to maintain the system, that is, how easy
it is to analyse, change and test the application. Maintainability of this
project is simple, as further updates can be easily done without affecting its
stability.
INPUT DESIGN
The input design is the link between the information system and the user.
It comprises the specifications and procedures for data preparation and the
steps necessary to put transaction data into a usable form for processing,
whether by inspecting the computer to read data from a written or printed
document or by having people key the data directly into the system. The design
of input focuses on controlling the amount of input required, controlling the
errors, avoiding delay, avoiding extra steps and keeping the process simple.
The input is designed in such a way that it provides security and ease of use
while retaining privacy. Input design considered the following things:
[1] What data should be given as input?
[2] How should the data be arranged or coded?
[3] The dialog to guide the operating personnel in providing input.
[4] Methods for preparing input validations, and the steps to follow when errors
occur.
OUTPUT DESIGN
A quality output is one which meets the requirements of the end user
and presents the information clearly. In any system, the results of processing
are communicated to the users and to other systems through outputs. In output
design it is determined how the information is to be displayed for immediate
need and also as hard copy output. It is the most important and direct source
of information to the user. Efficient and intelligent output design improves the
system's relationship with the user and helps in decision-making.
The output form of an information system should accomplish one or more of the
following objectives.
This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system
efficiently. The user must not feel threatened by the system; instead, they must
accept it as a necessity. The level of acceptance by the users solely depends on
the methods that are employed to educate the user about the system and to make
them familiar with it. The user's level of confidence must be raised so that
they are also able to make constructive criticism, which is welcomed, as they
are the final users of the system.
DATA FLOW DIAGRAM
• The DFD is also called a bubble chart. It is a simple graphical formalism that
can be used to represent a system in terms of the input data to the system, the
various processing carried out on this data, and the output data generated by
the system.
• The data flow diagram (DFD) is one of the most important modeling tools. It is
used to model the system components. These components are the system
process, the data used by the process, an external entity that interacts with
the system and the information flows in the system.
• DFD shows how the information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.
• A DFD may be used to represent a system at any level of abstraction, and may
be partitioned into levels that represent increasing information flow and
functional detail.
Fig: 4.1 Data Flow Diagram
UNIT TESTING
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is
the testing of individual software units of the application. It is done after
the completion of an individual unit and before integration. This is structural
testing that relies on knowledge of the unit's construction and is invasive.
Unit tests perform
basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a
business process performs accurately to the documented specifications and
contains clearly defined inputs and expected results.
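As an illustrative sketch (not the project's actual test suite), a unit test for
the classification pipeline could look like this:

import unittest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

class TestFakeNewsPipeline(unittest.TestCase):
    def setUp(self):
        # Tiny placeholder training set; real tests would load dataset fixtures
        self.model = make_pipeline(TfidfVectorizer(), MultinomialNB())
        self.model.fit(["fake story here", "real report here"], [0, 1])

    def test_prediction_is_binary(self):
        # Valid input must yield a label from the documented set {0, 1}
        self.assertIn(self.model.predict(["some news text"])[0], (0, 1))

if __name__ == "__main__":
    unittest.main()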
INTEGRATION TESTING
Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is
more concerned with the basic outcome of screens or fields. Integration tests
demonstrate that although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is correct
and consistent. Integration testing is specifically aimed at exposing the
problems that arise from the combination of components.
FUNCTIONAL TESTING
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.
Valid Input: identified classes of valid input must be accepted. Invalid Input:
identified classes of invalid input must be rejected.
ACCEPTANCE TESTING
User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets
the functional requirements.
CHAPTER 5 CONCLUSION AND FUTURE WORK
Many people consume news from social media instead of traditional news
media. However, social media has also been used to spread fake news, which
has negative impacts on individuals and society. In this paper, an
innovative model for fake news detection using machine learning algorithms
has been presented. The model takes news events as input and, based on
Twitter reviews and classification algorithms, predicts the percentage chance
of the news being fake or real.
The feasibility of the project is analyzed in this phase, and a business
proposal is put forth with a very general plan for the project and some cost
estimates.
During system analysis the feasibility study of the proposed system is to be
carried out. This is to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major
requirements for the system is essential. This study is carried out to check the
economic impact that the system will have on the organization. The amount of
funds that the company can pour into the research and development of the
system is limited. The expenditures must be justified. Thus the developed
system is well within the budget, and this was achieved because most of the
technologies used are freely available. Only the customized products had to be
purchased.
REFERENCES
[1]. Parikh, S. B., & Atrey, P. K. (2018, April). Media-Rich Fake News Detection:
A Survey. In 2018 IEEE Conference on Multimedia Information Processing and
Retrieval (MIPR) (pp. 436-441). IEEE.
[2]. Conroy, N. J., Rubin, V. L., & Chen, Y. (2015, November). Automatic
deception detection: Methods for finding fake news. In Proceedings of the 78th
ASIS&T Annual Meeting: Information Science with Impact: Research in and
for the Community (p. 82). American Society for Information Science.
[5]. Della Vedova, M. L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M.,
& de Alfaro, L. (2018, May). Automatic Online Fake News Detection
Combining Content and Social Signals. In 2018 22nd Conference of Open
Innovations Association (FRUCT) (pp. 272-279). IEEE.
[6]. Tacchini, E., Ballarin, G., Della Vedova, M. L., Moret, S., & de Alfaro, L.
(2017). Some like it hoax: Automated fake news detection in social networks.
arXiv preprint arXiv:1704.07506.
[7]. Shao, C., Ciampaglia, G. L., Varol, O., Flammini, A., & Menczer, F. (2017).
The spread of fake news by social bots. arXiv preprint arXiv:1707.07592,
96-104.
[8]. Chen, Y., Conroy, N. J., & Rubin, V. L. (2015, November). Misleading online
content: Recognizing clickbait as "false news". In Proceedings of the 2015 ACM
Workshop on Multimodal Deception Detection (pp. 15-19). ACM.
[9]. Najafabadi, M. M., Villanustre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R.,
& Muharemagic, E. (2015). Deep learning applications and challenges in big
data analytics. Journal of Big Data, 2(1), 1.
[10]. Haiden, L., & Althuis, J. (2018). The Definitional Challenges of Fake News.
APPENDIX
A) SOURCE CODE
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
nltk.download('all')
real = pd.read_csv("/content/drive/MyDrive/dataset/True.csv")
fake = pd.read_csv("/content/drive/MyDrive/dataset/Fake.csv")
real.head()
fake.head()
real['label'] = 1
fake['label'] = 0
df = pd.concat([real,fake])
df.shape
df1 = df
plt.figure(figsize=(8,4))
# sns.countplot(df.label)
sns.countplot(x='label', data=df)
plt.title('Total Fake and Real News Articles', fontsize=24)
plt.ylabel('Total', fontsize=16)
plt.xlabel('')
plt.xticks([0, 1], ['Fake', 'Real'], fontsize=16)
plt.show()
df.isnull().sum()
df.columns
plt.figure(figsize=(8,8))
sns.set_style("dark")
chart = sns.countplot(x='label', hue="subject", data=df, palette='muted')
chart.set_xticklabels(chart.get_xticklabels(), rotation=90)
# SVM classifier pipeline (the CountVectorizer step was cut off in this excerpt)
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.svm import LinearSVC
pipe = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', LinearSVC())
])
# Confusion matrix for the fitted classifier (y_test and prediction come from
# the train/test split elided in this excerpt)
from sklearn.metrics import confusion_matrix
from mlxtend.plotting import plot_confusion_matrix
fig, ax = plot_confusion_matrix(conf_mat=confusion_matrix(y_test, prediction),
                                show_absolute=True,
                                show_normed=True,
                                colorbar=True)
plt.show()
# Multinomial Naive Bayes pipeline
from sklearn.naive_bayes import MultinomialNB
pipe = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB())
])
B) SCREENSHOTS
Fig: 5.2 Checking statement with Dataset
Fig: 5.3 Detecting Fake News using Dataset
Fig: 5.4 Word cloud of words used in Fake News
Fig: 5.5 Word cloud of words used in Genuine News
Fig: 5.6 Accuracy level of Algorithms with dataset (SVM Classifier)
Fig: 5.7 Accuracy level of Algorithms with dataset (Passive Aggressive
Classifier)
Fig: 5.8 Accuracy level of Algorithms with dataset (Multinomial NB)