Machine Learning for Spam Detection
MACHINE LEARNING
PROJECT REPORT
Submitted by
DEEPIKA A. (18BCS026)
RAHUL R. (18BCS076)
NARMADHA M. (18BCS014)
of
Bachelor of Engineering
in
Computer Science and Engineering
MAY 2022
[Link] COLLEGE OF ENGINEERING
AND TECHNOLOGY, POLLACHI -642003
(An Autonomous Institution Affiliated to Anna University, Chennai)
BONAFIDE CERTIFICATE
DEEPIKA A. (18BCS026)
RAHUL R. (18BCS076)
NARMADHA M (18BCS014)
[Link]
SUPERVISOR Dr. [Link]
Assistant Professor HEAD OF THE DEPARTMENT
Dept. of Computer Science and Engineering Dept. of Computer Science and Engineering
Dr. Mahalingam College of Engineering and Dr. Mahalingam College of Engineering and
Technology, Pollachi – 642003 Technology, Pollachi – 642003
Submitted for the Autonomous End Semester Examination Project Viva-voce held
on
ABSTRACT
Nowadays, many people rely on email and on messages sent by strangers. Because anybody can
send an email or a message, spammers have a golden opportunity to write spam messages about
our different [Link] fills the inbox with a number of ridiculous emails, degrades our internet
speed to a great extent, and steals useful information such as the details in our contact list.
Identifying these spammers and the spam content is a hot topic of research and a laborious task.
Email spam is the practice of sending messages in bulk by mail. Since the expense of spam is
borne mostly by the recipient, it is effectively postage-due advertising. Spam email is a kind of
commercial advertising that is economically viable because email is a very cost-effective medium
for the sender. With the proposed model, a given message can be classified as spam or not spam
using Bayes' theorem and the Naive Bayes classifier.
With the Naive Bayes rule, we want to find the probability that an email is spam, given that it
contains certain words. We do this by estimating, for each word in the email, the probability that
the word indicates spam, and then multiplying these probabilities together to get the overall spam
score used in classification.
ACKNOWLEDGEMENT
First and foremost, we wish to express our deep gratitude to our institution and our
department for providing us a chance to fulfill our long-cherished dream of becoming
Computer Science Engineers.
We wish to express our hearty thanks to Dr. A. Rathinavelu, Principal of our college, for
his constant motivation and continual encouragement regarding our project work.
We are grateful to Dr. G. Anupriya, Head of the Department, Computer Science and
Engineering, for her direction delivered at all times required. We also thank her for her
tireless and meticulous efforts in bringing out this project to its logical conclusion.
Our hearty thanks to our guide Mrs. G. Gowri, Assistant Professor, for her constant support
and guidance offered to us during the course of our project by being one among us, and to all
the noble hearts that gave us immense encouragement towards the completion of our
project.
We also thank our review panel members Dr. A. Noble Mary Juliet, Associate Professor
and Mrs. N. Sumathi, Assistant Professor for their continuous support and guidance.
TABLE OF CONTENTS
ABSTRACT i
ACKNOWLEDGEMENT ii
LIST OF ABBREVIATIONS v
LIST OF FIGURES vi
1 INTRODUCTION 1
1.1 Objective 1
1.2 Overview 2
2 LITERATURE SURVEY 3
2.1 Opinion rank 3
2.7 Summary 8
3 METHODOLOGY 9
3.1 Naive Bayes spam filtering 9
3.2 Email dataset 9
3.3.3 Removal of stop words 11
4 RESULTS 13
5 CONCLUSION 14
REFERENCES 15
APPENDIX A A1
APPENDIX B B1
ONLINE CERTIFICATION 27
LIST OF ABBREVIATIONS
LIST OF FIGURES
FIGURE No. TITLE PAGE No.
2.1 Standard spam filtering 7
2.2 Client side and enterprise level spam filtering 8
3.1 Sample dataset 10
3.2 EDA 12
4.1 Model Accuracy graph 13
B.1 Local host deployment B.1
B.2 Sample output Spam B.1
B.3 Sample output not spam B.2
B.4 cloud deployment B.2
CHAPTER 1
INTRODUCTION
In recent years, the internet has become an integral part of life. With the increased use of the
internet, the number of email users is growing day by day. This growth has created problems
caused by unsolicited bulk email messages, commonly referred to as spam. Email has become one
of the best channels for advertising, which is why spam emails are generated. Spam emails are
emails that the receiver does not wish to receive; a large number of identical messages are sent
to several recipients. Spam usually arises as a result of giving out our email address on an
unauthorized or unscrupulous website. Spam has many effects: it fills our inbox with a number of
ridiculous emails, degrades our internet speed to a great extent, steals useful information such
as the details in our contact list, and alters our search results on any computer program. Spam is
a huge waste of everybody's time and can quickly become very frustrating if you receive large
amounts of it. Identifying these spammers and the spam content is a laborious task. Even though
an extensive number of studies have been done, the methods set forth so far still scarcely
distinguish spam, and none of them demonstrate the benefit of each extracted feature. Besides
increasing network traffic and wasting a lot of memory space, spam messages are also used for
attacks. Spam emails, also known as non-self, are unsolicited commercial or malicious emails sent
to affect an individual, a corporation or a group of people. Besides advertising, they may contain
links to phishing or malware-hosting websites set up to steal confidential information. To solve
this problem, different spam filtering techniques are used to protect our mailbox from spam
mails.
1.1 Objective
• To implement a precise spam filter for messages and e-mail using machine learning, trained
on datasets collected from the internet.
• To classify the received messages/e-mail using NLTK preprocessing and a TF-IDF vectorizer.
1.2 Overview
This report is organized as follows. Chapter 1 introduces the project and states its
objectives. Chapter 2 reviews the existing literature in this domain and gives a brief
summary of the survey. Chapter 3 describes the methodology. Chapter 4 presents the results
on the dataset. Chapter 5 presents the conclusion of the project.
CHAPTER 2
LITERATURE SURVEY
This chapter provides a brief insight into related work on email spam detection and mail data
optimization. Summaries of various methodologies are also discussed.
In this paper, Luo GuangJun et al. (2020) proposed applying machine learning based spam
detection for accurate detection of spam. To classify spam and ham messages in mobile device
communications, they used Logistic Regression, K-nearest neighbour and Decision Tree
classifiers. An SMS dataset was used to test the methods. The dataset was split into two sets:
70 percent of the data was used for training and 30 percent for testing. Logistic Regression
is a classifier that computes the predicted value y in a binary classification problem as 0 or 1,
so that an instance belongs to either the negative or the positive class. It can also predict the
value of the variable in multi-class classification. The Decision Tree is a supervised machine
learning algorithm shaped like a tree, in which each node is either a decision node or a leaf
node and the nodes are linked to each other. K-nearest neighbour classification is also a
supervised learning algorithm, but its performance was not good enough.
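The 70/30 split described above can be sketched as follows. This is only an illustration and not the cited authors' code: the helper name `split_dataset`, the fixed seed and the toy messages are assumptions made for the example.

```python
import random


def split_dataset(records, train_percent=70, seed=42):
    """Shuffle the records and split them into training and test lists."""
    shuffled = list(records)
    # A fixed seed (an assumption here) keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Toy (label, text) pairs standing in for an SMS dataset.
messages = [("ham", f"message {i}") for i in range(7)]
messages += [("spam", f"offer {i}") for i in range(3)]

train, test = split_dataset(messages)
print(len(train), "training records,", len(test), "test records")
```

Shuffling before cutting matters because datasets are often stored grouped by class; without it, the test set could contain only one class.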
In this survey, Nandhini et al. (2018) proposed a machine learning model based on a
hybrid bagging [Link] implementing with the help of two machine learning algorithms for
detecting spam emails, namely the Naive Bayes algorithm and the J48 (Decision tree) algorithm.
To detect spam mails, the dataset is divided into different sets and given as input to
each of the [Link], they performed three experiments in this paper. The first experiment
was performed with the Naive Bayes algorithm, a probability-based classifier that computes the
class probabilities of the given instances. The second experiment was performed with the J48
Decision tree algorithm, which is based on the concept of entropy and builds decision trees from
the training data. The third experiment was the proposed Spam Mail Detection (SMD) system,
which uses the hybrid bagged approach, a combination of the J48 algorithm and the Naive Bayes
Multinomial classifier. It classifies emails into spam mails and ham mails and consists of four
modules: preparation of the email dataset, pre-processing of the data, feature selection and the
hybrid bagged approach. Only the J48 algorithm gave better experimental results; the other two
experiments gave lower performance. The system's performance could be enhanced with a boosting
approach, which replaces the learning features of the weak classifier with those of a strong
classifier.
2.4 Email Spam Detection Using Integrated Naive Bayes approach
In this work, Kaur et al. (2018) proposed a machine learning model that integrates the
Naive Bayes algorithm with intelligence-based Particle Swarm Optimization to detect spam mail.
The Naive Bayes algorithm is based on Bayes' theorem, which has a strong probability
distribution property. Particle Swarm Optimization is inspired by the behaviour of fish and
birds. The Naive Bayes algorithm assigns a mail to the spam class or the non-spam class based on
the keywords present in the email data. Particle Swarm Optimization is then used to optimize
the parameters of the Naive Bayes algorithm to improve the accuracy and the classification
process. To perform feature extraction, the email is pre-processed. Pre-processing includes
methods such as tokenization, stemming and stop word removal. After that we will apply the
particle optimization [Link] on this feature of the optimization method, the tokens of the
mail are classified as spam or non-spam. They evaluated the performance of the system in terms
of precision, recall and accuracy of the classification[6]. These measures are calculated from
the true positive, true negative, false positive and false negative counts. The integrated
approach of Naive Bayes and Particle Swarm Optimization was found to overcome the failures of
the plain Naive Bayes approach. Other swarm optimization concepts such as ant colony
optimization, artificial bee colony optimization and the firefly algorithm can also be used.
Further, to improve the performance, any other machine learning algorithm can be used instead
of Naive Bayes.
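The evaluation measures mentioned above follow directly from the four confusion-matrix counts. The sketch below shows the standard formulas; the function name `classification_metrics` and the counts passed to it are hypothetical and are not results from the cited paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Accuracy: share of all messages that were classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of messages flagged as spam that really are spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual spam messages that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, chosen only to illustrate the arithmetic.
accuracy, precision, recall = classification_metrics(tp=40, tn=50, fp=5, fn=5)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

For a spam filter, precision is usually watched most closely, because a false positive means a legitimate mail was hidden from the user.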
In this paper, Luo GuangJun et al. (2011) reviewed several very popular machine learning
methods for their capability of classifying spam mails. The methods are Bayesian
classification, the K-nearest neighbour classifier, the artificial neural network classifier,
the Support Vector Machine classifier, the Artificial Immune System classifier and the rough
sets classifier. The Naive Bayes classifier is based on the probability of an event occurring
in the future, which can be estimated from previous occurrences of the same [Link] based on
that probability it will classify the mail as spam or ham. Here the probability of each word
plays the major role in classification. The k-nearest neighbour classifier is example-based: it
checks previous documents for classification, and the nearest neighbours are found using
traditional indexing methods. The Artificial Neural Network, also known simply as a Neural
Network, is modelled on biological neural networks and consists of a collection of artificial
neurons. During the learning phase, it changes its structure based on the information that flows
through the network. It has a training stage and a filtering stage. The Support Vector Machine
classifier is based on the concept of decision planes that define decision boundaries. The
algorithm finds the optimal hyperplane with the maximum margin separating the two classes, which
is obtained by solving an optimization problem. In the Artificial Immune System classifier, the
overall response involves three evolutionary methods, namely gene library, negative selection
and global selection, which organize the fittest antibodies by interacting with current
antigens. The rough set classifier has the ability to reduce information systems. To summarize
these six methods, Naive Bayes was the most accurate and also gave the highest spam precision
among the six. The neural network has the simplest and fastest algorithms, while the rough set
method is the most complicated and has to be hybridized with genetic algorithms to get the
desired results. The Artificial Immune System method was expected to perform well but gave poor
performance on its own; it provides good performance when it is hybridized with the rough set
method.
An email spam filtering process works through a set of protocols to determine whether a message
is spam or not. At present, a large number of spam filtering processes exist. Among them, the
standard spam filtering process follows a set of rules and acts as a classifier. Figure 2.1
shows the steps that a standard spam filtering process follows in its analysis.
First, content filters identify spam messages by applying several machine learning
techniques. Second, header filters extract information from the email header. Then,
blacklist filters identify spam and stop all emails that come from addresses in the blacklist
file. Afterwards, "Rules-based filters" recognize the sender through the subject line using
user-defined criteria. Next, "Permission filters" deliver a message only with the recipient's
pre-approval. Finally, the "Challenge-response filter" applies an algorithm that requires the
sender to obtain permission before the mail is delivered.
A client can send or receive an email with just one click through an ISP. Client level spam
filtering provides frameworks that an individual client can use to secure mail transmission. A
client can easily filter spam by installing one of these existing frameworks on a PC. The
framework interacts with the MUA (Mail User Agent) and filters the client's inbox while
composing, accepting and managing messages. Enterprise level spam filtering is a process in
which the frameworks are installed on the mail server, where they interact with the MTA to
classify received messages and identify spam on the network. With this system, a user on that
network can filter spam more efficiently by installing an appropriate system. By far, most
current spam filtering frameworks use rule-based scoring procedures. A set of rules is applied
to a message and a score is calculated from the rules that hold for the message. The message is
considered spam when the score exceeds the threshold value. Because spammers use various
strategies, the rules are revised routinely, and a list-based technique is applied to block
messages automatically.
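The rule-based scoring procedure described above can be sketched in a few lines. The rules, their scores and the threshold below are invented for illustration and are not taken from any real filtering framework.

```python
import re

# Hypothetical rules: (description, pattern, score). All values are illustrative.
RULES = [
    ("mentions a free offer", re.compile(r"\bfree\b", re.IGNORECASE), 2.0),
    ("mentions winning", re.compile(r"\b(winner|won|prize)\b", re.IGNORECASE), 2.5),
    ("contains a URL", re.compile(r"https?://\S+", re.IGNORECASE), 1.0),
    ("contains a shouted word", re.compile(r"\b[A-Z]{5,}\b"), 1.5),
]
THRESHOLD = 4.0  # assumed value; real systems tune this


def score_message(text):
    """Add up the scores of all rules that match the message."""
    return sum(score for _, pattern, score in RULES if pattern.search(text))


def is_spam(text, threshold=THRESHOLD):
    """A message is treated as spam when its score exceeds the threshold."""
    return score_message(text) > threshold


print(is_spam("You are a WINNER! Claim your free prize at http://example.com"))
print(is_spam("Lunch at noon tomorrow?"))
```

Keeping the rules in a plain list makes the routine revision mentioned above a matter of editing data rather than code.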
Figure 2.2 represents the method of client side and enterprise level spam filtering
2.7 Summary
For the last few decades, researchers have been trying to make email a secure medium. Spam
filtering is one of the core features for securing an email platform. Many studies have been
reported in this area, but there is still untapped potential, and e-mail spam classification
remains one of the major areas of research. A large body of work has already been done on email
spam classification using several techniques to make email more useful to its users. For this
reason, this chapter has summarized various existing machine learning approaches. In addition,
most of the approaches, such as Random Forest, Naive Bayes, SVM and kNN, were evaluated on
reliable and well-known benchmark datasets such as SpamData, Spam Assassin, Spambase, the
ECML-PKDD challenge dataset, corpora datasets, the Enron dataset and the TREC dataset. Some of
these datasets are available in a prepared structure, for example ECML and the data accessible
in the Spambase UCI archive.
CHAPTER 3
METHODOLOGY
Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use
bag-of-words features to identify spam e-mail, an approach commonly used in text classification.
Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes
other things), with spam and non-spam e-mails and then using Bayes' theorem to calculate a
probability that an email is or is not spam.
Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to
the email needs of individual users and give low false positive spam detection rates that are
generally acceptable to users.
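The idea of correlating tokens with spam and non-spam mail and then applying Bayes' theorem can be sketched without any library. This is a minimal illustration rather than the model used in this project: the function names, the Laplace (add-one) smoothing, the use of log probabilities and the four toy training messages are all assumptions made for the example, and the sketch expects at least one training message per class.

```python
import math
from collections import Counter


def train_naive_bayes(samples):
    """samples: iterable of (label, text) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for label, text in samples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocabulary


def spam_probability(text, model):
    """Return P(spam | words) under the naive independence assumption."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior, log P(label).
        log_score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Laplace-smoothed likelihood P(word | label).
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            log_score += math.log(likelihood)
        log_scores[label] = log_score
    # Normalise the two scores back into a probability.
    top = max(log_scores.values())
    weights = {label: math.exp(score - top) for label, score in log_scores.items()}
    return weights["spam"] / sum(weights.values())


# Toy training data, invented for the example.
training = [
    ("spam", "win free prize now"),
    ("spam", "free cash offer"),
    ("ham", "are we meeting for lunch"),
    ("ham", "see you at the meeting"),
]
model = train_naive_bayes(training)
print(spam_probability("free prize offer", model))
print(spam_probability("lunch meeting", model))
```

Summing logarithms is equivalent to multiplying the per-word probabilities, but it avoids numerical underflow on long messages.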
Let’s start with our spam detection data. We’ll be using the open-source Spambase dataset from
the UCI machine learning repository, a dataset that contains 5569 emails, of which 745 are spam.
The target variable for this dataset is ‘spam’ in which a spam email is mapped to 1 and anything
else is mapped to 0. The target variable can be thought of as what you are trying to predict.
In machine learning problems, the value of this variable will be modeled and predicted by other
variables.
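The target mapping described above (spam to 1, anything else to 0) can be sketched as a small helper. The name `encode_targets` and the sample labels are assumptions for illustration; a label encoder from a machine learning library would serve the same purpose.

```python
def encode_targets(labels):
    """Map the label 'spam' to 1 and every other label to 0."""
    return [1 if label.strip().lower() == "spam" else 0 for label in labels]


# Hypothetical raw labels, as they might appear in a label column.
raw_labels = ["ham", "spam", "ham", "Spam ", "ham"]
targets = encode_targets(raw_labels)
print(targets)
```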
To get to our solution we need to understand the four processing concepts below. Please note
that the concepts discussed here can also be applied to other text classification problems.
Data cleaning or cleansing is the process of detecting and correcting (or removing) corrupt or
inaccurate records from a record set, table or database. It involves identifying incomplete,
incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying or deleting
the dirty or coarse data.
This phase involves the deletion of words or characters that do not add value to the meaning of
the text. Some of the standard cleaning steps are listed below:
• Lowering case
• Removal of stopwords
• Removal of hyperlinks
3.3.1 Lowering Case:
[Link] The words, ‘TEXT’, ‘Text’, ‘text’ all add the same value to a sentence.
[Link] Lowering the case of all the words is very helpful for
reducing the dimensions by decreasing the size of the vocabulary.
Next, we remove any URLs in the data. There is a good chance that email will have some
URLs in it. We don’t need them for our further analysis as they do not add any value to the results.
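The cleaning steps discussed in this section (lowering case, removing hyperlinks and removing stop words) can be combined in one small function. This is a dependency-free sketch: the short `STOP_WORDS` set and the URL pattern are simplifying assumptions, and a fuller stop-word list such as the one shipped with NLTK would normally be used.

```python
import re

# A deliberately small, illustrative stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on", "at", "this"}
# Matches http(s) links and bare www. links; a simplification of real URL syntax.
URL_PATTERN = re.compile(r"(https?://\S+|www\.\S+)")


def clean_text(text):
    """Lowercase the text, strip URLs, tokenize and drop stop words."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [token for token in tokens if token not in STOP_WORDS]


print(clean_text("Claim the FREE prize at http://example.com/win today"))
```

Lowercasing first means 'TEXT', 'Text' and 'text' collapse into a single vocabulary entry before any counting takes place.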
Exploratory data analysis (EDA) is a part of data analysis used to gain a better understanding
of aspects of the data such as its main features, its variables and the relationships that hold
between them, and to identify which variables are important for our problem. We shall look at
various exploratory data analysis methods: descriptive statistics, which give a brief overview of
the dataset we are dealing with, including some measures and features of the sample; grouping of
data (basic grouping with group by); ANOVA (analysis of variance), which is a computational
method for dividing the variation in a set of observations into different components; and
correlation and correlation methods.
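As a small illustration of descriptive statistics with grouping, the sketch below computes the message count and the average length per class. It uses only the standard library; the function name `describe_by_class` and the sample messages are assumptions for the example, and the same summary could be produced with a data-frame group-by.

```python
from statistics import mean


def describe_by_class(samples):
    """Return per-class count and mean length in characters and in words."""
    grouped = {}
    for label, text in samples:
        grouped.setdefault(label, []).append(text)
    summary = {}
    for label, texts in grouped.items():
        summary[label] = {
            "count": len(texts),
            "mean_characters": mean(len(text) for text in texts),
            "mean_words": mean(len(text.split()) for text in texts),
        }
    return summary


# Toy (label, text) pairs, invented for the example.
samples = [
    ("ham", "hi there"),
    ("ham", "ok"),
    ("spam", "win a free prize now"),
]
summary = describe_by_class(samples)
for label, stats in summary.items():
    print(label, stats)
```

Comparing these per-class averages is a quick way to see whether message length is a useful feature for separating spam from ham.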
Figure 3.2: EDA
Data preprocessing is the next step in building a machine learning model, and the results depend
on how well the data has been preprocessed. In NLP, text preprocessing is the first step
in the process of building a model.
The model building process involves setting up ways of collecting data, understanding and paying
attention to what is important in the data to answer the questions you are asking, finding a statistical,
mathematical or a simulation model to gain understanding and make predictions.
CHAPTER 4
RESULTS
Dataset
CHAPTER 5
CONCLUSION
Spam email is one of the most demanding and troublesome internet issues in today's world of
communication and technology. By generating spam mails, spammers misuse this communication
facility and affect organizations and many email users. In this report, a Spam Mail Detection
system is introduced which uses an NLP approach for its implementation. The classification
algorithm used in this approach is Naïve Bayes. The accuracy achieved by the Naïve Bayes
algorithm is 90%. The experimental results show that the system performs better when only the
Naïve Bayes algorithm is used. To enhance the system's performance and results, a boosting
approach could be considered for future work. The boosting technique would replace the weak
classifier's learning features with the strong classifier's features and thus enhance the
overall system's performance.
REFERENCES
[1] A. J. Saleh, A. Karim, B. Shanmugam et al., “An intelligent spam detection model based on
artificial immune system,” Information, vol. 10, no. 6, p. 209, [Link]
[2] A. Sharaff, “Comparative study of classification algorithms for spam email detection,” in
Emerging Research in Computing, Information, Communication and Applications, pp. 237–244,
Springer, Berlin, Germany, [Link]
[3] B. Yu and Z.-B. Xu, “A comparative study for content-based dynamic spam classification using
four machine learning algorithms,” Knowledge-Based Systems, vol. 21, no. 4, pp. 355–362,
[Link]
[4] D. Ruano-Ordás, F. Fdez-Riverola, and J. R. Méndez, “Using evolutionary computation for
discovering spam patterns from e-mail samples,” Information Processing & Management, vol. 54,
no. 2, pp. 303–317, [Link]
[9] S. K. Trivedi and S. Dey, “Interplay between probabilistic classifiers and boosting algorithms
for detecting complex unsolicited emails,” Journal of Advances in Computer Networks, vol. 1, pp.
132–136, [Link]
[10] S. Smadi, N. Aslam, and L. Zhang, “Detection of online phishing email using dynamic
evolving neural network based on reinforcement learning,” Decision Support Systems, vol. 107, pp.
88–102, [Link]
[11] S. Y. Bhat, “Spammer classification using ensemble methods over structural social network
features,” in Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web
Intelligence (WI) and Intelligent Agent Technologies (IAT), pp. 454–458, Warsaw, Poland,
August [Link]
Website reference:
3. spam dataset[Link]
5. [Link] classifier/
APPENDIX A: SAMPLE CODE
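import pickle
import string

import nltk
import streamlit as st
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()


def transform_text(text):
    # Lowercase the message and split it into tokens
    text = text.lower()
    text = nltk.word_tokenize(text)

    # Keep only alphanumeric tokens
    y = []
    for i in text:
        if i.isalnum():
            y.append(i)

    text = y[:]
    y.clear()

    # Drop stop words and punctuation (filter condition assumed; it is missing in the source)
    for i in text:
        if i not in stopwords.words('english') and i not in string.punctuation:
            y.append(i)

    text = y[:]
    y.clear()

    # Reduce each remaining token to its stem
    for i in text:
        y.append(ps.stem(i))

    return " ".join(y)


# The pickle file names are elided in the source and are kept as-is
tfidf = pickle.load(open('[Link]', 'rb'))
model = pickle.load(open('[Link]', 'rb'))

# Input widget assumed; the source uses input_sms without defining it
input_sms = st.text_area("Enter the message")

if st.button('Predict'):
    # 1. preprocess
    transformed_sms = transform_text(input_sms)
    # 2. vectorize
    vector_input = tfidf.transform([transformed_sms])
    # 3. predict
    result = model.predict(vector_input)[0]
    # 4. Display
    if result == 1:
        st.header("Spam")
    else:
        st.header("Not Spam")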
import numpy as np
[Link]()
[Link](5)
df.rename(columns={'v1': 'target', 'v2': 'text'}, inplace=True)
[Link](5)
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df['target'] = encoder.fit_transform(df['target'])
[Link]()
# missing values
[Link]().sum()
[Link]().sum()
# remove duplicates
df = df.drop_duplicates(keep='first')
[Link]().sum()
[Link]
[Link]()
df['target'].value_counts()
import matplotlib.pyplot as plt
plt.pie(df['target'].value_counts(), labels=['ham', 'spam'], autopct="%0.2f")
plt.show()
import nltk
nltk.download('punkt')
df['num_characters'] = df['text'].apply(len)
[Link]()
# num of words
df['num_words'] = df['text'].apply(lambda x: len(nltk.word_tokenize(x)))
# num of sentences
df['num_sentences'] = df['text'].apply(lambda x: len(nltk.sent_tokenize(x)))
df[['num_characters','num_words','num_sentences']].describe()
df[df['target']==0][['num_characters','num_words','num_sentences']].describe()
df[df['target']==1][['num_characters','num_words','num_sentences']].describe()
[Link](figsize=(12,6))
[Link](df[df['target'] == 0]['num_characters'])
[Link](df[df['target'] == 1]['num_characters'],color='red')
[Link](figsize=(12,6))
[Link](df[df['target'] == 0]['num_words'])
[Link](df[df['target'] == 1]['num_words'],color='red')
[Link](df,hue='target')
[Link]([Link](),annot=True)
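from nltk.corpus import stopwords


def transform_text(text):
    # Lowercase the message and split it into tokens
    text = text.lower()
    text = nltk.word_tokenize(text)

    # Keep only alphanumeric tokens
    y = []
    for i in text:
        if i.isalnum():
            y.append(i)

    text = y[:]
    y.clear()

    # Drop stop words and punctuation (filter condition assumed; it is missing in the source)
    for i in text:
        if i not in stopwords.words('english') and i not in string.punctuation:
            y.append(i)

    text = y[:]
    y.clear()

    # Reduce each remaining token to its stem
    for i in text:
        y.append(ps.stem(i))

    return " ".join(y)


transform_text("I'm gonna be home soon and i don't want to talk about this stuff anymore tonight, k? I've cried enough today.")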
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()
ps.stem('loving')
df['transformed_text'] = df['text'].apply(transform_text)
[Link]([Link](Counter(spam_corpus).most_common(30))[0],[Link]
rame(Counter(spam_corpus).most_common(30))[1])
[Link](rotation='vertical')
[Link]()
ham_corpus = []
ham_corpus.append(word)
[Link]([Link](Counter(ham_corpus).most_common(30))[0],[Link]
ame(Counter(ham_corpus).most_common(30))[1])
[Link](rotation='vertical')
[Link]()
cv = CountVectorizer()
tfidf = TfidfVectorizer(max_features=3000)
gnb = GaussianNB()
mnb = MultinomialNB()
bnb = BernoulliNB()
[Link](X_train,y_train)
y_pred1 = [Link](X_test)
print(accuracy_score(y_test,y_pred1))
print(confusion_matrix(y_test,y_pred1))
print(precision_score(y_test,y_pred1))
knc = KNeighborsClassifier()
mnb = MultinomialNB()
dtc = DecisionTreeClassifier(max_depth=5)
bc = BaggingClassifier(n_estimators=50, random_state=2)
etc=ExtraTreesClassifier(n_estimators=50,random_state=2)
gbdt = GradientBoostingClassifier(n_estimators=50,random_state=2)
xgb = XGBClassifier(n_estimators=50,random_state=2)
clfs = {
    'SVC': svc,
    'KN': knc,
    'NB': mnb,
    'DT': dtc,
    'LR': lrc,
    'RF': rfc,
    'AdaBoost': abc,
    'BgC': bc,
    'ETC': etc,
    'GBDT': gbdt,
    'xgb': xgb
}
def train_classifier(clf, X_train, y_train, X_test, y_test):
    # Fit the classifier and report accuracy and precision on the test set
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    return accuracy, precision
train_classifier(svc,X_train,y_train,X_test,y_test)
accuracy_scores = []
precision_scores = []

# Train and evaluate every classifier in the dictionary
for name, clf in clfs.items():
    current_accuracy, current_precision = train_classifier(clf, X_train, y_train, X_test, y_test)
    print("For ", name)
    print("Accuracy - ", current_accuracy)
    print("Precision - ", current_precision)
    accuracy_scores.append(current_accuracy)
    precision_scores.append(current_precision)
temp_df = pd.DataFrame({
    'Algorithm': clfs.keys(),
    'Accuracy_max_ft_3000': accuracy_scores,
    'Precision_max_ft_3000': precision_scores,
}).sort_values('Precision_max_ft_3000', ascending=False)
temp_df = pd.DataFrame({
    'Algorithm': clfs.keys(),
    'Accuracy_scaling': accuracy_scores,
    'Precision_scaling': precision_scores,
}).sort_values('Precision_scaling', ascending=False)
new_df = performance_df.merge(temp_df, on='Algorithm')
new_df_scaled = new_df.merge(temp_df, on='Algorithm')
temp_df = pd.DataFrame({
    'Algorithm': clfs.keys(),
    'Accuracy_num_chars': accuracy_scores,
    'Precision_num_chars': precision_scores,
}).sort_values('Precision_num_chars', ascending=False)
new_df_scaled.merge(temp_df, on='Algorithm')
# Voting Classifier
mnb = MultinomialNB()
[Link](X_train,y_train)
ExtraTreesClassifier(n_estimators=50, random_state=2))],voting='soft')
y_pred = [Link](X_test)
print("Accuracy",accuracy_score(y_test,y_pred))
print("Precision",precision_score(y_test,y_pred))
APPENDIX B: SCREENSHOTS
Figure B.3:Sample Output Not spam
ONLINE CERTIFICATION