Implementation of SMS spam filtering
Chapter 5
Implementation:
CODE EXPLANATION:
IMPORT LIBRARY:
CODE :
accuracy_score
1. import pandas as pd
Purpose: Imports the Pandas library, which is used for data manipulation and analysis.
Usage: To load and handle datasets in a tabular format (e.g., CSV, Excel).
CountVectorizer usage: Converts text data into a bag-of-words representation (numerical vectors). It tokenizes the text and counts how often each word occurs.
MultinomialNB usage: A probabilistic learning algorithm suited for text classification tasks (e.g., spam detection).
accuracy_score usage: Measures the model's accuracy by comparing predicted labels with actual labels.
Workflow Overview:
o The text data (e.g., SMS messages) and their labels (e.g., spam or ham) are loaded into a
Pandas DataFrame.
2. Text Vectorization:
3. Dataset Splitting:
o Using train_test_split to divide the dataset into training and testing sets.
4. Model Initialization:
5. Model Training:
6. Prediction:
7. Model Evaluation:
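The seven steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual script: the ten-message dataset is made up for the example (the project loads its data from a CSV file), and random_state=42 and stratify=y are assumptions added so that the split is reproducible and both classes appear in the training and test sets.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# 1. Data loading: a tiny made-up dataset stands in for the real CSV file.
dataset = pd.DataFrame({
    "text": [
        "win a free prize now",
        "free vacation claim your reward",
        "urgent you have won cash",
        "claim your free ticket today",
        "cheap loans call now",
        "how are you today",
        "see you at lunch",
        "can you call me later",
        "meeting moved to friday",
        "thanks for the help yesterday",
    ],
    "spam": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

# 2. Text vectorization: bag-of-words counts for every message.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(dataset["text"])
y = dataset["spam"]

# 3. Dataset splitting: 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Model initialization and 5. model training.
model = MultinomialNB()
model.fit(X_train, y_train)

# 6. Prediction on the held-out messages.
yPred = model.predict(X_test)

# 7. Model evaluation: fraction of test labels predicted correctly.
accuracy = accuracy_score(y_test, yPred)
print(accuracy)
```

With so few messages the printed accuracy is not meaningful; the sketch only shows how the steps connect.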
What is accuracy_score?
Interpretation: A high accuracy score means the model is predicting labels correctly for most cases.
A low accuracy score indicates the model needs improvement (e.g., more data, better preprocessing, or a
different algorithm).
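As a small worked example, accuracy is the number of correct predictions divided by the total number of predictions. The label lists below are invented purely for illustration and are not results from the project.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: 1 = spam, 0 = ham.
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0]  # one mistake, at the third message

accuracy = accuracy_score(y_true, y_pred)

# The same value computed by hand: correct predictions / all predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy)  # 0.8 (4 of 5 labels match)
```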
Dataset: A Pandas DataFrame object, which is a two-dimensional tabular structure with rows and
columns. The DataFrame allows easy manipulation, analysis, and visualization of data.
CSV file: A text file where each line represents a row, and each column value is separated by a comma.
Commonly used to store tabular data such as datasets for analysis.
What dataset.head() does:
Purpose: It shows the first 5 rows of the DataFrame by default. This is helpful for verifying that the dataset
is loaded correctly and for understanding the structure of the data.
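A minimal sketch of loading a CSV and inspecting it with head() is shown below. To keep the example self-contained it reads the CSV text from an in-memory string rather than from a file on disk, and the seven rows are made up; only the column names text and spam follow the chapter.

```python
import io

import pandas as pd

# Made-up CSV content; in the project this would be a file on disk.
csv_text = """text,spam
win a free prize now,1
how are you today,0
claim your free ticket,1
see you at lunch,0
urgent you have won cash,1
can you call me later,0
meeting moved to friday,0
"""

# read_csv accepts a path or any file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default.
first_rows = dataset.head()
print(first_rows)
print(dataset.shape)
```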
Output:
text spam
model = MultinomialNB()
model.fit(X_train, y_train)
Code Explanation
1. vectorizer = CountVectorizer()
o Each text is tokenized (split into individual words).
o A sparse matrix is created where each row represents a document, and each column represents a unique word in
the dataset.
o The values in the matrix indicate the frequency of the word in the document.
X = vectorizer.fit_transform(dataset['text'])
fit_transform():
Result: X is a sparse matrix representing the numerical form of the text data.
test_size=0.2: Reserves 20% of the data for testing and 80% for training.
Outputs:
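The call returns four objects: X_train and X_test (the feature matrices) and y_train and y_test (the matching labels). The sketch below uses ten invented messages so that the 80/20 split is easy to check; random_state=42 is an assumption added here only to make the split repeatable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Ten made-up messages with labels (1 = spam, 0 = ham).
texts = [
    "win a free prize now", "free vacation claim now",
    "urgent you have won cash", "claim your free ticket",
    "cheap loans call now", "how are you today",
    "see you at lunch", "can you call me later",
    "meeting moved to friday", "thanks for the help",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

X = CountVectorizer().fit_transform(texts)

# test_size=0.2 keeps 2 of the 10 messages for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42
)

print(X_train.shape[0], X_test.shape[0])  # 8 2
```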
model = MultinomialNB()
This algorithm is suited for text classification problems, especially with discrete data (e.g., word
frequencies).
model.fit(X_train, y_train)
Inputs:
data.
Result: The model learns the relationship between word occurrences and labels (spam/ham).
Step-5:
Output: yPred contains the predicted labels for the test set.
print(accuracy)
Text Preprocessing:
Data Splitting:
Model Training:
Prediction:
Evaluation:
def predictMessage(message):
    messageVector = vectorizer.transform([message])
    prediction = model.predict(messageVector)
    return 'Spam' if prediction[0] == 1 else 'Ham'
OUTPUT:
Enter Text to Predict Ham and Spam message: hii how are you
the messgage is: Ham
Enter Text to Predict Ham and Spam message: Congratulations! You've won a free vacation.
Explanation:
Purpose: Defines a function predictMessage that takes a single input parameter, message (the text to
classify as spam or ham).
vectorizer.transform([message]): Converts the input message (text) into a numerical vector representation
using the already fitted CountVectorizer. This ensures the input format matches what the model was trained on.
model.predict(messageVector): Uses the trained Naive Bayes model to predict the label (1 for spam, 0 for
ham) of the input vector.
Logic: If the model's prediction (prediction[0]) is 1, the message is classified as "Spam". Otherwise,
it is classified as "Ham" (not spam).
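The function can be tried in isolation with the sketch below. The four training messages are invented so that the example runs on its own; the project's vectorizer and model are fitted on the full dataset instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data (1 = spam, 0 = ham), for illustration only.
train_texts = [
    "win a free prize now",
    "free vacation claim now",
    "how are you today",
    "see you at lunch",
]
train_labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(train_texts), train_labels)


def predictMessage(message):
    # transform (not fit_transform) reuses the vocabulary learned in training.
    messageVector = vectorizer.transform([message])
    prediction = model.predict(messageVector)
    return 'Spam' if prediction[0] == 1 else 'Ham'


print(predictMessage("free prize"))   # Spam
print(predictMessage("how are you"))  # Ham
```

Note that the message is wrapped in a list because transform expects a collection of documents, not a single string.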
Prompts the user to enter a text message for classification.
Processing:
Apps.py file
Code :
from django.apps import AppConfig

class DetectorConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'detector'
Code Explanation
1. AppConfig Class: AppConfig is a base class provided by Django for configuring an application.
Each app in a Django project can have its own configuration class, which is registered in the project's
INSTALLED_APPS list in the settings.py file.
2. Attributes of DetectorConfig
b. name: name = 'detector': This specifies the name of the application. The name must match the app's
folder name, making it easy for Django to locate and recognize the app.
Perform app-specific initialization tasks (e.g., importing signals or configuring settings) by overriding the
ready() method of AppConfig.
INSTALLED_APPS = [
...
Django uses the DetectorConfig class to initialize and configure the detector app during the project setup.
Forms.py:
class MessageForm(forms.Form):
CODE EXPLANATION:
This code defines a Django form class named MessageForm, which is used to handle user input in a structured
way. Here's a detailed explanation:
1. forms.Form: The forms.Form class is part of Django's forms framework. It is used to define and
manage HTML forms in a clean, Pythonic way. Forms handle input validation and rendering, making it easy
to work with user input.
2. The MessageForm Class: MessageForm inherits from forms.Form, meaning it is a custom form
that can include various fields, validation rules, and attributes.
3. The text Field
forms.CharField: Defines a single-line text field. In this case, the text field is
intended to accept the user's message.
widget=forms.Textarea: Customizes the default input widget for CharField to a multi-line text area
(<textarea> element in HTML).
attrs Parameter: The attrs dictionary allows you to pass HTML attributes to the rendered widget. These
enhance the user experience and appearance:
• class: 'form-control': Adds the CSS class form-control to the <textarea>. This is often used with
frameworks like Bootstrap to style form elements.
• placeholder: 'Enter your message here...': Adds placeholder text to guide the user on what to input.
1. Renders an HTML form with a multi-line text area for the text input.
3. Validates the user input automatically based on the field type (e.g., ensuring it is a string and not
empty by default).
4. Provides Data to the backend when the form is submitted, making it easy to access the text value in
your views.
Views.py file
The provided Django code is a backend implementation of a simple SMS Spam Detector using machine
learning. Below is a step-by-step breakdown of the code:
Imports
1. Django Imports : render: Used to render templates (HTML files) and pass context variables to
them.
2. Pandas (pd) : Used to handle the dataset (emails.csv) for preprocessing and analysis.
3. Scikit-learn Modules :
• CountVectorizer: Converts text data into numerical feature vectors using a bag-of-words model.
• train_test_split: Splits the dataset into training and testing sets.
• MultinomialNB: A Naive Bayes algorithm suitable for text classification.
• accuracy_score: Evaluates the accuracy of the trained model.
4. Custom Imports .forms.MessageForm: A Django form to accept user input for message
classification.
assumed to contain: text: The SMS text. spam: A binary column indicating whether the message is spam
X = vectorizer.fit_transform(dataset['text'])
• CountVectorizer transforms the text data into numerical vectors based on word frequency.
Dataset Splitting
model.fit(X_train, y_train)
• A Naive Bayes model is initialized and trained on the training dataset.
Message Prediction
• Process: Converts the input message into a numerical vector using vectorizer.transform.
Predicts whether the message is spam (1) or ham (0) using the trained model.
if request.method == 'POST':
    form = MessageForm(request.POST)
    if form.is_valid():
        message = form.cleaned_data['text']
        result = predictMessage(message)
else:
    form = MessageForm()
1. Method Handling:
2. Template Rendering:
• The trained Naive Bayes model predicts whether a user-submitted message is spam or ham.
• The Django view (Home) handles both displaying the form and showing prediction results.
Urls.py
"""
URL configuration for spam_detection project.

The `urlpatterns` list routes URLs to views. For more information please see:
    https://docs.djangoproject.com/en/5.0/topics/http/urls/
Examples:
Function views
    1. Add an import:  from my_app import views
    2. Add a URL to urlpatterns:  path('', views.home, name='home')
Class-based views
    1. Add an import:  from other_app.views import Home
    2. Add a URL to urlpatterns:  path('', Home.as_view(), name='home')
"""
from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('detector.urls'))
]
Code Explanation:
The provided code configures the URL routing for a Django project named spam_detection. It defines how
requests to various URLs are handled by linking them to specific views or apps. Here's a detailed breakdown:
File Purpose: This file (usually named urls.py in a Django project) acts as the central routing configuration
for the project. It maps URLs to their corresponding views or other URL configurations.
Imports: from django.contrib import admin and from django.urls import path, include
1. admin: Provides access to Django's built-in admin interface for managing the project's data and
models.
3. include: Used to include other URL configurations from different apps or modules.
urlpatterns
urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('detector.urls'))
]
This is a list of all the URL patterns (routes) for the project.
• When users navigate to /admin/, they are directed to the admin interface where they can manage
models and data.
• Example: http://127.0.0.1:8000/admin/
2. path('', include('detector.urls'))
• Maps the root URL (/) to the URL configuration of the detector app.
include('detector.urls'):
o This allows modular URL configuration, where each app has its own URL patterns.
Then the root URL (http://127.0.0.1:8000/) will route to the Home view.
How It Works: Django processes incoming requests by matching the requested URL against the patterns in
urlpatterns.
If a match is found, it forwards the request to the appropriate view or app-specific URL configuration.
1. Modular Design:
o Apps can have their own urls.py files, keeping the project structure organized.
3. Scalability:
o As the project grows, you can add more apps and their URL configurations without cluttering
the main urls.py.
The provided code defines the URL configuration for a specific Django app. It maps URLs to
corresponding views within the app, enabling Django to serve appropriate responses based on the requested
URL.
File Purpose
This file (usually named urls.py in the app directory) contains the URL patterns for the app. It ensures that
requests to certain URLs are routed to specific functions or class-based views defined in the app's views.py
file.
from . import views
2. views:
o Imports the views.py file from the current app (. represents the current directory).
o Views contain the logic that handles requests and returns responses.
path('', views.Home, name='home')
• Maps the root URL ('') of the app to the Home view in views.py.
o If this app is included in the project at the root level (e.g., path('', include('detector.urls')) in
the project urls.py), this URL would match http://127.0.0.1:8000/.
2. views.Home:
o Refers to the Home function (or class-based view) in the app's views.py file.
o This function will be executed when a user accesses the root URL.
3. name='home': Provides a unique name for this URL pattern. Allows reverse URL resolution (e.g.,
using reverse('home') or {% url 'home' %} in templates to generate the URL dynamically). Helps
avoid hardcoding URLs in multiple places.
How It Works: When a request is made to the root URL of this app (e.g., /), Django matches the URL against the
urlpatterns list. If the URL matches '', Django calls the Home view from views.py. The Home view handles
the request and generates a response (e.g., rendering an HTML template).
Example of views.Home
def Home(request):
• Allows the app to handle its specific routes independently while integrating into the overall project.
Imports: from django.urls import path and from . import views
urlpatterns
urlpatterns = [
    path('', views.Home, name='home')
]
• Maps the root URL ('') of the app to the Home view in views.py.
Components of path():
1. '':
o Represents the root URL for the app.
HTML CODE :
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Spam Detection</title>
<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;700&display=swap"
rel="stylesheet">
<style>
text-align: center;
h1 {
    font-size: 2em;
    margin-bottom: 10px;
    color: #6a11cb;
}
p {
    font-size: 1.1em;
    margin-bottom: 20px;
flex; flex-direction:
column; align-items:
center;
.form-control {
15px; margin-bottom:
#ddd; border-radius:
background-color: #6a11cb;
transition: background-color 0.3s;
.btn:hover {
    background-color: #2575fc;
top: 20px;
padding: 20px;
border-radius: 8px;
font-size: 1.2em;
font-weight: bold;
.result.spam { background-
#d8000c;
.result.ham { background-
#4f8a10;
h1 {
    font-size: 1.5em;
}
p {
    font-size: 1em;
.btn { padding:
size: 1em;
.result {
    font-size: 1em;
</style>
</head>
<body>
<div class="container">
<form method="post">
{% csrf_token %}
{{form.as_p}}
</form>
{% if result %}
</div>
{% endif %}
</div>
</body>
</html>
CODE EXPLANATION
HTML Code Explanation: This HTML template creates a visually appealing, responsive, and user-friendly
interface for a Spam Detection application. Here's the explanation of the code:
1. HTML Structure
HTML <head>
o The meta tags ensure proper character encoding and mobile responsiveness.
o Google Fonts is linked to load the Roboto font.
o Font Awesome icons are included for visual enhancements (e.g., the shield and check
icons).
• Title:
HTML <body>
Form: The form accepts a message from the user and submits it for classification. {{form.as_p}}: A Django
template variable that renders the form fields wrapped in <p> tags. {% csrf_token %}: Adds a CSRF token for
security (required for POST forms in Django). The submit button uses a Font Awesome icon and CSS styling.
Result Display: If there is a result ({% if result %}), it dynamically displays a message styled differently
based on whether it's "Spam" or "Ham."
2. CSS Styles
Body Styling: Font and Background: Uses the "Roboto" font. The background is a gradient transitioning
from purple to blue for a modern feel.
Layout: The page is flex-centered to vertically and horizontally align the content within the viewport.
Card Design: White background with rounded corners and a shadow for a card-like effect. A max width of
500px ensures responsiveness.
Text: Headings and paragraph text are styled to enhance readability and highlight key information. The heading
(h1) uses a gradient color tone to reflect branding. Paragraph text is slightly larger and spaced.
Form : Input Fields (.form-control): Styled with padding, rounded corners, and light borders.
Button (.btn): Purple background with white text, changing to blue on hover for interactivity.
Ham Messages (.result.ham): Green background and text to show it's safe.
Responsive Design: Media queries adjust font sizes and button padding for smaller devices.
3. Dynamic Functionality: Django template tags integrate backend logic into the frontend:
• {% csrf_token %}: Prevents CSRF attacks during form submission.
• {{form.as_p}}: Dynamically renders the form fields defined in the Django backend.
• {{result|lower}}: Converts the result to lowercase for conditional styling.
• {{result}}: Displays the classification result.
4. Features
Final Output