Spring 2025 - CS619 - 10969

The project focuses on developing a machine learning model to detect cyber abuse in Roman Urdu text on social media platforms. It involves data collection, preparation, pre-processing, feature extraction, and the application of various machine learning techniques, culminating in a web interface for real-time testing. The project aims to evaluate model performance through metrics like accuracy, precision, recall, and F1-score, ultimately determining the most effective machine learning technique for this task.

Cyber Abuse Detection using Machine Learning for Roman Urdu

Project Domain / Category

Data Science / Machine Learning / Natural Language Processing (NLP)

Abstract / Introduction
The extensive use of social media has led to a significant increase in cyber abuse, including
harassment, bullying, and offensive language, particularly in Roman Urdu. The absence of effective
automated detection systems allows such content to persist, negatively impacting online interactions.
Identifying cyber abuse in Roman Urdu presents a unique challenge due to informal language
structure, variations in spelling, and contextual meanings.
This project aims to develop a machine learning-based model capable of detecting and classifying
cyber abuse in Roman Urdu text. The proposed system will utilize natural language processing (NLP)
techniques and will be trained on data collected from social media platforms. Furthermore, a web
interface will be developed to enable users to evaluate the model’s performance in real time.

Functional Requirements:
The admin (the student) will perform all of the following functional requirements.
1. Data Collection
• For this project, the student will collect data from any social media platform (such as
YouTube, Facebook, Twitter, or Instagram) to detect cyber abuse. The dataset must contain
at least 5,000 Roman Urdu comments.
• The student is required to create their own dataset; pre-existing datasets from
sources like Kaggle or other online repositories will not be accepted. Any attempt to use one
will result in a deduction of marks. A sample dataset is provided under Dataset below for
reference.
2. Data Preparation
• Prepare the dataset by labeling each comment as "Abusive (A)" or "Non-Abusive (NA)."
This step involves manually reviewing the data to assign appropriate labels, ensuring the
dataset is clean, well-structured, and suitable for machine learning.
3. Data Pre-Processing
• Because real-world data is often incomplete and noisy and contains missing values, the
student must apply pre-processing techniques to ensure data quality. The following steps
should be performed systematically:

i. Missing Values
o First, check how many missing values are present and display the output.
o Then, apply an appropriate technique to handle them (e.g., remove or fill with
relevant values).

ii. Duplicate Values

o First, check the number of duplicate entries and display the output.
o Then, remove the duplicates to maintain data quality.

iii. Noise & Outliers

o First, identify noisy or extreme values and display the output.
o Then, clean or handle them to improve dataset reliability.

• Additionally, the student must normalize the dataset, remove stop words, and ensure the data
is properly structured before feature extraction.
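
The pre-processing steps above can be sketched with pandas as follows. This is a minimal illustration, not a prescribed implementation: the toy comments, the column names `comment`, `label`, and `clean`, the `normalize` rule, and the tiny `STOP_WORDS` set are all assumptions made for the example. Normalization is applied before the duplicate check here so that variants differing only in letter case are counted as duplicates.

```python
import re

import pandas as pd

# Hypothetical toy data; a real dataset would be loaded from the student's own file.
df = pd.DataFrame({
    "comment": ["Tum bohat ache ho", "tum bohat ache ho", None,
                "Yeh kya bakwas hai!!!", "Tum bohat ache ho", "???"],
    "label": ["NA", "NA", "A", "A", "NA", "NA"],
})

# i. Missing values: count them, display the count, then drop the affected rows.
missing_count = int(df["comment"].isna().sum())
print("Missing comments:", missing_count)
df = df.dropna(subset=["comment"]).copy()


def normalize(text):
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


df["clean"] = df["comment"].apply(normalize)

# ii. Duplicate values: count them, display the count, then remove them.
duplicate_count = int(df.duplicated(subset=["clean"]).sum())
print("Duplicate comments:", duplicate_count)
df = df.drop_duplicates(subset=["clean"]).copy()

# iii. Noise: here, comments with no letters left after cleaning (an assumed rule).
noisy_mask = df["clean"].str.len() == 0
noisy_count = int(noisy_mask.sum())
print("Noisy comments:", noisy_count)
df = df[~noisy_mask].copy()

# Stop-word removal with a small illustrative Roman Urdu list (an assumption).
STOP_WORDS = {"hai", "yeh", "ho", "kya"}
df["clean"] = df["clean"].apply(
    lambda text: " ".join(w for w in text.split() if w not in STOP_WORDS)
)
print(df[["clean", "label"]])
```

A real stop-word list and noise rule would need to be chosen and justified for the collected data.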
4. Feature Extraction
• After the pre-processing step, the student will apply feature extraction techniques to
convert textual data into a structured format suitable for machine learning models. Possible
techniques include Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words
(BoW), and N-gram models (unigram, bigram, trigram, etc.). Word embeddings
(Word2Vec, FastText, GloVe) can also be applied.
• The student must have a clear understanding of the working principles, advantages, and
limitations of the chosen feature extraction method. It is essential to justify the selection by
explaining why a particular technique was used and how it contributes to improving the
model's performance.
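
As one possible illustration of feature extraction, the sketch below combines TF-IDF weighting with unigram and bigram features using scikit-learn's `TfidfVectorizer`. The three sample comments are made up for the example, and the `ngram_range=(1, 2)` setting is an assumption rather than a requirement of the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical cleaned comments standing in for the student's dataset.
comments = ["tum bohat ache ho", "yeh bakwas hai", "tum bakwas ho"]

# ngram_range=(1, 2) extracts both single words and two-word sequences.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(comments)

# X is a sparse matrix with one row per comment and one column per n-gram.
print("Matrix shape:", X.shape)
print("Sample features:", sorted(vectorizer.vocabulary_)[:5])
```

Each row of `X` can be passed directly to a scikit-learn classifier. Bag of Words features could be obtained the same way by swapping in `CountVectorizer`.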
5. Train & Test Data
• The student will split the dataset into 75% training data and 25% testing data to evaluate
the performance of the machine learning models. To ensure reliable results, the student
can apply randomized splitting to avoid bias and maintain data diversity.
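
A 75/25 randomized split could look like the following sketch, which uses scikit-learn's `train_test_split`. The placeholder comments and labels are invented for the example; fixing `random_state` and stratifying by label are assumed choices that keep the split reproducible and the class balance similar in both parts.

```python
from sklearn.model_selection import train_test_split

# Placeholder data: 100 hypothetical comments, one in four labeled abusive.
comments = [f"comment {i}" for i in range(100)]
labels = ["A" if i % 4 == 0 else "NA" for i in range(100)]

X_train, X_test, y_train, y_test = train_test_split(
    comments,
    labels,
    test_size=0.25,    # 25% of the data is held out for testing
    shuffle=True,      # randomized splitting
    random_state=42,   # fixed seed so the split can be reproduced
    stratify=labels,   # keep the A/NA ratio similar in both parts
)

print("Training comments:", len(X_train))
print("Testing comments:", len(X_test))
```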
6. Machine Learning Techniques
• The student must use at least three different classifiers/models from distinct machine
learning techniques/algorithms. Possible choices include Naïve Bayes (Multinomial,
Bernoulli), Support Vector Machine (SVM) with different kernels (Poly, RBF), Decision Tree,
Random Forest, Logistic Regression, and Ensemble Methods. The selection should be based
on the suitability of the algorithm for text classification tasks.
• Additionally, the student must have a clear understanding of each chosen model, including
its algorithmic working, advantages, limitations, and practical applications. It is essential
that the student can justify their selection by explaining why a particular model was chosen
over others. Furthermore, the student should be proficient in the implementation and
coding of the selected models and be able to analyze their performance effectively.
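
The sketch below shows one way to train three classifiers from different families on the same TF-IDF features with scikit-learn. The choice of Multinomial Naïve Bayes, an RBF-kernel SVM, and Logistic Regression is only an example drawn from the list above, and the six training comments are made up; the project itself requires at least 5,000 labeled comments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy training data.
train_texts = ["tum bakwas ho", "tum pagal ho", "yeh bewakoof hai",
               "tum ache ho", "bohat shukriya", "yeh acha hai"]
train_labels = ["A", "A", "A", "NA", "NA", "NA"]

# Three classifiers from distinct algorithm families (an example selection).
models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

fitted = {}
for name, classifier in models.items():
    # Each pipeline vectorizes the text and then fits the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(train_texts, train_labels)
    fitted[name] = pipeline
    print(name, "->", pipeline.predict(["tum bakwas ho"])[0])
```

Wrapping the vectorizer and classifier in one pipeline means that raw text can be passed to `predict`, which is convenient when the model is later connected to a web interface.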
7. Confusion Matrix
• The student must generate a confusion matrix for each classification model to evaluate its
performance. The confusion matrix should report the counts of True Positives (TP),
True Negatives (TN), False Positives (FP), and False Negatives (FN), from which the model's
accuracy can be assessed. A separate confusion matrix must be created for each selected
machine learning model, and the results should be analyzed to compare their effectiveness
in detecting cyber abuse.
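
To make the four counts concrete, here is a small pure-Python sketch that tallies them from true and predicted labels. It treats "A" (abusive) as the positive class, which is an assumption, and the label lists are invented for the example. The helper `confusion_counts` is hypothetical and not part of any library.

```python
def confusion_counts(y_true, y_pred, positive="A"):
    """Count TP, TN, FP, and FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, prediction in zip(y_true, y_pred):
        if prediction == positive and truth == positive:
            tp += 1  # abusive comment correctly flagged
        elif prediction == positive:
            fp += 1  # non-abusive comment wrongly flagged
        elif truth == positive:
            fn += 1  # abusive comment missed
        else:
            tn += 1  # non-abusive comment correctly passed
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels for eight test comments.
y_true = ["A", "A", "A", "NA", "NA", "NA", "NA", "A"]
y_pred = ["A", "NA", "A", "NA", "A", "NA", "NA", "A"]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

In practice, scikit-learn's `confusion_matrix` function in `sklearn.metrics` produces the same counts as a 2x2 array.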
8. Accuracy Evaluation
• The student must compute the accuracy of all selected machine learning techniques and
compare their performance.
• This project will also determine which machine learning technique is most effective for
detecting cyber abuse.
• In addition to accuracy, the student should evaluate precision, recall, and F1-score for a
more comprehensive analysis.
• The student must visually represent accuracy comparisons using graphs, bar charts, or
other suitable visualizations to highlight differences between models.
• A final analysis should be conducted to explain which model performed best and why,
based on the evaluation metrics.
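
The four metrics follow directly from the confusion matrix counts: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). The sketch below applies these formulas to two sets of made-up counts; the model names and numbers are purely illustrative and are not results from this project. Ranking by F1-score is one possible choice, not a requirement.

```python
def evaluate(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion counts for two unnamed models (not real results).
hypothetical_counts = {
    "Model 1": {"tp": 40, "tn": 45, "fp": 5, "fn": 10},
    "Model 2": {"tp": 35, "tn": 48, "fp": 2, "fn": 15},
}

results = {name: evaluate(**c) for name, c in hypothetical_counts.items()}
for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})

# Pick the model with the highest F1-score.
best_model = max(results, key=lambda name: results[name]["f1"])
print("Best by F1-score:", best_model)
```

The `results` dictionary can also be fed to a plotting library to draw the required bar chart of per-model accuracy.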
9. Web Interface Integration
• After developing the model, the student will integrate a web interface to allow users to test
the model’s performance using real-time comments.
• The interface should provide a text input field where users can enter a comment, and the
system will classify it as Abusive (A) or Non-Abusive (NA).
• The web interface will be developed using Flask or Django, with a simple HTML/CSS
frontend for user interaction.
• The student should ensure that the interface is fully functional, correctly linked to the
trained model, and capable of making real-time predictions.
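
A minimal Flask version of such an interface might look like the sketch below. The `classify` function is only a keyword-based stand-in so that the example runs on its own; in the actual project it would call the trained model's prediction method. The page template, the form field name `comment`, and the `ABUSIVE_WORDS` set are all assumptions made for illustration.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Simple inline HTML page with a text input and a result line.
PAGE = """
<h1>Roman Urdu Cyber Abuse Detection</h1>
<form method="post">
  <input type="text" name="comment" placeholder="Enter a comment">
  <button type="submit">Classify</button>
</form>
{% if label %}<p>Prediction: {{ label }}</p>{% endif %}
"""

# Placeholder word list; the real system would use the trained model instead.
ABUSIVE_WORDS = {"bakwas", "pagal"}


def classify(comment):
    """Stand-in for the trained model's prediction (an assumption)."""
    words = set(comment.lower().split())
    return "Abusive (A)" if words & ABUSIVE_WORDS else "Non-Abusive (NA)"


@app.route("/", methods=["GET", "POST"])
def index():
    label = None
    if request.method == "POST":
        # Read the submitted comment and classify it.
        label = classify(request.form.get("comment", ""))
    return render_template_string(PAGE, label=label)
```

Saved as a Python file, the app can be started with Flask's `flask run` command and opened in a browser to test comments one at a time.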

Tools/Techniques:
• Anaconda: Python distribution platform for development.
• Jupyter Notebook: For implementing machine learning models.
• Python: Programming language used for data pre-processing, feature extraction, and
model training.
• Machine Learning Algorithms: For training and testing cyber abuse detection.
• Web Interface: Basic HTML/CSS, Flask, or Django.

Prerequisite:
• Knowledge of Artificial Intelligence, Machine Learning, and Natural Language Processing
concepts is required. Students will cover a short course on these concepts alongside the
initial SRS and design documentation, or they may consult the links below.
Helping Material:
Python:
https://www.python.org/
https://www.w3schools.com/python/
https://www.tutorialspoint.com/python/index.htm

Feature Extraction Method:
https://towardsdatascience.com/feature-extraction-techniques-d619b56e31be
https://www.analyticsvidhya.com/blog/2021/04/guide-for-feature-extraction-techniques/
https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-real-world-dataset-796d339a4089
https://www.analyticsvidhya.com/blog/2021/07/feature-extraction-and-embeddings-in-nlp-a-beginners-guide-to-understand-natural-language-processing/
http://uc-r.github.io/creating-text-features

Machine Learning Techniques:
https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0
https://towardsdatascience.com/top-10-algorithms-for-machine-learning-beginners-149374935f3c
https://towardsdatascience.com/10-machine-learning-methods-that-every-data-scientist-should-know-3cc96e0eeee9
https://towardsdatascience.com/machine-learning-classifiers-a5cc4e1b0623
https://www.youtube.com/watch?v=fG4e4TUrJ3E
https://www.youtube.com/watch?v=7eh4d6sabA0

Dataset:
https://drive.google.com/file/d/1l8Mo22kVQzrucbo2LCwnP74sRZ4Eztb_/view?usp=sharing

Supervisor:
Name: Tayyab Waqar
Email ID: [email protected]
Skype ID: maliktayyab786_1
