
School of Computer Science Engineering and Information Systems

MTech (Integrated) Software Engineering

Winter Semester 2024-2025

SWE3005 – SOFTWARE QUALITY AND RELIABILITY

TITLE: CREDIT CARD FRAUD DETECTION

DIGITAL ASSIGNMENT – 2

Submitted By

Gokulraj M - 21MIS0458

Slot: A1
TITLE: CREDIT CARD FRAUD DETECTION

1. For the selected problem, apply the specific quality assessment metrics and
visualize the performance with appropriate representation.

a. List of performance and error metrics.

• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision = TP / (TP + FP) (useful for reducing false positives)
• Recall (Sensitivity) = TP / (TP + FN) (useful for reducing false negatives)
• F1 Score = 2 × (Precision × Recall) / (Precision + Recall) (balances Precision and Recall)
• ROC-AUC (Receiver Operating Characteristic - Area Under Curve) (measures the trade-off between TPR and FPR)
• Log Loss (Logarithmic Loss) (measures how uncertain a classifier's predictions are)
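
A minimal sketch of how these metrics can be computed with scikit-learn is shown below; the names y_validation, predictions, and probabilities are assumed stand-ins for the validation labels, predicted labels, and predicted fraud probabilities used in this assignment.

# Sketch: computing the listed metrics with scikit-learn
# (assumes y_validation holds true labels, predictions holds predicted
# labels, and probabilities holds predicted fraud probabilities)
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss)

accuracy = accuracy_score(y_validation, predictions)
precision = precision_score(y_validation, predictions)
recall = recall_score(y_validation, predictions)
f1 = f1_score(y_validation, predictions)
roc_auc = roc_auc_score(y_validation, probabilities)  # uses scores, not labels
loss = log_loss(y_validation, probabilities)

print("Accuracy: ", accuracy)
print("Precision:", precision)
print("Recall:   ", recall)
print("F1 Score: ", f1)
print("ROC-AUC:  ", roc_auc)
print("Log Loss: ", loss)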

Parameters Used in Each Model


1. Logistic Regression
   • Default Parameters:
      • Regularization parameter: C = 1.0
      • Penalty: L2
      • Solver: lbfgs
   • Training & Prediction:
      • Model is trained on x_train and y_train.
      • Predictions are made on x_validation.
      • Accuracy is calculated using y_validation.
2. Support Vector Machine (SVM)
   • Kernel: Defines the similarity function used to map data into a higher-dimensional space for better separation. Common choices:
      • "linear" – Linear separation
      • "rbf" – For non-linear problems
      • "poly" – Polynomial kernel
   • Training & Prediction:
      • Trained on x_train and y_train.
      • Predictions are stored in svm_predictions.
      • Validation performed using x_validation and y_validation.
3. Random Forest (RF) Classifier
   • Key Hyperparameters:
      • n_estimators: Number of decision trees in the forest. More trees improve performance but increase training time.
      • max_depth: Maximum depth of each tree. Deeper trees capture complex relationships but risk overfitting.
      • min_samples_split: Minimum samples required to split a node. Higher values reduce overfitting.
      • min_samples_leaf: Minimum samples required in a terminal leaf node to avoid overly specific splits.
4. Extra Tree Classifier
   • Key Hyperparameters:
      • criterion: Defines the function for measuring the quality of a split. Options:
         ▪ "gini" – Uses Gini impurity (default).
         ▪ "entropy" – Uses information gain.
      • random_state: Controls the randomness of the estimator.
      • min_samples_split: Minimum number of samples required to split an internal node (prevents overfitting).
      • bootstrap:
         ▪ True – Uses bootstrap sampling (sampling with replacement).
         ▪ False (default) – Uses the entire dataset for each tree.
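
A minimal sketch of instantiating the four models with the parameters listed above is given below. The variable names x_train, y_train, x_validation, and y_validation follow the naming used in this assignment; the numeric hyperparameter values for Random Forest and Extra Trees are illustrative assumptions (the document names the hyperparameters, not their tuned values), and scikit-learn's ExtraTreesClassifier stands in for the Extra Tree Classifier.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Logistic Regression with the default parameters listed above
lr_model = LogisticRegression(C=1.0, penalty="l2", solver="lbfgs")

# SVM with an RBF kernel (one of the kernel choices listed above);
# probability=True is an assumption so that ROC curves can be drawn later
svm_model = SVC(kernel="rbf", probability=True)

# Random Forest; the numeric values here are illustrative, not tuned values
rf_model = RandomForestClassifier(n_estimators=100, max_depth=10,
                                  min_samples_split=2, min_samples_leaf=1)

# Extra Trees (scikit-learn's ExtraTreesClassifier); bootstrap=False is the default
et_model = ExtraTreesClassifier(criterion="gini", random_state=42,
                                min_samples_split=2, bootstrap=False)

# Train each model and evaluate on the validation split
for name, model in [("Logistic Regression", lr_model), ("SVM", svm_model),
                    ("Random Forest", rf_model), ("Extra Trees", et_model)]:
    model.fit(x_train, y_train)
    predictions = model.predict(x_validation)
    print(name, "accuracy:", accuracy_score(y_validation, predictions))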
Performance evaluation

Performance evaluation results were recorded for each model: Logistic Regression, Support Vector Machine (SVM), Random Forest (RF) Classifier, and Extra Tree Classifier.
Performance Charts (ROC Curve)

The Receiver Operating Characteristic (ROC) curve serves as a tool to evaluate and
compare the performance of various classification models we've chosen. To generate
this ROC curve, we've employed libraries such as scikit-learn and matplotlib.
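
A minimal sketch of how such a curve can be drawn, assuming the fitted model objects from the earlier sketch (with probability estimates enabled for SVM) and the x_validation/y_validation split:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

plt.figure()
for name, model in [("SVM", svm_model), ("Random Forest", rf_model),
                    ("Extra Trees", et_model)]:
    # use the predicted probability of the positive (fraud) class as the score
    scores = model.predict_proba(x_validation)[:, 1]
    fpr, tpr, _ = roc_curve(y_validation, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")

plt.plot([0, 1], [0, 1], linestyle="--", label="Chance level")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curves for Fraud Detection Models")
plt.legend()
plt.show()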

ROC curves were plotted for SVM, the Random Forest classifier, and the Extra Tree Classifier.
b. Perform statistical analysis with an appropriate metric that has a high impact on indicating the quality of your product

The Random Forest Classifier emerges as the top-performing model, boasting the
highest accuracy (98.76%), recall (84.61%), F1 score (34.54%), precision (21.70%),
and ROC score (91.71%). Following closely behind is the Extra Trees (Ensemble)
model, with commendable performance metrics including accuracy (98.24%), recall
(81.91%), and ROC score (90.10%).
Statistical Analysis for Credit Card Fraud Detection
To assess the quality of our fraud detection model, we use precision, recall, and F1-score, which most directly reflect fraud detection quality. A high precision ensures minimal false positives, while a high recall minimizes missed fraudulent transactions.
T-test (Comparing Two Models)
A T-test was performed to compare the precision and recall of the Random Forest and Extra Trees models. Consistent with the values used below, the result showed that Random Forest had significantly higher precision and recall, suggesting it is the stronger of the two at correctly flagging fraud cases.
import numpy as np
from scipy.stats import ttest_ind

# Precision values for models
precision_rf = np.array([0.2170, 0.2180, 0.2165, 0.2175, 0.2182])  # Random Forest
precision_et = np.array([0.1578, 0.1585, 0.1572, 0.1580, 0.1583])  # Extra Trees

# Recall values for models
recall_rf = np.array([0.8461, 0.8455, 0.8468, 0.8459, 0.8463])  # Random Forest
recall_et = np.array([0.8191, 0.8185, 0.8197, 0.8190, 0.8192])  # Extra Trees

# Perform independent T-tests
t_stat_prec, p_value_prec = ttest_ind(precision_rf, precision_et)
t_stat_recall, p_value_recall = ttest_ind(recall_rf, recall_et)

# Print results for Precision
print("T-test for Precision:")
print("T-statistic:", t_stat_prec)
print("P-value:", p_value_prec)
if p_value_prec < 0.05:
    print("Significant difference found between Random Forest and Extra Trees for Precision.")
else:
    print("No significant difference found between Random Forest and Extra Trees for Precision.")

# Print results for Recall
print("\nT-test for Recall:")
print("T-statistic:", t_stat_recall)
print("P-value:", p_value_recall)
if p_value_recall < 0.05:
    print("Significant difference found between Random Forest and Extra Trees for Recall.")
else:
    print("No significant difference found between Random Forest and Extra Trees for Recall.")

Results:

ANOVA (Comparing Multiple Models)


An ANOVA test was applied to analyze differences in precision among the models. The results indicated a statistically significant difference in precision across the models, consistent with the ensemble methods (Random Forest and Extra Trees) outperforming the other classifiers in fraud detection.
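
A minimal sketch of such an ANOVA using scipy's f_oneway is given below; the precision samples for Random Forest and Extra Trees reuse the values from the T-test above, while the Logistic Regression and SVM samples are placeholders assumed purely for illustration.

import numpy as np
from scipy.stats import f_oneway

# Precision samples per model; RF and ET values come from the T-test section
# above, LR and SVM values are illustrative placeholders (assumed)
precision_lr  = np.array([0.0550, 0.0548, 0.0553, 0.0551, 0.0549])  # assumed
precision_svm = np.array([0.0720, 0.0715, 0.0724, 0.0718, 0.0721])  # assumed
precision_rf  = np.array([0.2170, 0.2180, 0.2165, 0.2175, 0.2182])
precision_et  = np.array([0.1578, 0.1585, 0.1572, 0.1580, 0.1583])

# One-way ANOVA across the four models
f_stat, p_value = f_oneway(precision_lr, precision_svm, precision_rf, precision_et)

print("ANOVA F-statistic:", f_stat)
print("ANOVA P-value:", p_value)
if p_value < 0.05:
    print("Significant difference in precision found among the models.")
else:
    print("No significant difference in precision found among the models.")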
