Breast Cancer Detection

Uploaded by

SADIA JANNAT 201-15-3136


“Evaluating Machine Learning Algorithms for Breast Cancer Detection in Developing Countries”

Table of Contents

1.Introduction
2.Proposed Methodology
3.Dataset Description
3.1 Sample of Dataset
3.2 Description Table
3.3 Data Table
4.Preprocessing
4.1 Label Encoding Using Diagnosis Column
4.2 Converting int to float of diagnosis column
4.3 Converting int to float of ID column
4.4 Missing Value Checking
5.Implementation
5.1 Data Visualization
5.2 Machine Learning Algorithm
6.Result
7.Conclusion

1.Introduction
Every day, cancer affects people all over the world in a variety of ways. Breast
cancer is the most common type of cancer after skin cancer. Human body tissue
is made up of tiny cells. Uncontrolled cell proliferation in the breast can
result in lumps known as tumors. In breast cancer, these tumor cells begin to
proliferate abnormally and develop into cancer. Both men and women can develop
breast cancer, but women are far more likely to get the disease. Breast cancer
is becoming more common by the day. The risk tends to increase with age,
shorter periods, delayed first childbirth, shorter nursing duration, family
history, prior breast cancer or tumor, abnormally big breasts, hormone
treatment, prior breast radiation, obesity, and high alcohol intake. We can
reduce the number of breast cancer deaths through early detection, making
predictions based on specific signs and symptoms. Here are a few examples of
symptoms:
• A breast lump or thickened tissue that feels different from the surrounding tissue.
• A noticeable change in the size, shape, or appearance of the breast.
• Noticeable changes in the breast skin, such as lumpiness.
• Redness or pitting of the breast skin.
If a patient shows any of these symptoms, they should see a doctor straight
away. Statistics reveal that some women are diagnosed with breast cancer even
though they have no symptoms. In such cases the cancer can spread undetected,
increasing the likelihood of death, which is why regular breast cancer
screening is needed.
According to recent research, the survival rate for women with breast cancer is
91% five years after diagnosis, 86% after ten years, and 80% after 15 years.

Breast cancer is classified into stages and grades. The stages of breast cancer
describe how far the cancer has spread and how quickly it has developed in the
body. If cancer is found at an early stage, it is much easier to treat. However,
once the cancer spreads, the risk of death rises sharply. Machine learning
algorithms can help identify cancer more accurately. In this research, we
employed four classification techniques: Support Vector Machine (SVM),
Decision Tree (DT), Random Forest (RF), and K Nearest Neighbors (KNN). The SVM
method outperforms the other three algorithms in terms of accuracy.

2.Proposed Methodology
We obtained the data for this paper from kaggle.com. Of the 569 patients in the
dataset, 357 are benign and 212 are malignant. After collecting the data, we
identified the influencing factors and features used as input variables. The
block diagram of the proposed work is:

Figure: Block Diagram of the Proposed Methodology

To carry out the idea, a dataset is necessary. A total of 569 records were
collected for pre-processing, each with 32 columns.
In this dataset, "Diagnosis" is the target attribute.
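
As a quick check on the class counts quoted above, the following sketch (plain Python; the variable names are illustrative and not taken from the project code) computes the class proportions and the accuracy that a trivial classifier would reach by always predicting the majority class. That figure gives a baseline against which model accuracies can be read.

```python
# Class counts as stated in the report.
benign = 357
malignant = 212
total = benign + malignant  # 569 records

benign_share = benign / total
malignant_share = malignant / total

# Accuracy of a classifier that always predicts the majority class.
majority_baseline = max(benign, malignant) / total

print(f"total records:     {total}")
print(f"benign share:      {benign_share:.1%}")
print(f"malignant share:   {malignant_share:.1%}")
print(f"majority baseline: {majority_baseline:.1%}")
```

The classes are moderately imbalanced, so an accuracy in the low 60s would carry no information; the models are only useful to the extent they exceed this baseline.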

The machine learning algorithms used for classification are listed below:
• RF
• DT
• SVM
• KNN

3.Dataset Description
Our dataset was acquired from the kaggle.com website. There are 32 columns and
569 rows in this dataset. The diagnosis column is the target attribute, and the
other 31 columns are feature attributes. As the data visualization shows, the
target class of the dataset comprises two categories of breast tumor: benign
and malignant.
After gathering the dataset, each column is converted to a numeric format and
the diagnosis column is encoded as the target class. Next, any remaining
object-type column is handled, integer values are converted to floats, and
missing values are checked. Machine learning algorithms are then applied, and
the outcomes are reported in the Result section.
The columns include features such as radius, texture, perimeter, area,
smoothness, compactness, concavity, concave points, symmetry, and fractal
dimension, each reported as three statistics (mean, standard error, and
worst).
The dataset has an "id" column, a "diagnosis" column (with values 'M' for
malignant and 'B' for benign), and several numerical columns representing
different features extracted from breast cancer biopsies.
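
The column layout described above can be sketched as follows. This is only an illustration of the arithmetic: ten base measurements times three statistics give 30 measurement columns, which together with "id" make up the 31 non-target columns, and with "diagnosis" the 32 columns in total. The exact column names are an assumption and should be checked against the header of the CSV file actually used.

```python
# Ten base measurements, each reported as three statistics.
# NOTE: the exact column names below are an assumption for illustration;
# check them against the header of the CSV file actually used.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
STATISTICS = ["mean", "se", "worst"]  # mean, standard error, worst value

measurement_columns = [
    f"{feature}_{stat}" for stat in STATISTICS for feature in BASE_FEATURES
]
all_columns = ["id", "diagnosis"] + measurement_columns

print(len(measurement_columns))  # 30 measurement columns
print(len(all_columns))          # 32 columns in total
print(all_columns[:5])
```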

3.1 Sample of Dataset
Here is a sample of our breast cancer detection CSV dataset:

3.2 Description Table
In the implementation phase, the columns dia, rm, tm, pm, am, sm, cm, cnm, cpm,
sym, fdm, rs, ts, ps, as, ss, cms, cs, cps, sys, fds, rw, tw, p_w, aw, sw, cmw,
cnm, cpw are used as feature attributes. One column (diagnosis) is the target
attribute. The data is split into 70% training data and 30% test data. The
training values (x_train, y_train) are passed to the model as input. For
methods that require feature scaling, the scaled training values (xtrain1,
ytrain1) are used instead, and the model then returns the predicted output.

Figure : Total Output of Train Test sample.
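
The 70/30 split can be sketched as below. The report does not say which function performed the split, so this is a minimal plain-Python version under stated assumptions: the helper name `train_test_split_indices`, the seed value, and the choice to round the test size up are all hypothetical.

```python
import math
import random

def train_test_split_indices(n_samples, test_fraction=0.3, seed=42):
    """Shuffle row indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)

# 569 rows, as in the dataset described in this report.
train_idx, test_idx = train_test_split_indices(569, test_fraction=0.3)
print(len(train_idx), len(test_idx))
```

Shuffling before splitting matters when the file is ordered in any way; fixing the seed keeps the reported accuracies reproducible from run to run.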

3.3 Data Table

In this dataset, "Diagnosis" is the target attribute.

4.Preprocessing
Preprocessing in the context of datasets refers to the tasks and techniques used
to clean, transform, and prepare the data before it is used for analysis or
machine learning.
The methods selected for preprocessing are:

• Label encoding of diagnosis column
• Converting int to float of diagnosis column
• Converting int to float of ID column
• Missing value checking
• Scaling (Z-score normalization)

Label encoding of diagnosis column

Label encoding is performed to convert categorical labels or text data into
numerical representations. We apply the label encoding transformation to the
'diagnosis' column in the DataFrame 'df'. The fit_transform method of the
LabelEncoder is used, which both fits the encoder to the unique values in the
'diagnosis' column and transforms the labels into numerical values. The encoded
values are then assigned back to the 'diagnosis' column in the DataFrame.
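
To make the fit-then-transform idea concrete, here is a minimal sketch in plain Python. It is not the project's code (which, per the text, uses LabelEncoder's fit_transform); the helper names and the sample values are hypothetical. Like LabelEncoder, it sorts the distinct classes before numbering them, so 'B' becomes 0 and 'M' becomes 1.

```python
def fit_label_encoder(labels):
    """Return a mapping from each distinct label to an integer code.

    Classes are sorted first, so codes are assigned in alphabetical order.
    """
    classes = sorted(set(labels))
    return {label: code for code, label in enumerate(classes)}

def transform_labels(labels, mapping):
    """Replace every label by its integer code."""
    return [mapping[label] for label in labels]

# Hypothetical sample of the 'diagnosis' column.
diagnosis = ["M", "B", "B", "M", "B"]

mapping = fit_label_encoder(diagnosis)
encoded = transform_labels(diagnosis, mapping)

print(mapping)  # {'B': 0, 'M': 1}
print(encoded)  # [1, 0, 0, 1, 0]
```

Knowing which class maps to 1 matters later when reading metrics: here 1 stands for malignant.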

Converting int to float of diagnosis column

After label encoding, the 'diagnosis' column holds integer codes in place of
the original categorical values ('M' for malignant and 'B' for benign). We
change the data type of this column to float so that the encoded diagnosis is
represented as floating-point numbers.

Converting int to float of ID column

The same conversion is applied to the 'id' column. Its values, which are
numerical identifiers, are converted to the float data type so that they are
represented as floating-point numbers.

Missing value checking

We checked for missing values and did not find any in our dataset.
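
A missing-value check of this kind can be sketched with the standard library alone. The three-row CSV excerpt below is hypothetical and deliberately contains two gaps so that the check has something to find; the real dataset, as stated above, has none. With a pandas DataFrame, the equivalent check is usually written as `df.isnull().sum()`.

```python
import csv
import io

# Hypothetical three-row excerpt standing in for the real CSV file.
SAMPLE_CSV = """id,diagnosis,radius_mean,texture_mean
1,M,17.99,10.38
2,B,13.54,
3,B,,15.71
"""

def count_missing(csv_text):
    """Count empty cells per column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing

print(count_missing(SAMPLE_CSV))
# {'id': 0, 'diagnosis': 0, 'radius_mean': 1, 'texture_mean': 1}
```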

Scaling

Scaling is necessary in machine learning to ensure that all features contribute
equally to the model training process. Feature scaling normalizes the range of
the independent variables (features) so that no variable dominates simply
because of its units. We measured each algorithm's accuracy both with and
without scaling, because the benefit depends on the method. For instance, SVM
reaches its highest accuracy with scaling, whereas DT and random forest give
good results even without it.
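
Z-score normalization replaces each value x by (x - mean) / standard deviation. The sketch below shows this for a single feature in plain Python. The sample values and helper names are hypothetical, and two choices are assumptions of the sketch rather than statements about the project's code: it uses the population standard deviation, and it learns the statistics from the training part only and then reuses them for the test part.

```python
from statistics import mean, pstdev

def fit_zscore(values):
    """Learn the mean and (population) standard deviation of one feature."""
    return mean(values), pstdev(values)

def apply_zscore(values, mu, sigma):
    """Standardize values: z = (x - mean) / standard deviation."""
    return [(x - mu) / sigma for x in values]

# Hypothetical values of one feature, split into train and test parts.
train_radius = [10.0, 12.0, 14.0, 16.0, 18.0]
test_radius = [11.0, 20.0]

mu, sigma = fit_zscore(train_radius)                 # fit on training data only
train_scaled = apply_zscore(train_radius, mu, sigma)
test_scaled = apply_zscore(test_radius, mu, sigma)   # reuse train statistics

print(round(mean(train_scaled), 6), round(pstdev(train_scaled), 6))
print([round(z, 3) for z in test_scaled])
```

After scaling, the training values have mean 0 and standard deviation 1. Distance- and margin-based methods such as KNN and SVM are sensitive to feature ranges, while tree-based methods split on one feature at a time, which is consistent with the observation above that DT and random forest do well without scaling.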

5.Implementation
We obtained our data from the kaggle.com platform. The dataset contains 32
columns and 569 records. Of these 32 columns, the diagnosis attribute is the
target attribute, and the remaining 31 columns are feature attributes. Based on
data visualization, we know that the dataset's target class comprises two
categories of breast tumor: benign and malignant.

After gathering the dataset, each column is converted to a numeric format and
the diagnosis column is encoded as the target class. Next, any remaining
object-type column is handled, integer values are converted to floats, and
missing values are checked. Machine learning algorithms are then applied, and
the outcomes are reported in the Result section.

5.1 Data Visualization

We also use several data visualization methods. The methods we use are:

Pair plot- We use the pairplot visualization method to explore relationships
between multiple variables in a dataset. We use the Seaborn library to visualize
pairwise relationships between variables in a DataFrame. Specifically, we take
the DataFrame df, select the columns from index 1 to 6 (inclusive), and use the
'diagnosis' column as the hue variable.

Figure : Data Visualization using pairplot

Correlation- Here we use correlation, a statistical measure that quantifies the
strength and direction of the relationship between two variables. The resulting
correlation matrix is a square matrix with dimensions (12 x 12), where each
element represents the correlation coefficient between the corresponding pair
of columns. This matrix provides valuable insight into the relationships
between the selected features in the data frame.

Figure : Correlation
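
To show what each entry of such a matrix contains, here is a minimal plain-Python sketch of the Pearson correlation coefficient and of a correlation matrix built from it. The three columns and their values are hypothetical (the project's matrix is 12 x 12 and computed from the real data); "perimeter" is constructed to be exactly linear in "radius" so that their correlation comes out as 1.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Square matrix of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return [[pearson(columns[a], columns[b]) for b in names] for a in names]

# Hypothetical values for three columns (not taken from the dataset).
data = {
    "radius": [10.0, 12.0, 14.0, 16.0],
    "perimeter": [65.0, 78.0, 91.0, 104.0],   # exactly linear in radius
    "smoothness": [0.12, 0.09, 0.11, 0.08],
}
for row in correlation_matrix(data):
    print([round(r, 2) for r in row])
```

The matrix is symmetric with ones on the diagonal. Pairs of features with coefficients close to 1 or -1 carry largely redundant information.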

Heatmap- For our project we used heatmaps, which effectively visualize the
distribution and intensity of data values, making it easier to spot patterns and
trends that might not be apparent from other visualizations. The color-coding
scheme helps identify areas of high or low concentration, allowing for quick
visual analysis and interpretation.

Figure : Heatmap

5.2 Machine learning algorithms

Our goal for these experiments is to find the best models for breast cancer
detection. Models are created using the four ML algorithms listed in the
methodology (SVM, KNN, Random Forest, and Decision Tree), and the best results
among them are identified after implementation.

Figure: Visualization of SVM algorithm

Figure: Visualization of KNN algorithm

Figure: Visualization of RandomForest algorithm

Figure: Visualization of Decision Tree algorithm

6.Result

This research compares ML methods for breast cancer detection and diagnosis.
Four popular supervised ML techniques, support vector machine (SVM), decision
tree (DT), random forest (RF), and K-nearest neighbor (KNN), were used for
classification. On the test data, SVM provides 97% accuracy, Random Forest
provides 96% accuracy, Decision Tree provides 92% accuracy, and KNN provides
94% accuracy. Based on our findings, SVM has the highest accuracy in breast
cancer detection, at 97%.
Algorithms       Accuracy with train data    Accuracy with test data
SVM              98%                         97%
Random Forest    100%                        96%
KNN              95%                         94%
DT               100%                        92%
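
Accuracy here means the fraction of predictions that match the true labels. The sketch below first shows that calculation on a handful of hypothetical labels, and then uses the percentages from the table above to compute the gap between train and test accuracy for each algorithm. The function and variable names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels, only to show the calculation (1 = malignant, 0 = benign).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
print(accuracy(y_true, y_pred))  # 0.75

# Accuracies in percent, as reported in the table: (train, test).
reported = {
    "SVM": (98, 97),
    "Random Forest": (100, 96),
    "KNN": (95, 94),
    "DT": (100, 92),
}
for name, (train_acc, test_acc) in reported.items():
    print(f"{name:<14} train-test gap: {train_acc - test_acc} points")

best = max(reported, key=lambda name: reported[name][1])
print("highest test accuracy:", best)  # SVM
```

A train accuracy of 100% combined with a noticeably lower test accuracy, as for DT and Random Forest, is commonly read as a sign of overfitting, so the test column is the fairer basis for comparison.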

7.Conclusion:

The study's findings suggest that a few machine learning algorithms can be used
to predict and support the early identification of breast cancer. An online
dataset called breast cancer detection, sourced from kaggle.com, was gathered
and used to build and run this system. On the basis of this dataset, a
suggested model was created, refined, and put into use. On test data, SVM
performs best among the machine learning algorithms evaluated, with 97%
accuracy on test data and 98% on train data. Random Forest, KNN, and DT also
offer high accuracy: 96%, 94%, and 92% on test data, and 100%, 95%, and 100% on
train data. Our dataset has only 569 records, and the model may achieve higher
accuracy with many more.
