DEPARTMENT OF COMPUTER SCIENCE ENGINEERING
EAST WEST INSTITUTE OF POLYTECHNIC [597]
YELAHANKA NEW TOWN, BENGALURU – 560064
Predicting Breast Cancer Using Machine Learning
Under the guidance of
RAMESH B.N
HEAD OF THE DEPARTMENT (CSE)
BY
VIJAY KUMAR M [597CS20045]
INTRODUCTION
What is breast cancer?
Breast cancer is a disease in which cells in the breast grow out of control. There are different kinds of breast cancer.
The kind of breast cancer depends on which cells in the breast turn into cancer. Breast cancer can begin in
different parts of the breast.
Breast cancer can spread outside the breast through blood vessels and lymph vessels.
OBJECTIVE
The proposed machine-learning approaches aim to predict breast cancer, because early detection of this
disease can help slow its progression and reduce the mortality rate through appropriate therapeutic
interventions at the right time. Applying different machine learning approaches, gaining access to larger
datasets from different institutions (a multi-centre study), and considering key features from a variety of
relevant data sources could improve the performance of the models.
LITERATURE SURVEY
• Cancer is the second leading cause of death among women worldwide.
• Cancer is a range of disorders of lethal cells that, if left untreated, can lead to indolent lesions and mortality.
• Abnormal cells arise from genetic mutations: a cell whose deoxyribonucleic acid (DNA) has changed can grow
out of control and become cancerous.
• Early identification of breast cancer assists the prognosis process and can mitigate serious complications
of the disease, leading to higher recovery rates.
- Jaffar et al. and Khan
SYSTEM REQUIREMENT SPECIFICATION
Hardware Requirements
System Processor : Core i3 / i5
Hard Disk : 500 GB
RAM : 4 GB
Software Requirements
Operating system : Windows XP / 7
Coding Language : Python
Software : Anaconda
IDE : Jupyter Notebook
Non-functional requirements
• Reliability
• Performance
• Portability
• Scalability
• Flexibility
• Security
MODEL DESIGN
METHODOLOGY
This section describes the three algorithms used in this system, namely:
• Decision Tree Algorithm
• K-Nearest Neighbour (KNN) Algorithm
• Logistic Regression Algorithm
IMPLEMENTATION
• Decision Tree Algorithm
The Decision Tree algorithm belongs to the family of supervised learning algorithms and can be used for
both regression and classification problems. The general aim of using a Decision Tree is to create a
training model that can predict the class or value of the target variable by learning decision rules
inferred from prior data (training data).
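To make "learning decision rules" concrete, here is a minimal from-scratch sketch of a binary decision tree classifier in Python, the project's coding language. It is not the project's own implementation, which is not shown here. It assumes numeric features, chooses each split by Gini impurity (the CART-style approach), and uses a small made-up dataset with two hypothetical features and the assumed labels 1 = malignant, 0 = benign. In practice a library class such as scikit-learn's `DecisionTreeClassifier` would normally be used instead.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) pair with the lowest weighted Gini
    impurity, or None if no split improves on the current node."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a useful split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf is simply the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    goes_left = [row[feature] <= threshold for row in rows]
    left_rows = [r for r, g in zip(rows, goes_left) if g]
    left_labels = [lab for lab, g in zip(labels, goes_left) if g]
    right_rows = [r for r, g in zip(rows, goes_left) if not g]
    right_labels = [lab for lab, g in zip(labels, goes_left) if not g]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left_rows, left_labels, depth + 1, max_depth),
        "right": build_tree(right_rows, right_labels, depth + 1, max_depth),
    }


def tree_predict(tree, row):
    """Follow the learned decision rules from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: two numeric features per sample; 1 = malignant, 0 = benign.
X = [[12.0, 14.0], [13.5, 15.2], [11.8, 18.0], [19.5, 22.1], [20.3, 25.0], [17.9, 21.4]]
y = [0, 0, 0, 1, 1, 1]

tree = build_tree(X, y)
print(tree_predict(tree, [12.5, 16.0]))  # → 0
print(tree_predict(tree, [18.7, 23.0]))  # → 1
```

Each internal node stores one rule of the form "feature <= threshold". On this toy data the root splits feature 0 at 13.5 and both children are already pure, so growth stops there. The `max_depth` limit (an assumed default of 3) is one simple way to keep a tree from overfitting by growing too deep.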
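• K-Nearest Neighbour (KNN) Algorithm
The k-nearest neighbours algorithm, also known as KNN or k-NN, is a non-parametric, supervised
learning classifier that uses proximity to make classifications or predictions about the grouping of an
individual data point. To classify a new point, it finds the k training points closest to it and assigns
the class held by the majority of those neighbours.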
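The proximity idea can be sketched in a few lines of Python. This minimal version is not the project's code; it assumes Euclidean distance, a simple majority vote among the k = 3 nearest training points, and the same hypothetical two-feature toy data (1 = malignant, 0 = benign). The function names are illustrative; scikit-learn's `KNeighborsClassifier` is the usual library equivalent.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    KNN is a "lazy" learner: there is no separate training step, and the
    stored training data itself acts as the model.
    """
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two numeric features per sample; 1 = malignant, 0 = benign.
X = [[12.0, 14.0], [13.5, 15.2], [11.8, 18.0], [19.5, 22.1], [20.3, 25.0], [17.9, 21.4]]
y = [0, 0, 0, 1, 1, 1]

print(knn_predict(X, y, [12.5, 16.0]))  # → 0
print(knn_predict(X, y, [18.7, 23.0]))  # → 1
```

Because KNN compares raw distances, features measured on larger scales dominate the vote, so in practice features are usually standardised first. Choosing an odd k also avoids tied votes between two classes.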
• Logistic Regression Algorithm
Logistic regression is an example of supervised learning. It is used to estimate the probability of a
binary (yes/no) event occurring by passing a weighted sum of the input features, z, through the sigmoid
function 1 / (1 + e^(-z)), which maps any value to a probability between 0 and 1. For example, logistic
regression could be applied to determine whether a person is likely to be infected with COVID-19. Since
this question has two possible outcomes (yes, they are infected, or no, they are not), this is called
binary classification.
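A minimal sketch of binary logistic regression in plain Python, trained by batch gradient descent on the log-loss. It is not the project's implementation; the learning rate, epoch count, 0.5 decision threshold and the single standardised toy feature are assumptions chosen for illustration. In practice scikit-learn's `LogisticRegression` would typically be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function: maps any real z to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, row):
    """Estimated probability that `row` belongs to class 1 ("yes")."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for row, target in zip(X, y):
            error = predict_proba(w, b, row) - target  # d(log-loss)/dz for this sample
            for j in range(n):
                grad_w[j] += error * row[j]
            grad_b += error
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def logistic_predict(w, b, row, threshold=0.5):
    """Turn the probability into a binary yes (1) / no (0) decision."""
    return 1 if predict_proba(w, b, row) >= threshold else 0


# Hypothetical toy data: one standardised feature; 1 = "yes", 0 = "no".
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
print(logistic_predict(w, b, [-1.2]))  # → 0
print(logistic_predict(w, b, [1.2]))  # → 1
```

The model itself only outputs a probability; the threshold in `logistic_predict` is what turns it into the yes/no answer of binary classification.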
SNAPSHOTS
Decision tree Algorithm Snap 1
K-Nearest Neighbour (KNN) Algorithm Snap 2
Logistic Regression algorithm Snap 3
Logistic Regression algorithm Snap 3.1
Logistic Regression algorithm Snap 3.2 (Confusion Matrix)
Classification Accuracy Comparison of Models Snap 4
CONCLUSION AND FUTURE ENHANCEMENT
Conclusion
• Medical datasets need not be classified only with the machine learning algorithms mentioned above;
there are many other algorithms and techniques that may perform better than these.
• Logistic Regression surpasses the other algorithms with an accuracy of 88.5964%. I therefore
conclude that, of the algorithms compared in this project, the Logistic Regression classification
algorithm is the best suited for handling this medical dataset.
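The accuracy figure above is the percentage of test samples classified correctly, which can be read off a confusion matrix such as the one in Snap 3.2: accuracy = (TP + TN) / (TP + TN + FP + FN). A minimal sketch, assuming binary labels with 1 as the positive (malignant) class; the labels and the predictions of the two hypothetical models below are made up and are not the project's results.

```python
def confusion_matrix(y_true, y_pred):
    """Count (TN, FP, FN, TP) for binary 0/1 labels, treating 1 as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy_percent(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), expressed as a percentage."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return 100.0 * (tp + tn) / (tn + fp + fn + tp)


# Made-up labels and predictions from two hypothetical models (not the project's results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model A": [1, 0, 0, 1, 0, 1, 1, 0],
    "model B": [1, 0, 1, 1, 0, 0, 1, 1],
}

scores = {name: accuracy_percent(y_true, pred) for name, pred in predictions.items()}
print(scores)  # → {'model A': 75.0, 'model B': 87.5}
print(max(scores, key=scores.get))  # → model B
print(confusion_matrix(y_true, predictions["model B"]))  # → (3, 1, 0, 4)
```

The (TN, FP, FN, TP) ordering mirrors what scikit-learn's `confusion_matrix(...).ravel()` returns for binary labels. For medical data, accuracy alone can hide false negatives (missed cancers), so the confusion matrix is worth reporting alongside it.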
Future enhancement
• In the future, the designed system and its machine learning classification algorithms could be used
to predict or diagnose other diseases.
• The work can be extended or improved to automate breast cancer analysis by including other
machine learning algorithms.