Systematic Review
https://doi.org/10.1007/s10916-020-01689-1
Received: 8 October 2020 / Accepted: 1 December 2020 / Published online: 4 January 2021
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021
Abstract
Breast cancer (BC) is one of the leading causes of death among women worldwide. It generally affects women older than 40.
Medical image analysis is one of the most promising research areas, since it supports diagnosis and decision-making
for several diseases such as BC. This paper conducts a Structured Literature Review (SLR) of the use of Machine Learning (ML)
and Image Processing (IP) techniques to deal with BC imaging. A set of 530 papers published between 2000 and August 2019
was selected and analyzed according to ten criteria: year and publication channel, empirical type, research type, medical task,
machine learning techniques, datasets used, validation methods, performance measures, and image processing techniques, which
include image pre-processing, segmentation, feature extraction and feature selection. Results showed that diagnosis was the most
frequently addressed medical task and that Deep Learning (DL) techniques were largely used to perform classification. Furthermore,
classification was the most investigated ML objective, followed by prediction and clustering. Most of the selected studies
used mammograms as the imaging modality rather than ultrasound or Magnetic Resonance Imaging, with public or
private datasets; MIAS was the most frequently investigated public dataset. As for image processing techniques, the majority
of the selected studies pre-processed their input images by reducing the noise and normalizing the colors, and some of them used
segmentation to extract the region of interest with the thresholding method. For feature extraction, researchers
extracted the relevant features using classical feature extraction techniques (e.g. texture features, shape features) or DL
techniques (e.g. VGG16, VGG19, ResNet), and finally few papers used feature selection techniques, in particular filter
methods.
Keywords Breast cancer · Machine learning · Image processing · Structured literature review · Deep learning
J Med Syst (2021) 45: 8

This article is part of the Topical Collection on Image & Signal Processing

* Ali Idri
[email protected]; [email protected]

1 Modeling, Simulation and Data Analysis, Mohammed VI Polytechnic University, Benguerir, Morocco
2 Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat, Rabat, Morocco

Introduction

One of the most common cancers among women worldwide is Breast Cancer (BC). It occurs when the cells of the breast tissue grow abnormally and start to divide rapidly [1]. BC is characterized by the overgrowth of a malignant tumor in the breast [2]. The goal of BC screening is to achieve an early diagnosis, which aims to distinguish malignant from benign tumors, whereas prognosis helps to establish a treatment plan. Up to this point, there is no compelling method to prevent the occurrence of BC [2]. However, breast tumors can be treated effectively in the early stages of cancer. Thus, early detection and accurate methods for screening the earliest indications of BC are the initial steps to limit the risk of suffering [2]. The main treatments for BC are surgery, radiotherapy, chemotherapy, hormone therapy, and biological therapy [3].

Although medical investigations in BC are encouraging, the lack of appropriate methods for early detection is still a challenge [4, 5]. Information technology has contributed a new dimension referred to as Medical Image Processing (MIP). For instance, in the medical field, a vast majority of the analysis methods were developed using MIP [6, 7], which can help to easily detect and recognize a cancerous mass in an affected breast. Due to the advancement of digital imaging techniques, numerous computer vision and machine learning techniques have been applied for analyzing and recognizing pathological images [8].

One of the challenges facing the computer vision community is to provide software tools that can help the physician extract relevant information to improve diagnosis [9]. By analyzing these images, radiologists use their expertise to correlate the particular structures in the tissue with the characteristics of the suspect regions on the medical images. This process has many limitations:

– Experts must decide by analyzing images containing a wide variety of information.
– The estimate lacks reproducibility due to variations between experts' opinions and the subjectivity of the results.

The growth in computing power has allowed the emergence of computer vision, which generally uses artificial intelligence to deal with images.

To the best of the authors' knowledge, no SLR has been carried out to summarize the findings of primary studies dealing with the use of machine learning and image processing techniques for any breast cancer medical task such as diagnosis, prognosis and treatment. However, the study [10] carried out a Systematic Mapping Study (SMS) on the use of data mining techniques in BC, the study [11] conducted an SMS on the use of ensemble techniques in breast cancer, and [12] focused on the use of classification techniques to diagnose breast cancer. In an earlier work [13], we carried out an SMS to present an overview of the use of machine learning techniques and image processing in BC. The most important findings of the SMS showed that classification is the most investigated objective of ML for BC, and that most of the papers used private datasets. A systematic mapping study gives an overview of a large topic by reviewing primary studies, following a predefined protocol and without going into the details of specific research questions; its main goal is to provide a classification scheme to identify the subtopics that need a structured literature review (SLR) [14] or more research studies. Unlike the SMS, an SLR identifies the best practices, techniques, tools or methods by reviewing papers on a specific topic, following a predefined protocol and investigating detailed research questions (RQs) [14, 15].

The present study carries out an SLR that extends our SMS [13] by dealing with specific criteria related to the use of ML and IP in BC such as: ML techniques used, BC image pre-processing and segmentation techniques used, BC feature extraction techniques and experimental design used, and BC feature selection techniques used. It searches the primary studies published between 2000 and August 2019 in six digital libraries: ScienceDirect, IEEEXPLORE, Pubmed, Springer, ACM and Google Scholar. It provides a synthesis and a summary of 530 selected papers through ten research questions (RQs) on: year, publication channels and sources of the selected papers (RQ1), medical tasks (RQ2), type of contributions and empirical methods (RQ3), machine learning objective (RQ4), ML techniques (RQ5), datasets and validation methods (RQ6), breast imaging techniques (RQ7), image pre-processing and segmentation techniques (RQ8), feature extraction techniques and experimental design (RQ9), and feature selection techniques (RQ10).

The present paper is structured as follows: Section 2 describes the research methodology followed by this SLR. Section 3 reports the results of the ten RQs. Section 4 discusses the results obtained. Section 5 concludes this SLR.

Research methodology

The purpose of a structured literature review is to offer an overview of a research area by identifying the type and quantity of research in a field and to describe the methodologies and results of primary studies [16]. The computer science field has mainly used unstructured overviews to summarize studies for literature reviews. However, according to Brereton et al. [14], every study must be systematically and rigorously reviewed in order to effectively summarize the studies and give relevant results. Many disciplines such as Medicine, Education, Social Policy and Information Systems use a strict methodological framework with a predefined protocol. It is notable that structured literature reviews were primarily used in Medicine [17], and have started to be carried out in many computer science subfields such as medical informatics [16], software engineering [14, 15] and machine learning [13, 18, 19]. The SLR process involves three steps [20]: Planning, Conducting and Reporting, as shown in Fig. 1.

Research questions

The main goal of this paper is to provide an overview of the primary studies published from 2000 to August 2019 in the field of machine learning and image processing techniques applied to Breast Cancer. Therefore, we identify ten research questions with their motivations, as shown in Table 1.

Search strategy

To formulate the search string, we used the principal keywords and their synonyms extracted from the research questions. The Boolean AND was used to join the important parts and the Boolean OR was used to join alternative words. The final search string was defined as follows:

(Breast OR “Mammary gland”) AND (cancer* OR tumor OR malignancy OR masses) AND (Prognosis OR Predict* OR Diagnosis OR Identification OR Analysis OR monitoring OR treatment) AND (“data mining” OR intelligent OR classificat* OR cluster* OR associat* OR predict* OR “machine learning” OR “deep learning”) AND (model* OR algorithm* OR technique* OR rule* OR method* OR tool* OR framework*) AND (mammogr* OR ultrasound OR thermogra* OR “magnetic resonance imaging” OR tomosynthesis OR tomography OR imag* OR “image processing” OR “medical images” OR “computer vision”).

We searched for relevant papers in six digital libraries: Science Direct, IEEEXPLORE, PubMed, ACM, Springer and Google Scholar. These libraries offer a large number of candidate papers; furthermore, they index several journals, conferences, and books addressing the topic of this study.

Inclusion criteria

– IC2: Papers presenting an overview of the use of machine learning and image processing techniques in breast cancer.
– IC3: Papers providing empirical/theoretical comparisons of machine learning and image processing techniques in breast cancer.
– IC4: Papers published in 2000 or later.

Exclusion criteria

– EC1: Papers written in languages other than English.
– EC2: Papers dealing with other cancer types.
– EC3: Duplicated papers.
– EC4: Short papers (only 2–3 pages).
– EC5: Presentations or posters.
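As an illustration of how the search string above is assembled, the following Python sketch joins the alternative terms of each group with OR and then joins the groups with AND. The term lists are copied from the search string; the names TERM_GROUPS and build_search_string are hypothetical and are not part of the original study, and individual digital libraries may require their own query syntax.

```python
# Illustrative sketch only: the terms come from the search string of this SLR,
# while the names TERM_GROUPS and build_search_string are hypothetical.
TERM_GROUPS = [
    ["Breast", '"Mammary gland"'],
    ["cancer*", "tumor", "malignancy", "masses"],
    ["Prognosis", "Predict*", "Diagnosis", "Identification",
     "Analysis", "monitoring", "treatment"],
    ['"data mining"', "intelligent", "classificat*", "cluster*",
     "associat*", "predict*", '"machine learning"', '"deep learning"'],
    ["model*", "algorithm*", "technique*", "rule*", "method*",
     "tool*", "framework*"],
    ["mammogr*", "ultrasound", "thermogra*", '"magnetic resonance imaging"',
     "tomosynthesis", "tomography", "imag*", '"image processing"',
     '"medical images"', '"computer vision"'],
]


def build_search_string(groups):
    """Join alternatives with OR inside parentheses, then join groups with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


SEARCH_STRING = build_search_string(TERM_GROUPS)
```

Keeping the terms in lists makes it easy to add a synonym to one group without touching the Boolean structure of the query.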
Table 1 Research questions and their motivations

RQ1
Research question: In which year, publication channels and sources were the selected papers published?
Motivation: Identify the publication trends, and the different publication channels and sources of the selected papers.

RQ2
Research question: What are the medical tasks of breast cancer in which machine learning and image processing techniques were addressed?
Motivation: Discover which discipline in breast cancer the machine learning has been applied.

RQ3
Research question: What type of contributions and empirical methods is being made to the area of machine learning and image processing in breast cancer?
Motivation: Identify the different type of studies performed in machine learning and image processing applied to breast cancer.

RQ4
Research question: Which is the most investigated machine learning objective?
Motivation: Discover the most investigated ML objective in BC literature.

RQ5
Research question: What are the most frequently used ML techniques for image processing in breast cancer?
Motivation: Discover the ML techniques most frequently applied to deal with image processing in breast cancer.

RQ6
Research question: What are the datasets and validation methods used to measure the performance of ML and IP in BC?
Motivation: Identify the most used datasets and most relevant performance measures to evaluate ML and IP in BC.

RQ7
Research question: What are the most used breast cancer imaging techniques?
Motivation: Identify the most used Breast cancer imaging techniques (Mammography, ultrasound, MRI, histological images)

RQ8
Research question: What are the most used BC image pre-processing and segmentation techniques?
Motivation: Identify the most used Image pre-processing techniques (noise reduction, contrast enhancement, image augmentation) and segmentation techniques (Thresholding, Deep Learning techniques, ROI manually extracted)

RQ9
Research question: What are the most used BC feature extraction techniques and experimental design?
Motivation: Identify the feature extraction techniques (classical feature extraction techniques or deep learning techniques) and the experimental design applied (Hybrid, end to end or classical)

RQ10
Research question: What are the most used BC feature selection techniques?
Motivation: Identify the feature selection techniques used (filter, wrapper or embedded).
Quality assessment

Quality assessment (QA) is applied to assess the quality of the selected papers by rating them and selecting only the high-quality papers. The QA was performed for each paper according to the checklist of Table 2. Similar checklists were used in [12, 21].

Note that the answer to QA1 is based on the relevance of the empirical results provided by the paper for answering the research questions of the paper. The answer to QA2 is based on the clarity of the empirical design and the techniques used in the experiment. QA3 scoring is based on the use of adequate performance measures to evaluate the results of the study. QA4 refers to the ranking of the paper: for conferences, the ranking refers to the Computing Research and Education Association of Australasia (CORE Conference Ranking Exercise 2018), and for journals to the Journal Citation Reports (JCR 2018). The QA was performed by the two authors independently, and in case of a disagreement a meeting took place to reach a final decision.

After selecting the relevant papers (i.e. papers satisfying ICs/ECs and QA criteria), we followed the form of Table 3 to extract the relevant data from the selected studies in order to answer the ten RQs of Table 1. A similar form was used in [12].

Threats to validity

The main threats to validity for this study are presented below.

Study selection bias To select the relevant papers, we established a search string that contains the appropriate keywords; thereafter, the search was carried out in several iterations to cover the maximum of primary studies from the digital libraries used (Science Direct, IEEEXPLORE, PubMed, ACM, Springer and Google Scholar); note that Google Scholar covers a large number of libraries, so we deleted the duplicated papers in order to keep the original ones. In addition, and in order to prevent excluding relevant papers: (1) the two authors rigorously applied the selection criteria on title, abstract and keywords to match the RQs; in case of doubt, the full article was analyzed, and for any disagreement a meeting took place to reach a consensus; and (2) to select papers with high quality, the two authors defined a minimum level of quality assessment.

Data extraction bias This threat concerns the reliability of the extracted data from the selected studies in order to address the RQs. Two researchers performed this task independently to minimize the threat of incorrect data extraction by completing the form shown in Table 3. All disagreements between the authors were resolved by means of discussions.

Results

This section presents an overview of the selection process results. Thereafter, we present the results of each RQ.

Studies selection

As shown in Fig. 2, 5817 candidate papers were retrieved using the search string on the six digital libraries. When applying the exclusion criteria on the titles, keywords and eventually the abstracts of the candidate papers, 5028 papers were discarded. Then, we applied the inclusion/exclusion criteria on the 789 remaining studies, and we retained 590 studies. As for the quality assessment, Table 2 summarizes the checklist
Table 2 Quality assessment checklist

QA1 Does the study give clear empirical results? Yes (+1) / No (0)
QA2 Does the study give a justified empirical design? Yes (+1) / No (0)
QA3 Does the study evaluate the performance of the developed solution? Yes (+1) / No (0)
QA4 Is the study published in a recognized source?
    For conferences:
    (1.5) Rank CORE A or A*
    (1) Rank CORE B
    (0.5) Rank CORE C
    (0) if not CORE ranked
    For journals:
    (2) Rank JCR 2019 Q1
    (1.5) Rank JCR 2019 Q2
    (1) Rank JCR 2019 Q3 or Q4
    (0) if not JCR 2019 ranked
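The scoring rule of the checklist can be sketched in Python as follows. The point values are taken from Table 2 and the retention threshold of 3 out of 5 from the study selection step; the function and dictionary names are illustrative assumptions, and treating any channel other than a journal or a conference as unranked is a simplification of this sketch.

```python
# Illustrative sketch of the QA checklist; all names here are hypothetical.
CONFERENCE_POINTS = {"A*": 1.5, "A": 1.5, "B": 1.0, "C": 0.5}   # CORE ranks
JOURNAL_POINTS = {"Q1": 2.0, "Q2": 1.5, "Q3": 1.0, "Q4": 1.0}   # JCR quartiles
RETENTION_THRESHOLD = 3.0  # papers scoring 3 or more (out of 5) are retained


def qa_score(clear_results, justified_design, evaluates_performance,
             channel, rank=None):
    """Sum QA1-QA3 (one point per 'yes') and the QA4 points for the source."""
    score = float(clear_results) + float(justified_design) \
        + float(evaluates_performance)
    if channel == "journal":
        score += JOURNAL_POINTS.get(rank, 0.0)
    elif channel == "conference":
        score += CONFERENCE_POINTS.get(rank, 0.0)
    # Assumption of this sketch: other channels receive no QA4 points.
    return score


def is_retained(score):
    """Apply the minimum quality level used for the final selection."""
    return score >= RETENTION_THRESHOLD
```

For example, a paper that satisfies QA1 to QA3 and appears in a JCR Q1 journal reaches the perfect score of 5, whereas a paper that fails QA2 and appears in a CORE C conference scores 2.5 and is not retained.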
Table 3 Data extraction form
Study identifier
Publication year
Title
Source: Journal, Conference or books
RQ1: In which year, publication channels and sources were the selected papers related to machine learning and image processing in breast
cancer published?
Identification of the year of publication, the publication channel (Journal, conference, book section, report) and publication source.
RQ2: What are the medical tasks of breast cancer in which machine learning and image processing techniques are addressed?
Identification of the medical tasks of breast cancer that can be divided in six categories [10]: screening, diagnosis, treatments, prognosis, monitoring,
and management.
RQ3: What type of contributions and empirical methods is being made to the area of machine learning and image processing in breast cancer?
Identification of the research types [15]: Evaluation Research (ER), Solution Proposal (SP), Experience Papers, Review, Case study, Survey,
and Historical based evaluation.
RQ4: Which is the most investigated machine learning objective?
Identification of the machine learning objectives such as: classification, clustering, prediction and association.
RQ5: What are the most frequently used ML techniques for image processing in breast cancer?
Identification of the different machine learning techniques: artificial neural network, support vector machine, K-means, decision trees, etc.
RQ6: What are the datasets and validation methods used to measure the performance of ML and IP in BC?
Identification of the datasets, the performance measures used (accuracy, sensitivity, specificity and others), and the evaluation methods employed
(cross validation, jackknife, holdout, etc.) [11].
RQ7: What are the most used breast cancer imaging techniques?
Identification of the imaging techniques for the breast cancer (mammography, ultrasound, MRI and others)
RQ8: What are the most used BC image pre-processing and segmentation techniques?
Identification of the pre-processing techniques (CLAHE, intensity normalization, etc.).
Identification of the segmentation techniques (thresholding, deep learning techniques, etc.)
RQ9: What are the most used BC feature extraction techniques and experimental design?
Identification of the feature extraction techniques (classical feature extraction techniques or deep learning techniques)
Identification of the experiment design (Hybrid, End to end, or Classical).
RQ10: What are the most used BC feature selection techniques?
Identification of the feature selection techniques (filter, wrapper, embedded)
of QA scores. Since the perfect score is 5, we retained all the papers with a score greater than or equal to 3 (with 3 taken as the average value). Table 4 shows a summary of the QA; 530 articles were therefore finally selected. The list of these 530 papers, including all required information to answer the RQs of this SLR, is available upon request by email to the authors of this study.

Table 4 Quality assessment score
Quality level               # of studies   Percent (%)
Very low (0 ≤ score ≤ 1)    6              1%
Low (2 ≤ score ≤ 2.5)       54             9%
Medium (3 ≤ score < 4)      147            25%
High (4 ≤ score ≤ 5)        383            65%

RQ1: In which year, publication channels and sources were the selected papers published?

Table 5 shows the distribution of the 530 selected papers according to their publication channels and sources. 71% (383) of the papers were published in journals, 27% (142) were presented in conferences and only 1% (5) were presented as book chapters. The most frequent journals are Expert Systems with Applications with 4% (21), Computers in Biology and Medicine with 2.6% (14), Computer Methods and Programs in Biomedicine with 2.4% (12), IEEE Access with 2% (10) and Scientific Reports with 2% (9). The most recurrent conferences are the RACS Proceedings of the ACM Symposium on Research in Applied Computation with 1.5% (9), and the International Symposium on Biomedical Imaging (ISBI), the IEEE International Conference on Big Data (Big Data) and the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) with 0.4% (2) each.

Fig. 3 shows the distribution over years of the 530 selected papers. We observe that the number of papers published up to 2015 was very low compared to the number of publications since 2016. Indeed, from 2000 to 2015, 21% (113) of the selected papers were published in the field of ML and IP applied to BC, with 66 papers in journals and 47 in conferences. During the period 2016–2019, the number of
publications increased sharply and reached 78% (412) of the selected papers, with 313 papers in journals, 94 papers in conferences and 5 papers as book chapters.

RQ2: What are the medical tasks of breast cancer in which machine learning and image processing techniques are addressed?

Six medical tasks can be identified in BC [10]:

– Screening: identifying an unknown disease before the appearance of the symptoms.
– Diagnosis: identifying the nature of the disease.
– Treatment: identifying all the remedies for a disease after its detection.
– Prognosis: identifying the patient's chance of recovery in terms of mortality and morbidity.
– Monitoring: identifying the different observations of the patient's condition over time and the disease.
– Management: identifying the medical services.

Figure 4 shows the distribution of the 530 selected papers according to these six medical tasks. We observe that the most investigated task was diagnosis with 73% (389 papers), followed by screening with 17% (88). Treatment and prognosis were among the least studied with 6% (26) each. For the management task, there was only one selected paper [22], while the monitoring task was not investigated by any of the selected papers.

RQ3: What type of contributions and empirical methods are being made to the area of machine learning and image processing in breast cancer?

We identify three main research types in the present SLR: Evaluation Research (ER), Solution Proposal (SP) and Review [12, 19].

– Evaluation Research (ER): Studies empirically evaluating existing or new ML and IP techniques for BC.
– Solution Proposal (SP): Studies proposing a new ML and/or IP technique for BC, or an important improvement of an existing one (with or without an empirical evaluation).
– Review: Studies reviewing the papers dealing with ML and IP techniques for BC.
Fig. 5 shows the research types of the selected papers: 42% (312) were SP and 66% among them (206) were empirically evaluated; the remaining 34% (106) of SP papers did not carry out any empirical evaluation of their solution proposals. In addition to the SP papers that were empirically evaluated (206), 165 papers were classified as ER since they evaluated and/or compared existing ML/IP techniques for BC. We therefore have in total 51% (371) of the papers classified as ER. Review papers only represent 7% (53 papers).

Fig. 6 shows the evolution of the research types of the selected papers over the years; we observe that both SP and ER appeared in 2000 and rose over the years. However, the number of papers published between 2000 and 2015 was very low, 21% (113), and began to increase in 2015. Moreover, very few review papers were published from 2000 to 2016, but reviews started to attract more interest from 2017.

ER/SP papers were empirically evaluated using three types of empirical evaluation: case study, historical based evaluation and survey [23]. As shown in Fig. 7, most of the SP articles, 52.88% (165), used a historical based evaluation with publicly available databases, followed by a case study based evaluation with 46.79% (146), and a survey based evaluation with 0.29% (1). For ER, most of the papers used a case study based empirical evaluation with 52.67% (195), followed by a historical based evaluation with 47.43% (176), and a survey based evaluation with 0.26% (1). Review papers did not carry out any empirical evaluation.

RQ4: Which is the most investigated machine learning objective?

RQ4 aims to identify the most investigated machine learning objective in breast cancer. As shown in Fig. 8, the most investigated ML objective was classification with 89% (474) of the selected papers; BC classification consists of classifying the tumor as malignant or benign. The prediction objective came second with 6% (32), followed by clustering and association with 4% (19) and 1% (5) respectively.
Fig. 4 Distribution of selected papers per medical tasks
Fig. 5 Distribution of the research types
RQ5: What are the most frequently used ML techniques per objective for image processing in breast cancer?

Fig. 9 shows the distribution of ML techniques per objective. It can be noticed that the most used ML techniques in classification were Artificial Neural Networks (ANN) with 37% (176) of the papers, followed by Support Vector Machine (SVM) and Decision Tree (DT) with 26% (125) and 19% (91) respectively. Other classification techniques were also used, such as K-Nearest Neighbors (KNN) with 6% (28), Fuzzy C Means with 3% (14), Naive Bayes (NB) with 3% (14), and Logistic Regression (LR) with 3% (14). Linear discriminant analysis (LA), Genetic algorithm (GA) and Gaussian Mixture Modelling (GMM) were also investigated with 1% (4) each. For the clustering objective, ANN and K-means techniques were each used in 29% (5) of the papers, F-Means in 24% (4), and LR in 18% (3) of the selected papers. For the prediction objective, ANN and LR were used in 60% (19) and 40% (13) of the selected papers respectively. Few papers investigated ML and image processing for the association objective: 5 articles used Association Rule Mining (ARM). Note that many articles used ensemble techniques, which combine different single ML techniques such as ANN, SVM and DT [24, 25].

The use of ANNs for the different ML objectives in BC includes different types of ANNs such as:

– Convolutional Neural Networks (CNN): a class of DL techniques that contains an input layer, an output layer and multiple hidden layers, and uses the convolution operation in at least one of its layers [26].
– Multi-layer Perceptron (MLP): a type of ANN that uses the backpropagation supervised learning technique, and contains three or more fully connected layers [27].
– Deep Neural Networks (DNN): an ANN with multiple hidden layers between the input and output layers; DNN algorithms are widely used in IP to identify features and edges and to classify images [28].
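To make the convolution operation mentioned for CNNs concrete, the following minimal Python sketch slides a small kernel over a toy image patch without padding and with a stride of one, then applies a ReLU activation. The patch, the kernel and the function names are illustrative assumptions and are not taken from any of the selected studies; as in most DL libraries, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
# Minimal sketch of the convolution step of a CNN layer (pure Python).
# All names and values below are hypothetical illustrations.

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and, at each
    position, sum the element-wise products of kernel and image window."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses by zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Toy 4x4 patch: dark on the left, bright on the right (a vertical edge).
patch = [[0, 0, 1, 1] for _ in range(4)]
# 2x2 kernel that responds to a dark-to-bright transition from left to right.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d_valid(patch, edge_kernel))
# Each of the 3 output rows is [0, 2, 0]: the response peaks at the edge.
```

In a trained CNN the kernel values are learned rather than fixed by hand, and many such kernels are applied in parallel to produce a stack of feature maps.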
Fig. 7 Distribution of research types and empirical types
Fig. 8 Distribution of machine learning objectives
Fig. 9 Distribution of ML techniques over ML objectives of selected studies
45% (81) of the papers used CNNs, followed by DNNs with 35% (63) and MLPs with 20% (36).

RQ6: What are the datasets and validation methods used to measure the performance of ML and IP in BC?

The aim of RQ6 is to identify the different datasets, the validation methods and the performance measures used to evaluate the use of machine learning and image processing in breast cancer. Table 6 shows the most used datasets in the 530 selected papers. It can be noticed that 47% (237) of the datasets are private while 53% (262) are public. Note that some studies used multiple datasets to test their models [29–36]. For public datasets, Mammographic Image Analysis Society (MIAS) was used by 28% (74) of selected studies, Digital Database for Screening Mammography (DDSM) by 24% (64), Breast Cancer Histopathological (BREAKHIS) by 9% (23), Breast Cancer Digital Repository (BCDR) by 6% (16), WISCONSIN and INBREAST by 5% (13) each, Mytos by 4% (11), and The Cancer Genome Atlas (TCGA) by 2% (6). The remaining articles used other databases such as IMAGENET, ICIAR, Camelyon challenge, BUS, IRMA, and AMIDA.
Table 6 Most used datasets in the selected papers
Dataset      # of studies   Image type
MIAS         74             Mammographic
DDSM         64             Mammographic
BREAKHIS     23             Histological images
BCDR         16             Mammographic
WISCONSIN    13             Mammographic
INBREAST     13             Mammographic
MYTOS        11             Histological images
TCGA         6              Magnetic Resonance Imaging (MRI)
OTHER        42             –
Private      237            –
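The percentages reported for the datasets can be reproduced from the counts of Table 6 with the short Python sketch below. It assumes, as the counts suggest, that the public/private split is computed over all 499 dataset uses and that the share of each public dataset is computed over the 262 public uses; the variable and function names are illustrative.

```python
# Counts copied from Table 6; names are hypothetical.
PUBLIC_COUNTS = {
    "MIAS": 74, "DDSM": 64, "BREAKHIS": 23, "BCDR": 16, "WISCONSIN": 13,
    "INBREAST": 13, "MYTOS": 11, "TCGA": 6, "OTHER": 42,
}
PRIVATE_COUNT = 237


def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


public_total = sum(PUBLIC_COUNTS.values())   # 262 uses of public datasets
all_uses = public_total + PRIVATE_COUNT      # 499 dataset uses in total

private_share = percent(PRIVATE_COUNT, all_uses)  # 47
public_share = percent(public_total, all_uses)    # 53

# Share of each public dataset among the public dataset uses,
# e.g. MIAS -> 28, DDSM -> 24, BREAKHIS -> 9.
public_shares = {name: percent(count, public_total)
                 for name, count in PUBLIC_COUNTS.items()}
```

Because some studies used several datasets, these totals count dataset uses rather than papers, which is why they do not add up to the 530 selected papers.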
Fig. 10 Distribution of deep learning techniques in the selected studies
For the experimental design, we identify three types of design used in the selected studies to extract the relevant features from the input images:

– Classical architecture: consists of the use of pre-processing and/or segmentation techniques for data cleaning and preparation, the use of classical feature extraction techniques such as texture features, shape features or statistical features, eventually the use of feature selection techniques, and the use of classical ML techniques for classification such as KNN, SVM and DT.
– Hybrid architecture: consists of the use of pre-processing and/or segmentation techniques for data cleaning and preparation, the use of deep learning techniques for feature extraction such as VGG16 and VGG19, and the use of classical ML techniques for classification such as KNN, SVM and DT.
– End to End architecture: consists of the use of pre-processing and/or segmentation techniques for data cleaning and preparation, and the use of deep learning techniques for both feature extraction and classification.

Figure 18 shows the distribution of the three architectures in the selected papers: 74% (290 papers) used a Classical architecture [37–40], 21% (82) used an End to End architecture [39, 40], and only 5% (19) used a Hybrid architecture [25, 30]. For the Hybrid architecture, researchers used transfer learning to investigate DL techniques as feature extractors [41, 42], and for the End to End architectures, the researchers used transfer learning to fine-tune and selectively retrain some of the last layers of the DL technique to be used for feature extraction and classification as well [43, 46–49]. Note that transfer learning consists of transferring the knowledge learned from a dataset of one domain to a new one in another field, which speeds up the convergence of the DL technique, optimizes the performance and significantly reduces computational time [34].

RQ10: What are the most used feature selection techniques?

Feature selection consists of selecting subsets of the relevant features of an input image and removing the less pertinent ones. It can bring many advantages in terms of reducing the complexity of the model, reducing overfitting, and improving accuracy. Feature selection techniques can be classified into three categories: Filter, Wrapper, or Embedded methods [47]. Figure 19 shows that only 87 of the selected papers used feature selection techniques: 47% (41) of the papers used
Fig. 17 Distribution of the feature extraction techniques
Fig. 19 Distribution of the feature selection techniques
Filter methods, 32% (28) used Wrappers, and 21% (18) used Embedded methods. Note that two papers used both Filter and Wrapper methods [48, 49]. The most used Filter technique was the Correlation-based Feature Selector (CFS) with 53% (22), the most used Wrapper was Genetic Algorithms (GA) with 59% (16), and the most used Embedded technique was the Least Absolute Shrinkage and Selection Operator (LASSO) with 58% (10).

Discussion

This section discusses the results of the 10 research questions of Table 1.

RQ1: In which year, publication channels and sources were the selected papers published?

From Fig. 3, it is noticeable that the number of publications significantly increased in 2016, since ML and IP are becoming an important topic and are increasingly used by researchers in the medical field, particularly in breast cancer [10, 12, 13]. This is due to the effectiveness of ML and IP techniques in improving the performance of medical decisions. The selected papers were published in different types of channels: journals, conferences and book chapters. Furthermore, we notice that 71% (383) of the papers were published in journals, which reflects the importance of the research and its good level of scientific maturity, since it is in general more difficult to publish in journals than in conferences and symposiums. For the sources of publication, and as shown in Table 5, there is no dominant publication source; different ones were targeted, such as medicine, computer science applied to medicine, computer science and artificial intelligence. This is due to the multidisciplinary nature of the field of ML and IP applied to BC.

RQ2: What are the medical tasks of breast cancer in which machine learning and image processing techniques are addressed?

Most of the selected papers addressed the diagnosis task with 73% (389) [11], since it is important to detect breast cancer early in order to improve the efficiency of the treatment. The majority of this research proposed new solutions for designing computer-assisted diagnosis systems that help doctors detect the type of the tumor with higher precision. The remaining articles were interested in screening with 17% (88), treatment with 6% (26) and prognosis with 6% (26), and only one article proposed a new solution for the management task [22]. The fact that most of the papers investigated the diagnosis task is due to its crucial role in classifying patients that do or do not have a certain disease such as BC, and in constructing intelligent decision systems to perform an effective diagnosis [50]. Note that the treatment task was rarely studied in the selected papers although it is one of the most important medical disciplines in BC and should get more attention from researchers; for instance, [51] addressed the importance of integrating ML techniques in the treatment task to provide a strong roadmap for the cure of BC.

RQ3: What type of contributions and empirical methods is being made to the area of machine learning and image processing in breast cancer?

The 530 selected papers can be classified in three types: evaluation research (ER), solution proposal (SP) and review. We notice that most of the selected papers are ER followed by SP, with 51% (371) and 42% (312) of the papers respectively. This can be due to the fact that the domain of ML and IP for BC still needs new and more effective solutions to offer better results. Also, the fact that most of the SP papers (66%) empirically evaluated their techniques indicates a high scientific maturity level within the community. Papers presenting a review have gained more interest since 2017 because the number of primary studies became large, hence the need to synthesize and summarize their findings. For the empirical type, the evaluation of solution proposals was in general performed using historical data, since researchers generally choose to first test their newly developed technique on publicly available databases; this comes down to the privacy of the data and the difficulty of collecting data from hospitals. As regards evaluation research, studies in general used case studies as an empirical type to test the performance of existing techniques on new unseen cases; note that most of the ER studies were conducted in collaboration with hospitals.

RQ4: Which is the most used machine learning objective?

Fig. 8 shows that 89% (474) of the selected papers investigated the classification objective, 6% (32) prediction, 4% (19) clustering, and the remaining 1% (5) the association objective [52–56]. The use of classification techniques is explained by the fact that image processing steps include image preprocessing, segmentation, feature extraction, feature selection and classification [7]. Therefore, classification is an important step in IP, which consists of classifying medical images to detect the type of the tumor. Moreover, as discussed in RQ2, 73% (389) of the selected papers investigated the diagnosis task, and diagnosis is in general formulated as a classification problem rather than a regression, clustering or association problem.
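As a quick consistency check on the RQ4 figures, the reported percentages can be recomputed from the paper counts. The sketch below uses only the counts quoted in the text (474, 32, 19 and 5 papers); the function name `percent_breakdown` is ours and purely illustrative.

```python
def percent_breakdown(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return {name: round(100 * n / total) for name, n in counts.items()}


# Paper counts per ML objective, as reported for RQ4.
rq4_counts = {
    "classification": 474,
    "prediction": 32,
    "clustering": 19,
    "association": 5,
}

rq4_total = sum(rq4_counts.values())
rq4_percent = percent_breakdown(rq4_counts)
print(rq4_total, rq4_percent)
```

The four counts add up to the 530 selected papers, and the rounded shares match the 89%, 6%, 4% and 1% stated for RQ4.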
RQ5: What are the most frequently used ML techniques for image processing in breast cancer?

Figure 9 shows that the most used classification techniques were artificial neural networks (ANN) with 37% (176), due to their high reported performance in the classification of breast cancer [15, 18–21]. ANN techniques have gained more interest since 2015, in particular CNNs, which targeted the improvement of the generalization performance of BC image classification. However, the performance of a CNN depends on the number of hidden layers and neurons per layer. Consequently, this type of ANN is difficult to optimize [57]. To overcome these drawbacks, many researchers used Deep Neural Networks (DNN) to deal with high-dimensional data [58]. DNN algorithms often use a pre-trained model or are trained from scratch. However, most DNNs are mainly based on CNNs because they are built and optimized according to several parameters such as the nature, size, and type of the data. Hence, a small CNN model can provide better BC classification results if built and trained with efficient optimization. Some of the main benefits of using DL and image processing are: reaching expert-level performance, showing promising improvements in the detection and diagnosis of BC, and demonstrating superior accuracy compared with previous feature extraction algorithms [59, 60]. Despite these benefits, we still observe that DL techniques are not widely used compared with the other ML techniques, and this is mainly due to three reasons: (1) DL techniques began to attract more interest only since 2015; they are newly developed and still at the state of the art, (2) DL techniques in general require more effort for hyperparameter tuning to avoid overfitting and are expensive in terms of computation time compared to the classical ML techniques [61], and (3) DL architectures are considered black-box models that are in general complex and not easy to interpret; as a result, physicians do not trust them [62].
The next most widely used technique is SVM with 26% (125), since it is still a well-known and efficient classification technique [22–24]. Many papers combined the two techniques (ANN and SVM) by using ANN for feature extraction and SVM for classification, which gave more accurate classification [12, 14–32]. 19% (91) of the selected papers used DTs for classification, especially Random Forest (RF) techniques, since RF uses decision trees, which are easy to understand and interpret [39–41]. For clustering methods, the most used techniques were K-means with 29% (5), followed by ANN, Fuzzy c-Means and LR with 29% (5), 24% (4) and 18% (3) respectively. K-means is the most popular technique for clustering due to its simplicity and efficiency, but also because, unlike hierarchical clustering, it can be applied to large datasets [42, 43, 46]. Papers investigating the prediction/regression objective used the LR technique with 60% (19), since LR offers a simple model which can be easily used and interpreted [47–50]. The association objective used ARM techniques such as Apriori in five papers, due to its effectiveness in finding association rules and because it is the most popular algorithm used for solving association problems [52, 63, 64].

RQ6: What are the datasets and validation methods used to measure the performance accuracy of ML and IP in BC?

Table 6 shows that 53% (262) of the selected papers used public datasets; the most used ones are MIAS with 28% (74) and DDSM with 24% (64), due to the fact that mammographic images are still the most used for BC diagnosis [51, 52, 63]. We note that some studies used several datasets to compare their results [10, 19, 26, 30, 33–51, 65–68]. For instance, Chougrad et al. [69] used 5 datasets (including CBIS-DDSM, BCDR, InBreast and MIAS) to assess the performance of a pre-trained CNN model. Also, in [70], Ghosh tested three classifiers (ANN, SVM and Bayesian Network) on two datasets: the Irvine Machine Learning Repository and the Digital Database for Screening Mammography repository. As for the private datasets, the use of ML and IP techniques for diagnosing BC is becoming a hot topic for physicians and doctors, who have started to implement ML/IP techniques in their hospitals. Researchers and data scientists are thus encouraged to collaborate with clinics and medical centers to collect the required images to evaluate their BC solutions.
For the validation methods, we observe from Fig. 11 that researchers preferred K-fold as a validation technique since it is easy to understand and gives better performance results compared to other evaluation techniques [71–73]. For the performance measures, several metrics were investigated, such as Accuracy, Sensitivity, Specificity and Receiver Operating Characteristic (ROC), which are in general the most popular for evaluating classification [53, 71, 72]. Note that Accuracy is the most used performance measure with 74% (392 papers), since it is the simplest measure to evaluate a classifier and it represents the percentage of correct classifications (true positives and true negatives). For instance, [74] carried out a comparative experiment to assess different measures and found that Accuracy is highly used by researchers compared to other metrics. We note that Accuracy is a metric that can be used with balanced datasets, and it is not recommended when dealing with imbalanced datasets, as discussed in [75].

RQ7: What are the most used breast cancer imaging techniques?

Fig. 13 shows that 42% of the relevant papers used mammographic images as the BC imaging technique. This is due to the many advantages of using mammograms in BC: (1) it exposes the breast to much lower levels of X-rays for imaging compared with other devices [76], (2) the range of
gray scale captured by mammograms gives the ability to distinguish between small differences of shades [77], and (3) it reduces mortality when an early diagnosis of the tumor is made [78]. Recently, mammograms have been among the most suitable and reliable tools for screening and a key technique for the early detection of BC [60, 79]. Because of the number of available public mammogram datasets, many studies used this imaging modality rather than US or MRI for BC classification (normal, benign, or malignant) [80]. Moreover, breast US images are weak and poor in identifying small nodules and precise borders of breast lesions [34]. On the other hand, the MRI technique has been widely used in medical examinations, especially for cancer investigation [59, 60, 79]. The magnetic property of the hydrogen nucleus is used to retrieve very detailed images from any part of the body. Breast MRI is frequently used in the management of BC, especially to determine the stage of the disease in the breast and to direct local therapy. Breast MRI plays a role, perhaps complementary to mammography, in screening for high-risk patients [79]. The remaining imaging techniques are rarely used by researchers since few public datasets containing these imaging techniques are available [34].

RQ8: What are the most used Image pre-processing techniques?

Image pre-processing is a crucial step to prepare the data before starting the classification process. Researchers may combine different pre-processing techniques to prepare the input images in order to provide accurate classification results. As shown in Fig. 15, 36% (172) of the selected studies reduced the noise in their input images by applying mean and median filters, which are easy to apply as discussed in [12]; we also notice that most of the studies used noise reduction since noise is a daily challenge in clinical images, as mentioned in [19]. 32% (153) of the selected studies enhanced the contrast by using the CLAHE technique, which has proven effective for contrast enhancement. Enhancing the contrast in BC images is very important since it facilitates the detection of the tumor; for instance, many studies improved the performance of their models by using the CLAHE technique [81, 82]. In addition, 21% (100) of the selected papers used color normalization, which is useful when the input images are on different scales, or when using ANN, DNN, KNN and K-means [19, 83]. Data augmentation was used by 11% (52) of the selected papers [84–87]. Data augmentation is highly recommended when using deep learning techniques since it can reduce overfitting and gives more samples by applying transformations to the input images [34]. The most used data augmentation techniques are horizontal flipping, rotation (90°, 180° and 270°) and random scaling. For instance, many studies used data augmentation, which is a technique that generates relevant samples since the BC tumor may be in various orientations, which leads to developing a more precise ML model [34].
On the other hand, segmentation techniques are less used than the pre-processing ones. However, they are still important to reduce computation time, especially when using deep learning techniques [10, 34]. Fig. 16 shows that 52% (276) of the selected papers used segmentation by means of the thresholding method to extract ROIs. The thresholding technique is straightforward and very practical: for instance, [3, 88] proved the effectiveness of the thresholding technique to preprocess the input images. It is noticeable that DL techniques for segmentation are gaining more interest, especially with the use of the YOLO algorithm [89] for mammographic images, since YOLO gave promising results and more precise ROIs [35, 90, 91]; therefore, we encourage researchers to further investigate the YOLO technique in order to confirm or refute the previous findings.

RQ9: What are the most used Image feature extraction techniques and experimental architecture?

Feature extraction techniques were used to reduce computation time and system complexity, in addition to improving the performance of BC decision-making systems [36]. Extracting features is an essential step when using medical images before the classification/regression tasks. As shown in Fig. 17, several feature extraction techniques were used to extract the relevant features of the images, with texture features and DL used in 40% and 32% of the selected papers respectively:

• Texture features were more investigated since the texture of the images is a key element in the computer vision field to provide similarities and repetitive or semi-repetitive arrangements of an image [92]. Consequently, they are widely used in many medical sub-fields such as BC [93], lung cancer [94], and brain cancer [95]. GLCM is the most used method to extract texture features considering the good results that it gave in many applications, especially in oncology imaging, notably when images have easily separable texture features [92].
• Deep learning techniques for feature extraction were used in 32% (125) of the selected papers. They are gaining more interest due to their high capacity for extracting features [96]. The most used DL algorithms are VGG16, VGG19 and ResNet [25, 35, 37, 38, 45]. In fact, [97] showed that VGG19 and VGG16 outperformed other advanced DL techniques such as MobileNet and DenseNet over 4 datasets: BreakHis, ICIAR, PatchCamelyon and BIOIMAGING. In addition, [98] compared different FE techniques: handcrafted features, ResNet 152, ResNet 18, ResNeXt, and VGG16; the results showed that the variants of ResNet and VGG16 outperformed the others.
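To make the GLCM texture features mentioned above more concrete, here is a minimal sketch in plain Python. It assumes an image already quantized to a small number of gray levels and a single horizontal pixel offset; the toy 4×4 image and the helper names (`glcm`, `contrast`, `homogeneity`) are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset.

    image  : list of rows of integer gray levels in range(levels)
    dx, dy : column and row offset of the neighbouring pixel
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights each co-occurrence by the squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Close to 1 when co-occurring gray levels are similar."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Toy image with four gray levels (0..3), for illustration only.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
flat = [[1, 1, 1], [1, 1, 1]]  # perfectly uniform patch

p_toy = glcm(toy)
p_flat = glcm(flat)
print(contrast(p_toy), homogeneity(p_toy))
print(contrast(p_flat), homogeneity(p_flat))
```

A uniform patch gives zero contrast and a homogeneity of 1, while the textured toy image gives a non-zero contrast. Descriptors of this kind, computed over several offsets and angles, are what a classical pipeline would pass to a classifier.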
Figure 18 shows the distribution of the three experimental designs in the selected papers. We conclude that:

• Classical architecture was the most used with 74% of the selected studies. It mainly consists of using traditional feature extraction methods such as texture features, shape features, statistical features and morphological features. This is mostly due to the simplicity of the classical FE techniques, as shown in [99–101].
• End to End architecture was used in 21% of the selected studies. It consists of using deep learning techniques for FE and classification. This architecture gave better classification results and it is less complicated than the hybrid architecture [86, 102, 103].
• Hybrid architecture was the least used with 5% of the selected studies. This can be explained by the fact that researchers preferred the use of an End to End architecture to reduce the number of hyperparameters to deal with, since when using a Hybrid architecture, one must tune the parameters of both the DL technique used for FE and the ML classifier [104, 105]. However, [101, 104, 106] underlined that the use of DL techniques for FE outperformed the classical FE techniques, since it allows more diversity in the use of ML and DL techniques by combining them and developing ensemble techniques. Therefore, we highly encourage researchers to investigate Hybrid architectures for BC decision-making systems since it is a hot topic and few researchers have investigated them, especially in the BC field.

Finally, DL techniques need a large amount of labeled data and computational time to train from scratch, which explains the use of transfer learning by researchers in the Hybrid and End to End architectures [107]. Medical datasets are expensive and private, and it is time consuming to collect a large amount of medical images and create labels with the help of professional radiologists [34, 97]; hence researchers preferred the use of transfer learning on small private datasets to benefit from the generic features extracted from natural image datasets such as ImageNet [108] and adapt the last layer of the DL architecture to learn specific features from the selected medical datasets [34].

RQ10: What are the most used feature selection techniques?

From the 530 selected papers, only 87 papers used feature selection techniques [82, 109, 110]. The most used ones are filters, with 47% of these papers. Filters are simpler and faster since they are based on statistical methods that choose the relevant features by their correlation with the dependent variable, while wrappers assess the usefulness of a subset of features by training and evaluating the ML model, which is time consuming [10, 47]. The most used FS filter method is CFS, with 53%, and this is due to its effectiveness. For instance, [111] showed that CFS improved the accuracy of the classification and reduced computation time compared to other FS techniques such as Principal Component Analysis (PCA), Gain Ratio (GR) attribute evaluation, Chi-square Feature Evaluation, and Fast Correlation-based Feature Selection (FCBF).
As for the wrapper methods, GA is the most used one, with 59%. In fact, GA techniques provided better results, as shown in [105], where GA significantly improved the performance of the SVM classifier and reduced the computation time, especially when tuning the hyperparameters. Due to the low usage of FS in the selected papers, we highly encourage researchers to use FS techniques to improve accuracy, reduce the model complexity, and speed up the training process.

Conclusion and future work

The present SLR presented an overview of the use of ML and IP in breast cancer. In fact, 530 papers published from 2000 to August 2019 were selected and classified according to ten criteria: year and source of publication, research type and empirical type, BC discipline, ML methods and techniques, validation techniques and performance measures, image preprocessing techniques, FE and FS techniques. The findings of this SLR per RQ are:

• RQ1: The use of ML and IP for BC has been gaining more interest from researchers in recent years, and the number of published articles has significantly increased since 2015. Moreover, the majority of the papers were published in journals (71%), which indicates a high level of maturity within the community.
• RQ2: The SLR found that diagnosis is the most investigated BC task with 73%, followed by screening (17%), and treatment and prognosis with 6% each.
• RQ3: Most of the relevant papers were identified as evaluation research and solution proposal with 51% and 42% respectively. The ER papers used case study-based evaluation with 52.67%, followed by historical-based evaluation with 47.43%. As for the SP papers, the majority of the articles used historical-based evaluation with 52.88%, followed by case study-based evaluation with 46.79%.
• RQ4: Classification is the most investigated objective in ML and IP for BC with 89%, and this is due to the fact that in general BC diagnosis is a classification problem and the classification step is a component of any IP process. Few studies dealt with prediction, clustering and association.
• RQ5: The most frequently used classification techniques were ANNs with 37% (176 papers), which gained more interest over the years, followed by SVM and DTs with 26% (125) and 19% (91) respectively. For prediction, the most used techniques were ANNs and LR with 60% (19) and 40% (13) of the papers respectively. As for clustering, the most used techniques are ANN and K-Means with 29% (5) of the papers each. For association, it was ARM with 100% (5) of the papers.
• RQ6: Public datasets are the most frequently used to evaluate ML and IP for BC (53% (262)); the most investigated ones are MIAS with 28% (74) and DDSM with 24% (64). Regarding the validation methods, most of the papers used the K-fold evaluation technique (56% (222)). Finally, the most used performance metric is accuracy.
• RQ7: Mammographic images are the most used BC imaging technique with 40% (221 papers), followed by MRI and histological images with 12% (65 papers) each, and ultrasound images with 11% (62 papers). Few studies used the other BC imaging techniques such as Thermography, Tomosynthesis and Positron Emission Tomography (PET).
• RQ8: Most of the studies preprocessed their input images by reducing noise (36% (172)), enhancing contrast (32% (153)), normalizing the colors (21% (100)) and using data augmentation (11% (52)). For segmentation, the most used method is thresholding (52% (276)).
• RQ9: Texture features were the most used feature extraction technique with 40%, followed by deep learning techniques with 32%, such as VGG16, VGG19 and ResNet. For the experimental design used, the classical architecture is still the most used, in 290 papers. We encourage researchers to use the two other architectures, End to End and Hybrid, in order to confirm or refute their efficiency in dealing with BC challenges.
• RQ10: Few selected papers used FS techniques (87 papers) and the most used FS techniques were filters (47% (41)), followed by wrappers (32% (28)) and embedded methods (21% (18)).

Ongoing work will focus on: (1) conducting a structured literature review on the performance of deep learning techniques using image processing in BC, (2) conducting a comparative study which aims at evaluating the fine-tuned deep learning techniques using an End to End experimentation architecture, and (3) developing and evaluating homogeneous and heterogeneous ensembles using a Hybrid deep learning architecture.

Acknowledgements This work was conducted under the research project “Machine Learning based Breast Cancer Diagnosis and Treatment”, 2020-2023. The authors would like to thank the Moroccan Ministry of Higher Education and Scientific Research, Digital Development Agency (ADD), CNRST, and UM6P for their support.

Availability of data and material The list of the 530 papers including all required information to answer the RQs of this SLR is available upon request by email to the authors of this study.

Funding This study was funded by Mohammed VI Polytechnic University at Ben Guerir, Morocco.

Compliance with ethical standards

Conflicts of interest/competing interests Not applicable.

Code availability Not applicable.

References

1. Z. Metelko et al., “Pergamon THE WORLD HEALTH ORGANIZATION QUALITY OF LIFE ASSESSMENT (WHOQOL): POSITION PAPER FROM THE WORLD HEALTH ORGANIZATION,” vol. 41, no. 10, 1995.
2. A. Bish, A. Ramirez, C. Burgess, and M. Hunter, “Understanding why women delay in seeking help for breast cancer symptoms B,” vol. 58, pp. 321–326, 2005.
3. S. U. Khan, N. Islam, Z. Jan, I. U. Din, A. Khan, and Y. Faheem, “An e-Health care services framework for the detection and classification of breast cancer in breast cytology images as an IoMT application,” Futur. Gener. Comput. Syst., vol. 98, pp. 286–296, 2019.
4. B. Lauby-Secretan, C. Scoccianti, D. Loomis, et al., “International Agency for Research on Cancer Handbook Working Group. Breast-Cancer Screening–Viewpoint of the IARC Working Group,” N. Engl. J. Med., vol. 372, no. 24, pp. 2353–2358, 2015.
5. M. G. Marmot, D. G. Altman, D. A. Cameron, J. A. Dewar, S. G. Thompson, and M. Wilcox, “The benefits and harms of breast cancer screening: An independent review,” Br. J. Cancer, vol. 108, no. 11, pp. 2205–2240, 2013.
6. M. Tarique, F. ElZahra, A. Hateem, and M. Mohammad, “Fourier Transform Based Early Detection of Breast Cancer by Mammogram Image Processing,” J. Biomed. Eng. Med. Imaging, vol. 2, no. 4, 2015.
7. F. Sadoughi, Z. Kazemy, F. Hamedan, L. Owji, M. Rahmanikatigari, and T. T. Azadboni, “Artificial intelligence methods for the diagnosis of breast cancer by image processing: A review,” Breast Cancer Targets Ther., vol. 10, pp. 219–230, 2018.
8. M. Saha, R. Mukherjee, and C. Chakraborty, “Computer-aided diagnosis of breast cancer using cytological images: A systematic review,” Tissue Cell, vol. 48, no. 5, pp. 461–474, 2016.
9. A. Al Nahid and Y. Kong, “Involvement of Machine Learning for Breast Cancer Image Classification: A Survey,” Comput. Math. Methods Med., vol. 2017, no. i, 2017.
10. A. Idri, I. Chlioui, and B. El Ouassif, “A systematic map of data analytics in breast cancer,” ACM Int. Conf. Proceeding Ser., 2018.
11. M. Hosni, I. Abnane, A. Idri, J. M. Carrillo de Gea, and J. L. Fernández Alemán, “Reviewing ensemble classification methods in breast cancer,” Comput. Methods Programs Biomed., vol. 177, pp. 89–112, 2019.
12. E. Ouassif, A. Idri, M. Hosni, and A. Abran, “Classification techniques in breast cancer diagnosis: A systematic literature review,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2020, in press.
13. H. Zerouaoui, A. Idri, and K. El Asnaoui, “Machine learning and image processing for breast cancer: A systematic Map,” pp. 1–20.
14. P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil, “Lessons from applying the systematic literature review process within the software engineering domain,” J. Syst. Softw., vol. 80, no. 4, pp. 571–583, 2007.
15. B. Kitchenham, O. Pearl Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, “Systematic literature reviews in software engineering - A systematic literature review,” Inf. Softw. Technol., vol. 51, no. 1, pp. 7–15, 2009.
16. A. Kofod-petersen, “How to do a structured literature review in computer science,” Researchgate, no. May 2015, pp. 1–7, 2014.
17. O. Olsen and P. C. Gøtzsche, “Cochrane review on screening for breast cancer with mammography,” Lancet, vol. 358, no. 9290, pp. 1340–1342, 2001.
18. T. P. Carvalho, F. A. A. M. N. Soares, R. Vita, Francisco. R. da P., J. P. Basto, and S. G. S. Alcalá, “A systematic literature review of machine learning methods applied to predictive maintenance,” Comput. Ind. Eng., vol. 137, no. August, p. 106024, 2019.
19. A. Idri, H. Benhar, J. L. Fernández-Alemán, and I. Kadi, “A systematic map of medical data preprocessing in knowledge discovery,” Comput. Methods Programs Biomed., vol. 162, pp. 69–85, 2018.
20. C. Okoli and K. Schabram, “(Okoli, Schabram 2010 Sprouts) systematic literature reviews in IS research,” Work. Pap. Inf. Syst., vol. 10, no. 26, pp. 10–26, 2010.
21. T. EL Idrissi, A. Idri, and Z. Bakkoury, “Systematic map and review of predictive techniques in diabetes self-management,” Int. J. Inf. Manage., vol. 46, no. May, pp. 263–277, 2019.
22. A. Rampun, H. Wang, B. Scotney, P. Morrow, and R. Zwiggelaar, “School of Computing, Ulster University, Coleraine, Northern Ireland, UK Department of Computer Science, Aberystwyth University, UK,” 2018 25th IEEE Int. Conf. Image Process., pp. 2072–2076, 2018.
23. P. Tonella, M. Torchiano, B. Du Bois, and T. Systä, “Empirical studies in reverse engineering: State of the art and future trends,” Empir. Softw. Eng., vol. 12, no. 5, pp. 551–571, 2007.
24. B. K. Singh, K. Verma, L. Panigrahi, and A. S. Thoke, “Integrating radiologist feedback with computer aided diagnostic systems for breast cancer risk prediction in ultrasonic images: An experimental investigation in machine learning paradigm,” Expert Syst. Appl., vol. 90, pp. 209–223, 2017.
25. K. Mendel, H. Li, D. Sheth, and M. Giger, “Transfer Learning From Convolutional Neural Networks for Computer-Aided Diagnosis: A Comparison of Digital Breast Tomosynthesis and Full-Field Digital Mammography,” Acad. Radiol., vol. 26, no. 6, pp. 735–743, 2019.
26. M. V Valueva, N. N. Nagornov, P. A. Lyakhov, G. V Valuev, and N. I. Chervyakov, “ScienceDirect Application of the residue number system to reduce hardware costs of the convolutional neural network implementation,” Math. Comput. Simul., vol. 177, pp. 232–243, 2020.
27. Z. Zhang, M. Lyons, M. Schuster, S. Akamatsu, and F.-S. Cedex, “Comparison Between Geometry-Based and Gabor-Wavelets-Based Facial Expression Recognition Using Multi-Layer Perceptron,” 2004.
28. W. Train, D. Architectures, I. Representations, and S. Features, Learning Deep Architectures for AI By Yoshua Bengio, vol. 2, no. 1. 2009.
29. T. Cogan, M. Cogan, and L. Tamil, “RAMS: Remote and automatic mammogram screening,” Comput. Biol. Med., vol. 107, pp. 18–29, 2019.
31. R. C. Prati, G. E. A. P. A. Batista, and M. C. Monard, “A survey on graphical methods for classification predictive performance evaluation,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 11, pp. 1601–1618, 2011.
32. N. Esfandiari, M. R. Babavalian, A. M. E. Moghadam, and V. K. Tabar, “Knowledge discovery in medicine: Current issue and future trend,” Expert Syst. Appl., vol. 41, no. 9, pp. 4434–4463, 2014.
33. M. I. Razzak, S. Naz, and A. Zaib, “Deep Learning for Medical Image Processing: Overview, Challenges and the Future.”
34. D. Abdelhafiz, C. Yang, R. Ammar, and S. Nabavi, “Deep convolutional neural networks for mammography: Advances, challenges and applications,” BMC Bioinformatics, vol. 20, no. Suppl 11, 2019.
35. M. A. Al-antari, M. A. Al-masni, M. T. Choi, S. M. Han, and T. S. Kim, “A fully integrated computer-aided diagnosis system for digital X-ray mammograms via deep learning detection, segmentation, and classification,” Int. J. Med. Inform., vol. 117, no. June, pp. 44–54, 2018.
36. G. Kumar and P. K. Bhatia, “A detailed review of feature extraction in image processing systems,” Int. Conf. Adv. Comput. Commun. Technol. ACCT, pp. 5–12, 2014.
37. H. Lin, H. Chen, S. Graham, Q. Dou, N. Rajpoot, and P.-A. Heng, “Fast ScanNet: Fast and Dense Analysis of Multi-Gigapixel Whole-Slide Images for Cancer Metastasis Detection,” IEEE Trans. Med. Imaging, vol. 38, no. 8, pp. 1948–1958, 2019.
38. Y. Hu, J. Li, and Z. Jiao, “Mammographic mass detection based on saliency with deep features,” ACM Int. Conf. Proceeding Ser., vol. 19-21-Augu, pp. 292–297, 2016.
39. A. R. Saikia, K. Bora, L. B. Mahanta, and A. K. Das, “Comparative assessment of CNN architectures for classification of breast FNAC images,” Tissue Cell, vol. 57, pp. 8–14, 2019.
40. C. Li, X. Wang, W. Liu, L. J. Latecki, B. Wang, and J. Huang, “Weakly supervised mitosis detection in breast histopathology images using concentric loss,” Med. Image Anal., vol. 53, pp. 165–178, 2019.
41. J. Wang et al., “Detecting Cardiovascular Disease from Mammograms with Deep Learning,” IEEE Trans. Med. Imaging, vol. 36, no. 5, pp. 1172–1181, 2017.
42. G. Carneiro, J. Nascimento, and A. P. Bradley, Deep Learning Models for Classifying Mammogram Exams Containing Unregistered Multi-View Images and Segmentation Maps of Lesions, 1st ed. Elsevier Inc., 2017.
43. N. Dhungel, G. Carneiro, and A. P. Bradley, “Deep learning and structured prediction for the segmentation of mass in mammograms,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 9349, pp. 605–612, 2015.
44. A. Rodríguez-Cristerna, W. Gómez-Flores, and W. C. de Albuquerque Pereira, “A computer-aided diagnosis system for breast ultrasound based on weighted BI-RADS classes,” Comput. Methods Programs Biomed., vol. 153, pp. 33–40, 2018.
45. P. Herent et al., “Detection and characterization of MRI breast lesions using deep learning,” Diagn. Interv. Imaging, vol. 100, no. 4, pp. 219–225, 2019.
46. R. K. Samala, H. P. Chan, L. Hadjiiski, M. A. Helvie, C. D. Richter, and K. H. Cha, “Breast cancer diagnosis in digital breast tomosynthesis: Effects of training sample size on multi-stage transfer learning using deep neural nets,” IEEE Trans. Med. Imaging, vol. 38, no. 3, pp. 686–696, 2019.
47. J. R. Vergara and P. A. Estévez, “A review of feature selection methods based on mutual information,” Neural Comput. Appl., vol. 24, no. 1, pp. 175–186, 2014.
30. S. Guan and M. Loew, “Breast cancer detection using transfer 48. B. K. Singh, K. Verma, and A. S. Thoke, “Fuzzy cluster based
learning in convolutional neural networks,” Proc. - Appl. Imag. neural network classifier for classifying breast tumors in ultra-
Pattern Recognit. Work., vol. 2017-Octob, pp 1–8, 2018. sound images,” Expert Syst. Appl., vol. 66, pp 114–123, 2016.
49. T. K. Avramov, W. Bothell, D. Si, and W. Bothell, “Comparison of Feature Reduction Methods and Machine Learning Models for Breast Cancer Diagnosis,” pp 69–74, 2017.
50. A. F. M. Agarap, “On breast cancer detection: An application of machine learning algorithms on the Wisconsin diagnostic dataset,” ACM Int. Conf. Proceeding Ser., no. 1, pp 5–9, 2018.
51. M. Ezzat and A. Idri, Reviewing Data Analytics Techniques in Breast Cancer Treatment, vol. 2. Springer International Publishing, 2020.
52. X. Xiong, Y. Kim, Y. Baek, D. W. Rhee, and S. H. Kim, “Analysis of breast cancer using data mining & statistical techniques,” Proc. - Sixth Int. Conf. Softw. Eng., Artif. Intell. Netw. Parallel/Distributed Comput. First ACIS Int. Work. Self-Assembling Wirel. Netw., SNPD/SAWN 2005, vol. 2005, pp 82–87, 2005.
53. A. C. Patrocinio and H. Schiabel, “Classifying clusters of microcalcifications in digitized mammograms by artificial neural network,” Brazilian Symp. Comput. Graph. Image Process., vol. 2001-Janua, pp 266–272, 2001.
54. N. Bayramoglu, J. Kannala, and J. Heikkila, “Deep learning for magnification independent breast cancer histopathology image classification,” Proc. - Int. Conf. Pattern Recognit., vol. 0, pp 2440–2445, 2016.
55. Z. Hu, J. Tang, Z. Wang, K. Zhang, L. Zhang, and Q. Sun, “Deep learning for image-based cancer detection and diagnosis − A survey,” Pattern Recognit., vol. 83, pp 134–149, 2018.
56. B. Gerazov and R. C. Conceicao, “Deep learning for tumour classification in homogeneous breast tissue in medical microwave imaging,” 17th IEEE Int. Conf. Smart Technol. EUROCON 2017 - Conf. Proc., no. July, pp 564–569, 2017.
57. M. Bahl, R. Barzilay, A. Yedidia, N. Locascio, L. Yu, and C. Lehman, “BREAST IMAGING: Prediction of Pathologic Upgrade in High-Risk Breast Lesions Bahl et al,” Radiology, vol. 000, no. 0, pp 1–9, 2017.
58. P. Mitra, C. A. Murthy, and S. K. Pal, “Unsupervised feature selection using feature similarity,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp 301–312, 2002.
59. S. J. S. Gardezi, A. Elazab, B. Lei, and T. Wang, “Breast cancer detection and diagnosis using mammographic data: Systematic review,” J. Med. Internet Res., vol. 21, no. 7, pp 1–22, 2019.
60. D. N. Ponraj, M. E. Jenifer, and J. S. Manoharan, “D.Narain Ponraj, M.Evangelin Jenifer, P. Poongodi, J.Samuel Manoharan.pdf,” vol. 2, no. 12, pp 656–664, 2011.
61. C. Shen, Y. Gonzalez, L. Chen, S. B. Jiang, and X. Jia, “Intelligent Parameter Tuning in Optimization-Based Iterative CT Reconstruction via Deep Reinforcement Learning,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp 1430–1439, 2018.
62. V. S. Kumar and D. Boulanger, “Automated Essay Scoring and the Deep Learning Black Box: How Are Rubric Scores Determined?,” Int. J. Artif. Intell. Educ., 2020.
63. M. X. Ribeiro, A. J. M. Traina, C. Traina, and P. M. Azevedo-Marques, “An association rule-based method to support medical image diagnosis with efficiency,” IEEE Trans. Multimed., vol. 10, no. 2, pp 277–285, 2008.
64. Y. Jiang, Z. Li, Y. Wang, and L. Zhang, “Joining associative classifier for medical images,” Proc. - HIS 2005 Fifth Int. Conf. Hybrid Intell. Syst., vol. 2005, no. 60373108, pp 367–372, 2005.
65. Y. Shachor and J. Goldberger, “A MIXTURE OF VIEWS NETWORK WITH APPLICATIONS TO THE CLASSIFICATION OF Hayit Greenspan Faculty of Engineering , Bar-Ilan University , Ramat-Gan , Israel Department of Biomedical Engineering , Tel Aviv University , Tel Aviv , Israel,” 2019 IEEE 16th Int. Symp. Biomed. Imaging (ISBI 2019), no. Isbi, pp 1065–1069, 2019.
66. T. Nadu and T. Nadu, “MRI MAMMOGRAM IMAGE CLASSIFICATION USING ID3 ALGORITHM,” pp 1–5, 2003.
67. S. P. Ngayarkanni, N. B. Kamal, and V. Thavavel, “Automatic detection and classification of cancerous masses in mammogram,” 2012 3rd Int. Conf. Comput. Commun. Netw. Technol. ICCCNT 2012, no. July, 2012.
68. B. Bektas, I. E. Emre, E. Kartal, and S. Gulsecen, “Classification of Mammography Images by Machine Learning Techniques,” UBMK 2018 - 3rd Int. Conf. Comput. Sci. Eng., pp 580–585, 2018.
69. H. Chougrad, H. Zouaki, and O. Alheyane, “Multi-label transfer learning for the early diagnosis of breast cancer,” Neurocomputing, no. xxxx, 2019.
70. A. Ghosh, “Artificial Intelligence Using Open Source BI-RADS Data Exemplifying Potential Future Use,” J. Am. Coll. Radiol., vol. 16, no. 1, pp 64–72, 2019.
71. B. K. Singh, “Determining relevant biomarkers for prediction of breast cancer using anthropometric and clinical features: A comparative investigation in machine learning paradigm,” Biocybern. Biomed. Eng., vol. 39, no. 2, pp 393–409, 2019.
72. Q. Zhang, S. Song, Y. Xiao, S. Chen, J. Shi, and H. Zheng, “Dual-mode artificially-intelligent diagnosis of breast tumours in shear-wave elastography and B-mode ultrasound using deep polynomial networks,” Med. Eng. Phys., vol. 64, no. xxxx, pp 1–6, 2019.
73. M. G. Mini, “Neural network based classification of digitized mammograms,” Proc. 2nd Kuwait Conf. e-Services e-Systems, KCESS’11, pp 1–5, 2011.
74. C. Ferri and R. Modroiu, “An experimental comparison of performance measures for classification,” Pattern Recognit. Lett., vol. 30, no. 1, pp 27–38, 2009.
75. J. Davis and M. Goadrich, “The Relationship Between Precision-Recall and ROC Curves,” 2006.
76. C. Muramatsu, T. Hara, T. Endo, and H. Fujita, “Breast mass classification on mammograms using radial local ternary patterns,” Comput. Biol. Med., vol. 72, pp 43–53, 2016.
77. Y. Gao, K. J. Geras, A. A. Lewin, and L. Moy, “New frontiers: An update on computer-aided diagnosis for breast imaging in the age of artificial intelligence,” Am. J. Roentgenol., vol. 212, no. 2, pp 300–307, 2019.
78. R. A. Übersichtsarbeit, S. H. H. Astrid, and H. Stefan, “Breast Care Advantages and Disadvantages of Mammography Screening,” pp 199–207, 2011.
79. R. M. Rangayyan, F. J. Ayres, and J. E. Leo Desautels, “A review of computer-aided diagnosis of breast cancer: Toward the detection of subtle signs,” J. Franklin Inst., vol. 344, no. 3–4, pp 312–348, 2007.
80. N. Dehghan Khalilabad and H. Hassanpour, “Employing image processing techniques for cancer detection using microarray images,” Comput. Biol. Med., vol. 81, pp 139–147, 2017.
81. I. Fondón et al., “Automatic classification of tissue malignancy for breast carcinoma diagnosis,” Comput. Biol. Med., vol. 96, pp 41–51, 2018.
82. S. K. Wajid, A. Hussain, and K. Huang, “Three-Dimensional Local Energy-Based Shape Histogram (3D-LESH): A Novel Feature Extraction Technique,” Expert Syst. Appl., vol. 112, pp 388–400, 2018.
83. A. Idri et al., “PT US CR,” 2018.
84. J. Y. Yeh and S. Chan, “CNN-based CAD for breast cancer classification in digital breast tomosynthesis,” ACM Int. Conf. Proceeding Ser., pp 26–30, 2018.
85. Y. Jiang, L. Chen, H. Zhang, and X. Xiao, “Breast cancer histopathological image classification using convolutional neural networks with small SE-ResNet module,” PLoS One, vol. 14, no. 3, pp 1–21, 2019.
86. Shallu and R. Mehra, “Breast cancer histology images classification: Training from scratch or transfer learning?,” ICT Express, vol. 4, no. 4, pp 247–254, 2018.
87. R. Almajalid, J. Shan, Y. Du, and M. Zhang, “Development of a Deep-Learning-Based Method for Breast Ultrasound Image Segmentation,” Proc. - 17th IEEE Int. Conf. Mach. Learn. Appl. ICMLA 2018, pp 1103–1108, 2019.
88. K. Xiao, Z. Wang, T. Xu, and T. Wan, “A Deep Learning Method for Detecting and Classifying Breast Cancer Metastases in Lymph Nodes on Histopathological Images,” pp 1–5, 2017.
89. J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-Janua, pp 6517–6525, 2017.
90. M. A. Al-Masni et al., “Detection and classification of the breast abnormalities in digital mammograms via regional Convolutional Neural Network,” Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. EMBS, pp 1230–1233, 2017.
91. M. A. Al-masni et al., “Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system,” Comput. Methods Programs Biomed., vol. 157, pp 85–94, 2018.
92. A. Humeau-Heurtier, “Texture feature extraction methods: A survey,” IEEE Access, vol. 7, pp 8975–9000, 2019.
93. A. Manduca et al., “Texture Features from Mammographic Images and Risk of Breast Cancer,” vol. 18, no. March, pp 837–846, 2009.
94. D. V Fried et al., “Prognostic Value and Reproducibility of Pretreatment CT Texture Features in Stage III Non-Small Cell Lung Cancer,” Radiat. Oncol. Biol., vol. 90, no. 4, pp 834–842, 2014.
95. M. Sasikala and N. Kumaravel, “A wavelet-based optimal texture feature set for classification of brain tumours,” vol. 32, no. 3, pp 198–205, 2008.
96. S. Dara and P. Tumma, “Feature Extraction by Using Deep Learning: A Survey,” Proc. 2nd Int. Conf. Electron. Commun. Aerosp. Technol. ICECA 2018, no. Iceca, pp 1795–1801, 2018.
97. S. H. Kassani, P. H. Kassani, M. J. Wesolowski, K. A. Schneider, and R. Deters, “Classification of Histopathological Biopsy Images Using Ensemble of Deep Learning Networks,” 2019.
98. Z. Cao, L. Duan, G. Yang, T. Yue, and Q. Chen, “An experimental study on breast lesion detection and classification from ultrasound images using deep learning architectures,” BMC Med. Imaging, vol. 19, no. 1, pp 1–9, 2019.
99. J. Torrents-Barrena, D. Puig, J. Melendez, and A. Valls, “Computer-aided diagnosis of breast cancer via Gabor wavelet bank and binary-class SVM in mammographic images,” J. Exp. Theor. Artif. Intell., vol. 28, no. 1–2, pp 295–311, 2016.
100. P. Yadav and V. Jethani, “Breast thermograms analysis for cancer detection using feature extraction and data mining technique,” ACM Int. Conf. Proceeding Ser., vol. 12-13-Augu, 2016.
101. P. Alirezazadeh, B. Hejrati, A. Monsef-Esfahani, and A. Fathi, “Representation learning-based unsupervised domain adaptation for classification of breast cancer histopathology images,” Biocybern. Biomed. Eng., vol. 38, no. 3, pp 671–683, 2018.
102. D. Kumar, C. Kumar, and M. Shao, “Cross-database mammographic image analysis through unsupervised domain adaptation,” Proc. - 2017 IEEE Int. Conf. Big Data, Big Data 2017, vol. 2018-Janua, pp 4035–4042, 2018.
103. E. Stoffel et al., “Distinction between phyllodes tumor and fibroadenoma in breast ultrasound using deep learning image analysis,” Eur. J. Radiol. Open, vol. 5, no. March, pp 165–170, 2018.
104. R. Yan et al., “A Hybrid Convolutional and Recurrent Deep Neural Network for Breast Cancer Pathological Image Classification,” Proc. - 2018 IEEE Int. Conf. Bioinforma. Biomed. BIBM 2018, pp 957–962, 2019.
105. L. Zhuo, J. Zheng, X. Li, F. Wang, B. Ai, and J. Qian, “A Genetic Algorithm Based Wrapper Feature Selection Method for Classification of Hyperspectral Images Using Support Vector Machine,” vol. 7147, pp 1–9, 2008.
106. E. Deniz, A. Şengür, Z. Kadiroğlu, Y. Guo, V. Bajaj, and Ü. Budak, “Transfer learning based histopathologic image classification for breast cancer detection,” Heal. Inf. Sci. Syst., vol. 6, no. 1, 2018.
107. M. A. Hedjazi, I. Kourbane, and Y. Genc, “Yaprak Sınıflandırma Üzerine: Derin Öğrenme ve Geleneksel Makine Öğrenme Yöntemlerinin Karşılaştırılması On Identifying Leaves: A Comparison of CNN with Classical ML Methods,” pp 0–3, 2017.
108. O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp 211–252, 2015.
109. M. Muštra, M. Grgić, and K. Delač, “Breast density classification using multiple feature selection | klasifikacija dojki prema gustoći izborom značajki,” Automatika, vol. 53, no. 4, pp 362–372, 2012.
110. Q. Xiong et al., “Multiparametric MRI-based radiomics analysis for prediction of breast cancers insensitive to neoadjuvant chemotherapy,” Clin. Transl. Oncol., no. 0123456789, 2019.
111. A. G. Karegowda, A. S. Manjunath, G. Ratio, and C. F. Evaluation, “COMPARATIVE STUDY OF ATTRIBUTE SELECTION USING GAIN RATIO,” vol. 2, no. 2, pp 271–277, 2010.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.