2019 IEEE 11th International Conference on Advanced Infocomm Technology
A Deep Convolutional Neural Network Learning Transfer to SVM-Based
Segmentation Method for Brain Tumor
Bin Cui 1,2, Mingchao Xie 1,2, Chunxing Wang 1,*
1 School of Physics and Electronics, Shandong Normal University, Jinan 250014, China
2 These authors contributed equally to this work
* e-mail: [email protected]
[email protected] Abstract—Brain tumor segmentation plays an important role in brain symmetry [7], [8], [12], and physical properties [12].
assisting diagnosis, planning treatment and surgical navigation. Meier et al. [16] used a semi-supervised RF to train a
In this paper, we propose a convolutional neural network- subject-specific classifier for post-operative brain tumor
based learning transfer to support vector machine classifier segmentation.
method for brain tumor segmentation. Our algorithm is In the area of brain tumor segmentation, a number of
composed of two cascaded stages. In the first stage, we trained recent proposals are still investigated the use of CNNs [17]-
CNN to learn the mapping from the image space to the tumor [23]. Zikic et al. [17] used a shallow CNN to classify, which
label space. During the testing phase, we used the predicted has two convolution layers by max-pooling with step 3,
label output from CNN and sent it along with the testing image
along with a fully-connected (FC) layer and a softmax layer.
to an SVM classifier for accurate segmentation. Then we
iterate our deep CNN-SVM classifier. Experiments and
Urban et al. [18] assessed the use of a 3D filter [19]-[23]. 3D
comparisons demonstrate that the proposed framework is filters take the advantage of 3D features, but it increases the
better than the separate SVM-based segmentation or CNN- burden of computing. Lyksborg et al. [21] used a binary
based segmentation. CNN to identify the full tumor, and then the cellular
automata is used to smooth the division before a multi-class
Keywords-brain tumor segmentation; convolutional neural CNN identified the sub-region of the tumor. Dvorak and
networks; transfer learning; support vector machine Menze [23] divided the brain tumor regions segmentation
task into binary sub-tasks, and presented CNN as a learning
I. INTRODUCTION method with a structured prediction. However, for the brain
MRI has played an important role in the detection and tumor images, these methods of feature extraction are
treatment of brain tumors [1]. Due to the high proliferation predetermined according to subjective experience. They do
and increment rate of tumors, as well as their appearance not work well when considering the specificity of brain
variability, the complete tumor segmentation including tumor size, shape and gray scale.
compartments (such as the core necrosis, edema) all have In this paper, we propose a fully automatic segmentation
become more challenging [2]. method based on convolution neural network algorithm and
Recently, the segmentation methods based on SVM. More specifically, we propose to use the convolution
convolution neural network have demonstrated good neural network learning transfer to SVM classifier to
performance and robustness. But convolutional neural segment high-grade tumors. Our framework is composed of
networks are widely used in high-grade gliomas, so many two cascaded phases. In the first stage, we trained CNN to
recent articles still use convolutional neural networks for use the MRI image gray-scale feature to learn the mapping
segmentation. from image space to tumor label space. During the testing
The authors propose some architectures based on a phase, we used the predicted label output from CNN and sent
combination of modified versions of CNN to extract local it to a SVM classifier together with the testing gray-scale
and large context features [3],[4]. In addition, some image to get precise segmentation. To make the results more
independent pixels or small clusters may be misclassified precise, we deepened the classification process and iterated
into the wrong class. In order to solve this problem, some these two steps, using the CNN-SVM cascaded into the next
authors include the neighborhood information into the cascade.
traditional classifier by planting the probability of the
classifier [5]-[8]. For example, support vector machines II. MATERIALS AND METHODS
[5],[6] and random forest [7]-[14] and other classifiers have A. Convolutional Neural Networks
been successfully applied to brain tumor segmentation. Convolutional neural networks consisting of input layer,
Random forest becomes very useful because of its natural hidden layer and output layer are essentially variations of
performance in handling multi-class problems and large multi-layer perceptrons. The hidden layers have multiple
number of feature vectors. A variety of features were layers, each consisting of multiple two-dimensional planes
proposed in the literature: encoding context [8], [9], [11], containing multiple neurons. And the hidden layer consists of
first-order and fractals-based texture[7],[11], [15], gradients the feature extraction layer and the sub-sampling layer.
1) Pre-processing
MRI images are affected by bias field distortion, which makes the intensity of the same tissue vary across the image. To correct this, we applied the recently popular N4ITK method [24].
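The paper does not give an implementation of this step; a minimal sketch using the N4 filter available in SimpleITK (an assumed toolkit, with hypothetical file names) could look as follows:

```python
# Minimal sketch of N4 bias-field correction (the paper only states that
# N4ITK [24] is applied, not how); SimpleITK and the file names are assumptions.
import SimpleITK as sitk

def correct_bias_field(input_path, output_path):
    image = sitk.ReadImage(input_path, sitk.sitkFloat32)
    # Rough foreground mask so the background does not bias the field estimate.
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    corrected = corrector.Execute(image, mask)
    sitk.WriteImage(corrected, output_path)

correct_bias_field("flair.nii.gz", "flair_n4.nii.gz")  # hypothetical file names
```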
2) Activation function
Rectified linear units (ReLU), defined as

    f(x) = max(0, x)    (1)

were found to achieve better results than the more classical sigmoid or hyperbolic tangent functions [25], [26], [34]. However, a ReLU unit may stop being activated on any subsequent data point when a large gradient flows through it; if this happens, the gradient of the unit will always be zero, i.e., the unit becomes irreversibly inactive during training. Therefore, we used a variant called the leaky rectified linear unit (LReLU) [27], which introduces a small slope on the negative part of the function:

    f(x) = max(0, x) + α min(0, x)    (2)

where α is the leakiness parameter. When using the CNN alone, we used the softmax classifier for classification. Softmax is a multi-class classifier, and we use it here as a two-class classifier.
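As a quick numerical illustration of Eqs. (1) and (2) (the leakiness α = 0.01 below is illustrative; the paper does not report its value):

```python
import numpy as np

def relu(x):
    # Eq. (1): f(x) = max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Eq. (2): f(x) = max(0, x) + alpha * min(0, x); alpha is the leakiness.
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # -> [0, 0, 0, 1.5]
print(leaky_relu(x))  # -> [-0.02, -0.005, 0, 1.5]
```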
3) Pooling
After we obtain features by convolution, the computation would be very large and prone to overfitting if we trained on all the extracted features. To address this, we can take the maximum of a particular feature over a region. This operation is called max-pooling [28].

4) Regularization
Regularization is used to reduce overfitting. We use Dropout [29] in the FC layers: the outputs of nodes in the hidden layer are randomly reset with a given probability to reduce the dependency between the nodes. This is an effective way to reduce overfitting.

5) Loss function
In short, the loss function measures the error of the training model, and in general we minimize it. We used the negative log-likelihood loss function, which is defined as:

    NLL(θ, D) = − Σ_{i=0}^{|D|} log P(Y = y^(i) | x^(i), θ)    (3)

6) Training
To train the CNN, the loss function must be minimized, but it is highly non-linear. We use Stochastic Gradient Descent as the optimization algorithm.
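Putting the pieces of Section II-A together, the following PyTorch sketch trains a patch-wise two-class CNN with LReLU activations, max-pooling, dropout in the FC part, the negative log-likelihood loss of Eq. (3) and SGD. The layer counts, filter sizes, dropout rate and learning rate are illustrative assumptions; only the 17×17 patch size (Section III-A), LReLU, max-pooling, dropout, NLL loss and SGD come from the text.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class PatchCNN(nn.Module):
    def __init__(self, in_channels=1, n_classes=2, leakiness=0.01):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3),  # 17x17 -> 15x15
            nn.LeakyReLU(leakiness),
            nn.MaxPool2d(2),                            # 15x15 -> 7x7
            nn.Conv2d(32, 64, kernel_size=3),           # 7x7 -> 5x5
            nn.LeakyReLU(leakiness),
            nn.MaxPool2d(2),                            # 5x5 -> 2x2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 2 * 2, 128),
            nn.LeakyReLU(leakiness),
            nn.Dropout(p=0.5),                          # dropout in the FC layers
            nn.Linear(128, n_classes),
            nn.LogSoftmax(dim=1),                       # pairs with NLLLoss, Eq. (3)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = PatchCNN()
criterion = nn.NLLLoss()                                # negative log-likelihood
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Dummy batch of 17x17 gray-scale patches with binary (tumor / non-tumor) labels.
patches = torch.randn(16, 1, 17, 17)
labels = torch.randint(0, 2, (16,))

for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(patches), labels)
    loss.backward()
    optimizer.step()
```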
B. Support Vector Machine

The support vector machine is a supervised learning technique first introduced by Vladimir Vapnik [30]; it is based on the theory of the VC dimension and structural risk minimization. SVM can achieve a small practical risk and superior generalization ability by seeking the best compromise between the complexity and the learning ability of the model. In the case of nonlinear classification, a kernel function (Polynomial, Gaussian Radial Basis Function, or Sigmoid function) is used to transform the feature space into a higher-dimensional space where a linear separation is possible [31]. Given a training data set, the maximum-margin hyperplane is found and new data points are classified according to their distance from the hyperplane. In general, the classification margin changes with the classification plane, and the size of the margin affects the VC dimension of the classifier.

C. Deep CNN Learning Transfer to SVM

In our paper, we propose a brain tumor segmentation algorithm based on a convolutional neural network and a support vector machine. As shown in Fig. 1, the proposed framework consists of two main phases: the first is preprocessing, feature extraction, and training the CNN and SVM; the second is testing and generating the segmentation results. In the first stage, we train the convolutional neural network and the support vector machine to learn the mapping from the gray-scale image domain to the tumor label domain. During the testing phase, we send the label output of the convolutional neural network, together with the testing gray-scale image, to the SVM classifier to generate an accurate segmentation. Then, we deepen our CNN-SVM cascaded classification through an iterative step.

Figure 1. The structure of the proposed framework.

To generate meaningful and discriminating features for CNN training, we add a simple intermediate processing step, as shown in Fig. 2.

Figure 2. Intermediate processing step.
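To make the transfer step concrete, the toy sketch below pairs each pixel's gray value with a CNN label/probability map (here random placeholders; in the real pipeline these come from the trained CNN of Section II-A), trains an SVM on the CNN-labelled pixels, and re-classifies the slice. One such pass corresponds to one iteration of the cascade; the intermediate processing of Fig. 2 and the ROI handling are omitted, so this is an illustration of the data flow only, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for one testing slice: gray-scale intensities and the CNN's
# predicted tumor-probability map (random here, from the CNN in practice).
intensity = rng.random((48, 48))
cnn_prob = rng.random((48, 48))
cnn_label = (cnn_prob > 0.5).astype(int)

# Per-pixel feature vectors: [intensity, CNN probability], i.e. the testing
# gray-scale image combined with the CNN output, as described above.
features = np.stack([intensity.ravel(), cnn_prob.ravel()], axis=1)

# Train the SVM on the pixels labelled by the CNN, then re-classify every
# pixel; repeating this corresponds to one pass of the iterative cascade.
# (Kernel and parameter choices are discussed in Section III-B.)
svm = SVC(kernel="rbf", C=1000)
svm.fit(features, cnn_label.ravel())
refined = svm.predict(features).reshape(intensity.shape)
```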
Next, during the test phase, we use the aggregated CNN label map and the same previous features to independently train an SVM classifier.

Given the pre-processed input image, we use an iterative classification process. First, the CNN classifies the pixels in the most prominent positions and produces a priority segmentation that is forwarded to the SVM classifier. Second, in addition to the priority segmentation, SVM-based classification is performed on a ROI (region of interest) generated in the previous step. After that, the SVM explores the neighborhood of the output generated by the CNN, and we reclassify the ROI labeled by the CNN. Finally, these two steps are iterated to further refine the segmentation results.

In general, learning transfer is not merely a simple communication between two classifiers; rather, the knowledge learned by one classifier helps the learning task of the other.

D. Evaluation of the Segmentation

1) Dice Similarity Coefficient (DSC)
DSC [32] measures the overlap between the manual segmentation and the automatic segmentation. It is defined as:

    DSC = 2TP / (FP + 2TP + FN)    (4)

where TP (true positive), FP (false positive) and FN (false negative) are the numbers of tumor points whose detection results are true positive, false positive and false negative, respectively.

2) Positive Predictive Value (PPV)
PPV is the proportion of detected tumor points that are correct tumor points. It is defined as:

    PPV = TP / (TP + FP)    (5)

3) Sensitivity
Sensitivity is the proportion of true tumor points that are correctly detected. It is defined as:

    Sensitivity = TP / (TP + FN)    (6)

III. RESULTS

A. Architecture of Deep Convolutional Neural Networks

The structural model of the convolutional neural network designed in this paper is shown in Fig. 3. The network structure includes convolution layers, max-pooling layers, a fully-connected layer and a softmax layer. The convolution layers are used to obtain the image features. The max-pooling layers sub-sample the obtained feature maps to reduce the amount of data processing while preserving useful feature information. The fully-connected layer is used to capture the complex relationships between the extracted characteristics. Softmax is a multi-class classifier used here for two classes, whose output is a conditional probability value between 0 and 1. The activation function in our architecture is LReLU, which keeps the gradient flowing and allows the consequent adjustment of the weights [33]. The Stochastic Gradient Descent method is used to minimize the loss function and obtain the network model. The data block size of the input image in our network is 17×17. Tumor edge and texture features are extracted by the network, and the low-level features are transformed into high-level features after multi-layer learning. Finally, we input the high-level features into the SVM classifier after the intermediate pre-processing. Classifying each pixel of the image into two classes, we obtain the probability that the pixel belongs to the tumor or to the normal tissue of the brain. According to these two class probabilities, we obtain the binary image of the tumor.

B. Parameter Setting for SVM

The kernel function used by the support vector machine in this paper is the Gaussian Radial Basis Function, which is defined as:

    K(x, x_i) = exp(−‖x − x_i‖² / (2σ²))    (7)

where σ is the radial basis width. The SVM is therefore a radial basis function classifier, differing from the traditional radial basis function method in that its weights are determined automatically.

The classification accuracy of the samples decreases as the window size s of the SVM increases. The segmentation results of support vector machines with different kernel functions differ little at the same window size; however, the results obtained with the Polynomial kernel are slightly lower than those obtained with the linear or Radial Basis kernel. To obtain the best segmentation results with the Radial Basis kernel function, the penalty factor C and σ² should be neither too small nor too large. Hence, we set C to 1000, σ² to 0.01, the window size to 5×5, and the number of training samples to about 2000.

Figure 3. A structural model using CNN solely for segmentation.
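The parameter setting of Section III-B can be reproduced, for example, with scikit-learn (an assumed toolkit; the paper does not name its SVM implementation). Writing the RBF kernel as exp(−γ‖x − x_i‖²), the stated σ² = 0.01 corresponds to γ = 1/(2σ²) = 50 under Eq. (7); the training data below are random stand-ins for the roughly 2000 samples of 5×5-window features.

```python
from sklearn.svm import SVC
import numpy as np

sigma_sq = 0.01
svm = SVC(kernel="rbf", C=1000, gamma=1.0 / (2.0 * sigma_sq))  # gamma = 50

# Toy stand-in: ~2000 training samples of 5x5-window features (25 values each).
rng = np.random.default_rng(1)
X_train = rng.random((2000, 25))
y_train = rng.integers(0, 2, 2000)

svm.fit(X_train, y_train)
print(svm.predict(X_train[:5]))
```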
C. Segmentation Results

To verify the segmentation results of the CNN-SVM structure, this paper uses clinical MRI images of 30 patients, which are used to train the model and to segment the brain tumor images. Meanwhile, we separately used the CNN and the SVM to segment the same MRI images, and we compare the three segmentation results, which are shown in TABLE I and TABLE II.

Fig. 4 shows the number of mis-classified pixels of our algorithm under different numbers of iterations; in our paper, we set N to 4. TABLE I shows the DSC, PPV and Sensitivity of the three methods. Compared to the independent CNN and SVM, all three metrics of our proposed CNN-SVM structure have increased. Specifically, the DSC value of the proposed method is about 0.88, compared with CNN (0.79) and SVM (0.70). The PPV is about 0.83, compared with CNN (0.79) and SVM (0.78), and the Sensitivity is about 0.89, compared with CNN (0.81) and SVM (0.83). In general, our proposed framework significantly outperformed CNN-based segmentation and SVM-based segmentation when either is used alone.

TABLE II provides some qualitative results. In addition, our method gives better results for fine details of the tumor boundary.

Figure 4. Number of mis-classified pixels variation.

TABLE I. SEGMENTATION RESULTS EVALUATION USING DSC, PPV AND SENSITIVITY

Method        CNN-SVM      CNN          SVM
DSC           0.88±0.04    0.79±0.09    0.70±0.11
PPV           0.83±0.04    0.79±0.10    0.78±0.06
Sensitivity   0.89±0.02    0.81±0.11    0.83±0.12

TABLE II. THE COMPARISON OF SEGMENTATION RESULTS EVALUATION USING DSC, PPV AND SENSITIVITY

Figure 5. Exemplary segmentation result.
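The DSC, PPV and Sensitivity entries in TABLE I can be reproduced from a predicted binary mask and the corresponding manual mask; a minimal sketch following Eqs. (4)-(6), with random placeholder masks:

```python
import numpy as np

def evaluate(pred, truth):
    # Counts of true-positive, false-positive and false-negative tumor pixels.
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    dsc = 2 * tp / (fp + 2 * tp + fn)   # Eq. (4)
    ppv = tp / (tp + fp)                # Eq. (5)
    sensitivity = tp / (tp + fn)        # Eq. (6)
    return dsc, ppv, sensitivity

rng = np.random.default_rng(2)
pred = rng.integers(0, 2, (128, 128))   # placeholder automatic segmentation
truth = rng.integers(0, 2, (128, 128))  # placeholder manual segmentation
print(evaluate(pred, truth))
```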
IV. CONCLUSIONS

In this paper, we propose a simple but effective method for MRI brain tumor segmentation that transfers the knowledge learned by a convolutional neural network to guide an SVM classifier in the segmentation task. The proposed method has an acceptable performance compared to other approaches, and it also significantly outperformed the use of the CNN or the SVM alone for tumor classification. In further study, we will investigate the performance of convolutional neural networks combined with other robust classifiers to increase the accuracy of MRI brain tumor segmentation, which can benefit the development of automatic tumor segmentation in the future.

REFERENCES
[1] V. Narayanan. High Grade Gliomas: Pathogenesis, Management.
[2] I. Despotović, B. Goossens and W. Philips. MRI segmentation of the human brain: challenges, methods, and applications: Computational and Mathematical Methods in Medicine, 2015.
[3] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin and H. Larochelle. Brain tumor segmentation with deep neural networks: arXiv preprint arXiv:1505.03540, 2015.
[4] K. Kamnitsas, L. Chen, C. Ledig, D. Rueckert and B. Glocker. Multi-scale 3D convolutional neural networks for lesion segmentation in brain MRI: Ischemic Stroke Lesion Segmentation, 2015; pp. 13–19.
[5] S. Bauer, L.-P. Nolte, and M. Reyes. Fully automatic segmentation of brain tumor images using support vector machine classification in combination with hierarchical conditional random field regularization: Medical Image Computing and Comput.-Assisted Intervention (MICCAI) 2011; Springer, New York, 2011; pp. 28–33.
[6] C.-H. Lee et al. Segmenting brain tumors using pseudo-conditional random fields: Medical Image Computing and Comput.-Assisted Intervention (MICCAI) 2008.
[7] R. Meier et al. A hybrid model for multimodal brain tumor segmentation: Proc. NCI-MICCAI BRATS, 2013; pp. 31–37.
R. Meier et al. Appearance- and context-sensitive features for brain tumor segmentation: MICCAI Brain Tumor Segmentation Challenge (BraTS), 2014; pp. 20–26.
[8] D. Zikic et al. Decision forests for tissue-specific segmentation of high-grade gliomas in multi-channel MR: Medical Image Computing and Comput.-Assisted Intervention (MICCAI) 2012; Springer, New York, 2012; pp. 369–376.
[9] S. Bauer et al. Segmentation of brain tumor images based on integrated hierarchical classification and regularization: Proc. MICCAI-BRATS, 2012; pp. 10–13.
[10] S. Reza and K. Iftekharuddin. Multi-fractal texture features for brain tumor and edema segmentation: SPIE Med. Imag. Int. Soc. Opt. Photon., 2014; pp. 903503–903503.
[11] N. Tustison et al. Optimal symmetric multimodal templates and concatenated random forests for supervised brain tumor segmentation (simplified) with ANTsR: Neuroinformatics, 2015; vol. 13, no. 2, pp. 209–225.
[12] E. Geremia, B. H. Menze, and N. Ayache. Spatially adaptive random forests: Proc. IEEE 10th Int. Symp. Biomed. Imag., 2013; pp. 1344–1347.
[13] A. Pinto et al. Brain tumour segmentation based on extremely randomized forest with high-level features: Proc. 37th Annu. Int. Conf. IEEE EMBC, 2015; pp. 3037–3040.
[14] A. Islam, S. Reza, and K. M. Iftekharuddin. Multifractal texture estimation for detection and segmentation of brain tumors: IEEE Trans. Biomed. Eng., Nov. 2013; vol. 60, no. 11, pp. 3204–3215.
[15] R. Meier et al. Patient-specific semi-supervised learning for postoperative brain tumor segmentation: Medical Image Computing and Comput.-Assisted Intervention (MICCAI) 2014; Springer, New York, 2014; pp. 714–721.
[16] D. Zikic et al. Segmentation of brain tumor tissues with convolutional neural networks: MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), 2014; pp. 36–39.
[17] G. Urban et al. Multi-modal brain tumor segmentation using deep convolutional neural networks: MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), 2014; pp. 1–5.
[18] A. Davy et al. Brain tumor segmentation with deep neural networks: MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), 2014; pp. 31–35.
[19] A. Davy et al. Brain tumor segmentation with deep neural networks: MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), 2014; pp. 31–35.
[20] M. Havaei et al. Brain tumor segmentation with deep neural networks. Available online: http://arxiv.org/abs/1505.03540, arXiv:1505.03540v1 (2015).
[21] M. Lyksborg et al. An ensemble of 2D convolutional neural networks for tumor segmentation: Image Analysis; Springer, New York, 2015; pp. 201–211.
[22] V. Rao, M. Sharifi, and A. Jaiswal. Brain tumor segmentation with deep learning: MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), 2015; pp. 56–59.
[23] P. Dvořák and B. Menze. Structured prediction with convolutional neural networks for multimodal brain tumor segmentation: MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), 2015; pp. 13–24.
[24] N. J. Tustison et al. N4ITK: Improved N3 bias correction: IEEE Trans. Med. Imag., Jun. 2010; vol. 29, no. 6, pp. 1310–1320.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks: Adv. Neural Inform. Process. Syst., 2012; pp. 1097–1105.
[26] K. Jarrett et al. What is the best multi-stage architecture for object recognition?: Proc. 12th Int. Conf. IEEE Comput. Vis., 2009; pp. 2146–2153.
[27] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models: Proc. ICML, 2013; vol. 30.
[28] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning: Nature, 2015; vol. 521, no. 7553, pp. 436–444.
[29] N. Srivastava et al. Dropout: A simple way to prevent neural networks from overfitting: J. Mach. Learn. Res., 2014; vol. 15, no. 1, pp. 1929–1958.
[30] C. Cortes and V. Vapnik. Support-vector networks: Machine Learning, 1995; 20(3), pp. 273–297.
[31] C. J. Burges. A tutorial on support vector machines for pattern recognition: Data Mining and Knowledge Discovery, 1998; 2(2), pp. 121–167.
[32] L. R. Dice. Measures of the amount of ecologic association between species: Ecology, 1945; vol. 26, no. 3, pp. 297–302.
[33] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models: Proc. ICML, 2013; vol. 30.