
A New Approach For Parallel Region Growing Algorithm in Image Segmentation Using MATLAB On GPU Architecture

The document discusses implementing a parallel region growing algorithm for image segmentation using MATLAB and GPU architecture. It compares the performance of executing the region growing algorithm serially versus in parallel on a GPU using CUDA. It aims to solve computational complexity and improve real-time performance of the region growing algorithm.


2015 IEEE International Conference on Computer Graphics, Vision and Information Security (CGVIS)

A New Approach for Parallel Region Growing Algorithm in Image Segmentation using MATLAB on GPU Architecture
Abhaya Kumar Sahoo, Gaurav Kumar, Ghungura Mishra, Rachita Misra
Department of Information Technology, C.V. Raman College of Engineering, Bhubaneswar, India

Abstract— Image segmentation is the process of dividing a digital image into multiple segments or clusters. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful for analysis. Image segmentation is typically used to separate the region of interest from the input image, and different applications have different regions of interest. No single algorithm satisfies every application while also giving good results from the computational point of view, so segmentation remains one of the challenging problems in digital image processing. Many general-purpose algorithms have been developed for image segmentation, and Region Growing is one of them. This paper presents a comparison between serial execution of the Region Growing algorithm and its parallel execution on the CUDA platform (provided by NVIDIA) integrated with MATLAB. Using GPU computing, we reduce the computational cost and improve the real-time performance of Region Growing. MATLAB is used as the platform for serial execution and CUDA as the platform for parallel execution.

Keywords— Image Segmentation; GPU Computing; Parallel Processing; Region Growing

1. INTRODUCTION

An image is a two-dimensional distribution of small image points called pixels. It can be considered as a function of two real variables. Image processing is the process of enhancing an image and extracting meaningful information from it. Image segmentation is one of the important steps in image processing and plays a key role in image analysis. The purpose of image segmentation is to partition an image into meaningful regions based on colour, texture, depth, motion or grey level. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images [1].

Applications of image segmentation include content-based image retrieval, medical imaging and object detection. Many algorithms are used for image segmentation, among which the Region Growing algorithm is an effective and widely used technique [6]. However, the algorithm is computationally expensive when executed serially.

Parallel computing is a form of computation in which many calculations are carried out concurrently, on the principle that a large problem is broken into smaller ones; each part is further broken down into a set of instructions, and the instructions from each part execute simultaneously on different processors. An overall coordination mechanism is employed. We therefore implement the given serial code of the region growing algorithm using this parallel computing concept. Since we use the parallel language CUDA in the MATLAB environment, we first integrate MATLAB with CUDA. Then we measure the execution time taken by the parallel code written in CUDA (provided by NVIDIA). The organization of this paper is as follows. In Section 2, the GPU architecture is briefly described. In Section 3, the sequential region growing is depicted. In Section 4, the parallel algorithm is presented. In Section 5, comparisons between the serial and parallel algorithms are presented. In the last section, we conclude the paper.

2. NVIDIA’S CUDA ENABLED PARALLEL COMPUTING AND MATLAB OVERVIEW

2.1 GPU Architecture

The term GPU was popularized by NVIDIA in 1999, which marketed the GeForce 256 as "the world's first 'GPU', or Graphics Processing Unit". Nowadays the GPU has evolved into a highly parallel, multithreaded, many-core processor with high computational horsepower and very high memory bandwidth. In addition to rendering, GPUs are also suitable for general compute-intensive, highly parallel computation (GPGPU). NVIDIA’s GPU with the CUDA programming model provides an adequate API for non-graphics applications. The CPU sees a CUDA device as a many-core co-processor. At the hardware level, a CUDA-enabled GPU is a set of SIMD streaming multiprocessors (SMs). The Tesla C2070 has 14 SMs with 448 parallel cores (stream processors, SPs) in total. Each SM contains a fast shared memory, which is shared by all of its SPs, as shown in Figure 1. It also has a read-only constant cache and texture cache, which are shared by all the SPs on the GPU. A set of local 32-bit registers is available for each SP. The SMs communicate through the global/device memory. The global memory can be read or written by the host, and is persistent across kernels launched by the same application. Shared memory is managed explicitly by the programmer.
Compared to the CPU, more transistors on the GPU are devoted to computing, so the peak floating-point capability of the GPU is an order of magnitude higher than that of the CPU, as is the memory bandwidth, due to NVIDIA’s efforts on optimization [1].

978-1-4673-7437-8/15/$31.00 ©2015 IEEE

Fig. 1. Working of GPU

First, if a part of a program is to be executed in parallel, its data is copied to the memory of the GPU. The CPU then instructs the GPU to process this parallel work. The SMs (streaming multiprocessors) execute the parallel code on each of their cores using threads. After completion, the result is copied back to the main memory of the CPU.

2.2 CUDA Programming Model

At the software level, the CUDA model is a collection of threads running in parallel. The unit of work issued by the host computer to the GPU is called a kernel. A CUDA program runs in a thread-parallel fashion. Computation is organized as a grid of thread blocks, each of which consists of a set of threads, as shown in Fig. 2. At the instruction level, 32 consecutive threads in a thread block make up the minimum unit of execution, which is called a warp. Each SM executes one or more thread blocks concurrently. A block is a batch of SIMD-parallel threads that runs on the same SM at a given moment. For a given thread, its index determines the portion of data to be processed. Threads in a single block communicate through the shared memory. CUDA consists of a set of C language extensions and a runtime library that provides APIs to control the GPU. Thus, the CUDA programming model allows programmers to better exploit the parallel power of the GPU for general-purpose computing [2,5].

Fig. 2. GPU Architecture

2.3. MATLAB Overview

MATLAB (matrix laboratory) is a numerical computing environment. It allows matrix manipulations, plotting of functions and data, implementation of algorithms and interfacing with programs written in other languages such as C, C++, Java, Fortran and Python [2].

Although MATLAB is mainly used for mathematical calculation, it has many other applications and is widely used in image processing tasks. MATLAB also provides a way to work with parallel programs by integrating it with a parallel programming model. Here we use CUDA as the programming model and integrate MATLAB with the CUDA environment [4]. We use MATLAB 2013a for the implementation of our parallel program.

3. RELATED WORK

The related work includes optimization of the existing region growing algorithm. Previously, all of these algorithms were serial. Many implementations have been proposed, among which the mean intensity measure method is one. However, it is serial and runs on the very small number of cores of a general-purpose CPU.

A. Understanding Image segmentation

Image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyse. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics [3].

B. Region Growing Algorithm

The region growing algorithm originally proposed by Baatz and Schäpe serves as the basis of our parallel implementation.

This is an iterative method. The algorithm is given as:

Step 1: Take an image as input.
Step 2: Choose an initial seed point manually from the input image.
Step 3: Compare the initial seed point with the neighbouring pixels in 4-neighbourhood or 8-neighbourhood connectivity using the mean intensity measure method.
Step 4: Repeat steps 2 and 3 until all the pixels are compared and segmented region-wise.

4. ALGORITHM IMPLEMENTATION

In this section we present the serial as well as the parallel implementation of the Region Growing algorithm. As a parallel platform, CUDA gives better results. Our parallel implementation is based on assigning the looping part of the serial code to multiple threads, so that the task is divided into a number of instructions that work independently on different threads, as in the definition of parallel computation. By dividing the work of our task we obtain a better execution time and a better result than the serial implementation. We will
also see how to choose different neighbors in an image and how to use their intensity values to compare them with the intensity values of other pixels.

For the parallel implementation we integrate MATLAB with CUDA. MATLAB lets us perform image processing tasks in matrix form, which is helpful for implementation with CUDA programming.

We use MATLAB 2013a and Visual C++ Express Edition along with the CUDA 5.0 toolkit. For integration we use the CUDA-enabled MEX compiler called nvmex, which is available in the legacy MATLAB plug-in on the NVIDIA site.

After integration, we first implement the given serial code and observe its execution time; we then implement our parallel code and observe its execution time. Finally we compare both results.

4.1 Proposed Serial Algorithm

In the serial implementation we use MATLAB as the single platform. We create two files in MATLAB: one for the function call (e.g. code.m) and the other containing the main function, in which we write the basic code of the Region Growing algorithm (e.g. regiongrowing.m).

1. In the first file we convert the range of the image matrix from [0, 255] to [0, 1] using the MATLAB function ‘im2double’. Then we provide the seed point coordinates manually.

2. We call our main function, i.e. regiongrowing, from the code.m file, passing the necessary parameters along with a threshold value.

3. In the regiongrowing.m file we calculate the region mean. Initially it is chosen as the intensity of the given seed point. Then we get the coordinates of the neighbors of the given seed point.

4. Using the coordinates of the neighbors, we get their intensity values and take the difference between each intensity value and the region mean.

5. If the difference between the intensity value and the region mean is less than the threshold, we choose that neighbor for merging. This gives a list of candidate pixels, from which we choose the one with the minimum difference and merge it.

6. This process continues until the difference between the pixel intensity and the region mean is no longer less than the threshold value.

7. Finally we show the segmented image along with the original image using the MATLAB function ‘imshow’.

8. We calculate the execution time taken by the code using the MATLAB functions ‘tic’ and ‘toc’.

4.2. Proposed Parallel Algorithm

1. Collect the CUDA device properties using the ‘cudaDeviceProp’ structure, and determine the amount of shared memory per block, DRAM and constant cache memory available on the device.

2. Convert the range of the image matrix from [0, 255] to [0, 1] using the MATLAB function ‘im2double’. Then provide the seed point coordinates manually.

3. Call the GPU kernel from the CPU using the <<<number of blocks, number of threads, dynamic memory per block, associated stream>>>(parameters) syntax, passing the necessary parameters along with a threshold value.

4. Calculate the region mean. Initially it is chosen as the intensity of the given seed point. Then get the coordinates of the neighbors of the given seed point using the shared memory of the individual block on the SM.

5. Collect the result back from the GPU using cudaMemcpyAsync() with the associated stream[i].

6. Inside the GPU kernel function, create two shared variables using __shared__ to store the region mean and the intensity of the seed point (read from global memory).

7. If the difference between the intensity value and the region mean is less than the threshold, choose that neighbor for merging by writing the result into shared memory. This gives a list of candidate pixels, from which we choose the one with the minimum difference and merge it.

8. This process continues until the difference between the pixel intensity and the region mean is no longer less than the threshold value. The result is written into the global memory of the GPU.

9. Finally show the segmented image along with the original image using the MATLAB function ‘imshow’.

10. Calculate the execution time taken by the GPU.

5. EXPERIMENTAL ANALYSIS

In the experiment, we take an MRI brain image and segment it using both the serial and the parallel algorithm.
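
The serial procedure of Section 4.1 (region mean initialised to the seed intensity, then repeated merging of the candidate neighbour closest to the mean while its difference stays below the threshold) can be sketched as follows. This is a minimal Python sketch, not the authors' MATLAB code: the function name region_grow, the (row, column) seed convention, the list-of-lists image and the tiny test image are all assumptions made for illustration.

```python
# Offsets (row, col) for 4- and 8-neighbourhood connectivity.
NEIGHBOURS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBOURS_8 = NEIGHBOURS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def region_grow(image, seed, threshold, connectivity=4):
    """Grow a region from `seed` over a 2-D list of intensities in [0, 1].

    Hypothetical helper: at each step the candidate pixel whose intensity is
    closest to the current region mean is merged, and growth stops once that
    smallest difference is no longer below `threshold`.
    Returns a boolean mask with the same shape as `image`.
    """
    rows, cols = len(image), len(image[0])
    offsets = NEIGHBOURS_4 if connectivity == 4 else NEIGHBOURS_8
    in_region = [[False] * cols for _ in range(rows)]
    queued = [[False] * cols for _ in range(rows)]  # already a candidate or merged

    r, c = seed
    in_region[r][c] = queued[r][c] = True
    region_sum, region_size = image[r][c], 1  # the mean starts at the seed intensity
    candidates = []

    while True:
        # Add the not-yet-seen neighbours of the most recently merged pixel.
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not queued[nr][nc]:
                queued[nr][nc] = True
                candidates.append((nr, nc))
        if not candidates:
            break

        mean = region_sum / region_size
        best = min(candidates, key=lambda p: abs(image[p[0]][p[1]] - mean))
        if abs(image[best[0]][best[1]] - mean) >= threshold:
            break  # even the closest candidate is too different: stop growing

        candidates.remove(best)
        r, c = best
        in_region[r][c] = True
        region_sum += image[r][c]
        region_size += 1

    return in_region


# Tiny illustrative image: a dark 2x2 corner surrounded by bright pixels.
tiny = [
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.9],
    [0.8, 0.9, 0.9],
]
mask = region_grow(tiny, (0, 0), 0.2)
```

Selecting the minimum-difference candidate requires a scan of the candidate list on every iteration, which is the looping part that a parallel implementation would distribute across threads.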

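
The thread-per-pixel decomposition behind the parallel algorithm of Section 4.2 can be illustrated by emulating a kernel launch in plain Python. This is only a sketch of the indexing idea, not CUDA code and not the authors' kernel: the names launch_config, candidate_kernel and run_emulated, the flattened image layout and the default of 256 threads per block are assumptions.

```python
import math


def launch_config(n_pixels, threads_per_block=256):
    """Return (blocks, threads_per_block) with blocks * threads >= n_pixels."""
    blocks = math.ceil(n_pixels / threads_per_block)
    return blocks, threads_per_block


def candidate_kernel(block_idx, thread_idx, block_dim,
                     flat_image, region_mean, threshold, out):
    """Work done by one emulated thread: test a single pixel against the mean."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(flat_image):  # guard: surplus threads in the last block do nothing
        out[i] = abs(flat_image[i] - region_mean) < threshold


def run_emulated(flat_image, region_mean, threshold, threads_per_block=256):
    """Run the kernel body once per (block, thread) pair.

    On a GPU these iterations would execute concurrently; here they run in
    sequence purely to show how indices map to pixels.
    """
    blocks, threads = launch_config(len(flat_image), threads_per_block)
    out = [False] * len(flat_image)
    for b in range(blocks):
        for t in range(threads):
            candidate_kernel(b, t, threads, flat_image, region_mean, threshold, out)
    return out


flags = run_emulated([0.1, 0.15, 0.9, 0.2, 0.5], region_mean=0.1,
                     threshold=0.12, threads_per_block=2)
```

Rounding the block count up and guarding on the pixel index is the usual way to cover an image whose size is not a multiple of the block size.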

Fig. 3. Original Image
Fig. 4. Output Image after applying Serial Algorithm (Seed Point x=10, y=10)
Fig. 5. Output Image after applying Serial Algorithm (Seed Point x=90, y=60)
Fig. 6. Output Image after applying Serial Algorithm (Seed Point x=100, y=100)
Fig. 7. Output Image after applying Serial Algorithm (Seed Point x=100, y=90)
Fig. 8. Output Image after applying Parallel Algorithm (Seed Point x=10, y=10)
Fig. 9. Output Image after applying Parallel Algorithm (Seed Point x=90, y=60)
Fig. 10. Output Image after applying Parallel Algorithm (Seed Point x=100, y=100)
Fig. 11. Output Image after applying Parallel Algorithm (Seed Point x=100, y=90)

After completing the environment setup required for running CUDA programs in MATLAB, we ran the serial code in MATLAB and obtained the segmented images. For the different seed points, the original image is segmented into different regions. The time taken to execute the serial code is measured and compared with the time taken by the parallel code. We chose a medical image for segmentation and selected the different seed point coordinates shown above.
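
The timing comparison described here can be sketched with a small harness that plays the role of MATLAB's tic and toc. This is a hedged Python illustration only: timed, speedup and placeholder_segment are hypothetical names, and the placeholder stands in for a real segmentation routine. The seed coordinates are the ones listed in the figure captions.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def speedup(t_serial, t_parallel):
    """Speed-up of the parallel run relative to the serial run."""
    return t_serial / t_parallel


def placeholder_segment(seed):
    """Hypothetical stand-in for a segmentation call; does trivial work."""
    return sum(seed)


# Seed points taken from the figure captions (x, y).
seeds = [(10, 10), (90, 60), (100, 100), (100, 90)]
timings = {seed: timed(placeholder_segment, seed)[1] for seed in seeds}
```

In practice the same seed would be timed once with the serial routine and once with the parallel one, and the two elapsed times passed to speedup.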


6. PERFORMANCE EVALUATION

[Graph: execution time of CPU and GPU for the different seed points]

The above graph shows the performance of both CPU and GPU in terms of execution time for the different seed points chosen for image segmentation. Here the GPU takes much less execution time than the CPU.

7. CONCLUSION
We have gone through the process of image segmentation and the various algorithms for performing it. What is needed is high computational speed and high performance at low cost, so GPU-based technology will be the future in the area of image processing and segmentation. The parallel algorithm essentially assigns a particular thread to each image pixel so as to exploit the GPU support for fine-grained threads and the large number of processing elements available. On simple hardware (GeForce 9600 GT), the parallel algorithm reached a maximum speed-up of 4.97, and with a more powerful GPU (Tesla C1060) an acceleration of 6.86 was achieved. It should also be noted that these performance gains can be obtained with low investment in hardware, as GPUs with increasing processing power are currently available on the market at declining prices. Our results could be further improved by using a newer NVIDIA graphics card and by selecting seed points automatically rather than manually.

8. REFERENCES
[1] P. N. Happ, R. Q. Feitosa (Pontifical Catholic University of Rio de Janeiro), C. Bentes (Rio de Janeiro State University), and R. Farias (Federal University of Rio de Janeiro), "A parallel image segmentation algorithm on GPU", 2012.

[2] Brian Dushaw (Applied Physics Laboratory, University of Washington), "CUDA and MATLAB", February 12, 2010.

[3] Aman Chandra Kaushik (Department of Bioinformatics, University Institute of Engineering & Technology) and Vandana Sharma (Chhatrapati Shahu Ji Maharaj University, Kanpur-208024, Uttar Pradesh, India), "Brain Tumor Segmentation from MRI images and volume calculation of Tumor", Volume 2, Issue 7, July 2013.

[4] Loren Dean (Director of Engineering, MATLAB Products, MathWorks), "GPU Computing with MATLAB".

[5] Mark Harris (NVIDIA Developer Technology), "Optimizing Parallel Reduction in CUDA".

[6] Markus Hofmann, Tobias Binna, Prof. Josef M. Joller, Prof. Peter Sommerlad (Department of Computer Science, University of Applied Science Rapperswil), "Massive Parallel Image Processing", 2010.
