Design of Graphics Processing Unit for Image Processing
Abstract: This work describes the design of a Graphics Processing Unit
(GPU) for image processing. The GPU is an important component for
large-scale computation. Images and videos, which contain large amounts
of data, can be processed efficiently on a GPU by exploiting its
parallel execution. Digital image processing implemented in hardware
provides higher processing speed and performance. Using Verilog HDL for
the GPU design makes immediate implementation possible. The paper
focuses on image processing operations such as brightness manipulation,
contrast manipulation, image cropping, image zooming, image rotation,
and the morphological operators dilation and erosion.

Keywords: GPU; FPGA; Processing element; Verilog HDL

I. INTRODUCTION

The GPU has become an integral part of computing systems as the demand
for graphics applications has increased. Images and videos contain a
large number of pixels and therefore require a large amount of
computation to process. A dedicated processor for video and image
processing reduces the burden on the Central Processing Unit (CPU),
which can then carry out other, less computation-intensive tasks. The
GPU is used as a dedicated processor for similar, repeated tasks.
Having both a CPU and a GPU in one machine provides an environment for
running mixed workloads and media-centric applications. This
arrangement has now become standard in PCs, notebooks and mobile
phones [1].

The GPU has become a standard for large data computation. Its main
feature is its ability to execute in parallel. The Single Instruction
Multiple Data (SIMD) architecture used by the GPU allows it to apply
the same instruction to multiple data items. Processing elements in the
GPU are used to process multiple data pixels [2]. Here, image pixels
are separated into groups using windows that pass over the image. Each
group of pixels is processed in a processing element, which contains an
ALU and a set of registers. The processing elements execute in
parallel, increasing the processing speed.

Verilog Hardware Description Language (HDL) is a general-purpose HDL.
It is used to describe and verify the design of the system. The syntax
of Verilog HDL is similar to that of the C programming language. Design
is done at Register Transfer Level (RTL) using Verilog HDL, which
describes the flow of data between registers. Synthesis is the process
in which the design is compiled and mapped onto an implementation
technology such as an FPGA. Another HDL in use is the Very High Speed
Integrated Circuit HDL (VHDL) [1].

II. GPU DESIGN

The GPU designed for image processing supports 4-stage pipelining. The
four pipeline stages are:

Fetch: The instruction is fetched from instruction memory and placed in
the instruction register.
Decode: The instruction is decoded. The opcode of an instruction
indicates the operation to be performed.
Execute: The operations are performed on the data pixels. Parallel
processing elements are used to perform the operations.
Store: After processing, the data pixels are stored back to memory and
the Local Data Share.

The processor can perform basic operations such as addition,
subtraction, comparison and multiplication. These are used for image
processing operations such as brightness manipulation, image cropping,
image zooming and image rotation, and for morphological operations.
The hardware organization of the GPU includes the Memory, the Program
Counter (PC), the Processor, the Processing Elements (P.E) and the
Local Data Share.

Fig.1. Hardware Organization
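
As an illustration of the four stages described in this section, the
following is a minimal behavioral sketch in Python, not the Verilog
design itself. The opcode names (ADD, SUB, HALT), the Instruction type,
the run function, the lane count of four and the saturation to 0..255
are all assumptions made for the example; the paper does not give its
instruction encoding. The stages run one after another here, so the
sketch shows what each stage does rather than how the stages overlap in
a pipeline.

```python
from dataclasses import dataclass

# Hypothetical opcodes: the paper does not list its instruction encoding.
ADD, SUB, HALT = 0, 1, 2


@dataclass
class Instruction:
    opcode: int
    operand: int = 0


def clamp(value):
    """Saturate a result to the 8-bit pixel range (assumed behavior)."""
    return max(0, min(255, value))


def run(program, data_memory, num_pes=4):
    """Apply each instruction to every pixel, num_pes pixels per group."""
    pc = 0  # program counter
    while True:
        # Fetch: read the instruction addressed by the program counter.
        instruction_register = program[pc]
        # Decode: split the instruction into opcode and operand.
        opcode = instruction_register.opcode
        operand = instruction_register.operand
        if opcode == HALT:
            break
        for base in range(0, len(data_memory), num_pes):
            group = data_memory[base:base + num_pes]
            # Execute: each list element stands for one processing element
            # applying the same operation to its own pixel (SIMD style).
            if opcode == ADD:
                results = [clamp(p + operand) for p in group]
            elif opcode == SUB:
                results = [clamp(p - operand) for p in group]
            else:
                raise ValueError(f"unknown opcode {opcode}")
            # Store: write the processed pixels back to data memory.
            data_memory[base:base + num_pes] = results
        pc += 1
    return data_memory
```

For example, running ADD 50 followed by SUB 20 over the pixels
[0, 100, 230, 255, 10] gives [30, 130, 235, 235, 40]; the third and
fourth values saturate at 255 after the addition.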
The Program Counter points to the current instruction to be executed.
An instruction contains an opcode and an operand. The instruction is
fetched in the fetch state and decoded in the decode state of the
processor. Depending on the opcode, the processing elements process the
data in the execute state. Each processing element consists of a set of
registers, an accumulator and an ALU. The processed data is stored in
the data memory in the store state. Figure 1 shows the hardware
organization of the GPU. Figure 2 shows the GPU design, where the
dotted part indicates a processing element; the design contains several
such processing elements operating in parallel.

In order to reduce the brightness of the image, we subtract a constant
value from each pixel. This operation is an image enhancement
operation. The matrix model for manipulating the brightness of the
image is shown below [4].
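
The brightness reduction described above, subtracting a constant from
every pixel, can be written as a short software reference model. This
Python sketch is an illustration under stated assumptions rather than
the paper's matrix model: the function name adjust_brightness, the
list-of-rows image layout and the saturation to the 0..255 range (which
the paper states for contrast but not explicitly for brightness) are
choices made for the example.

```python
def adjust_brightness(image, delta):
    """Add delta to every pixel; a negative delta darkens the image.

    image is a list of rows of 8-bit values. Results are saturated to
    0..255, which is an assumption for this sketch.
    """
    return [[max(0, min(255, pixel + delta)) for pixel in row]
            for row in image]
```

With delta = -50, the image [[10, 100], [200, 250]] becomes
[[0, 50], [150, 200]]: the first pixel saturates at 0 instead of going
negative.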
B. Contrast Manipulation
Contrast manipulation is an image enhancement operation. To increase
the contrast of the image, we increase the separation between the dark
and bright values and interpolate the values between them. If a
processed pixel value is greater than 255, it is set to 255; if it is
less than 0, it is set to 0. To reduce the contrast of the image, we
decrease the difference between the brighter and darker values. The
matrix model for increasing the contrast of the image is shown below.
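
One common way to realize this behavior is a linear stretch about a
mid-gray level, followed by the clamping described above. The Python
sketch below assumes a pivot of 128 and a gain parameter; neither value
comes from the paper, whose matrix model may differ, and the function
name adjust_contrast is hypothetical.

```python
def adjust_contrast(image, gain, pivot=128):
    """Scale each pixel's distance from pivot by gain, then clamp.

    gain > 1 pushes dark and bright values apart (more contrast);
    0 <= gain < 1 pulls them together (less contrast). The pivot of 128
    and the linear form are assumptions for this sketch.
    """
    def stretch(pixel):
        value = round((pixel - pivot) * gain + pivot)
        # Clamp to the valid 8-bit range, as the text describes.
        return max(0, min(255, value))

    return [[stretch(pixel) for pixel in row] for row in image]
```

With gain = 2, the row [10, 100, 128, 200] maps to [0, 72, 128, 255]:
the extremes clamp at 0 and 255 while the pivot value is unchanged.
With gain = 0.5 the same row maps to [69, 114, 128, 164].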
Authorized licensed use limited to: Penn State University. Downloaded on January 28,2024 at 22:59:33 UTC from IEEE Xplore. Restrictions apply.
2014 First International Conference on Computational Systems and Communications (ICCSC) | 17-18 December 2014 | Trivandrum
E. Image Rotation
Image rotation is another feature used in media applications. The image
can be rotated clockwise or anticlockwise. A dedicated algorithm is
used to perform the rotation.
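
The paper does not specify its rotation algorithm. As an assumed
example, the Python sketch below rotates by 90 degrees in either
direction using a pure index mapping, the kind of address remapping
that maps naturally onto hardware. The function names and the
list-of-rows image layout are hypothetical.

```python
def rotate_clockwise(image):
    """Rotate by 90 degrees clockwise: pixel (r, c) moves to (c, H-1-r)."""
    height, width = len(image), len(image[0])
    output = [[0] * height for _ in range(width)]
    for r in range(height):
        for c in range(width):
            output[c][height - 1 - r] = image[r][c]
    return output


def rotate_anticlockwise(image):
    """Rotate by 90 degrees anticlockwise: pixel (r, c) moves to (W-1-c, r)."""
    height, width = len(image), len(image[0])
    output = [[0] * height for _ in range(width)]
    for r in range(height):
        for c in range(width):
            output[width - 1 - c][r] = image[r][c]
    return output
```

For the 2x3 image [[1, 2, 3], [4, 5, 6]], the clockwise result is
[[4, 1], [5, 2], [6, 3]] and the anticlockwise result is
[[3, 6], [2, 5], [1, 4]]. Applying one rotation and then the other
returns the original image.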
F. Dilation
Dilation is a morphological operation in which each output pixel
depends on the neighboring input pixels. The input image is scanned
with the structuring element to obtain the result, and the largest
neighboring pixel value is placed in the output. This increases the
area of white pixels, i.e. white pixels dilate. Dilation removes
unwanted black-pixel noise from white areas. The matrix model for the
dilation operation is shown below. The structuring element moves over
the image, and the input pixel that coincides with the origin of the
structuring element is modified to obtain the output pixel.

V. RESULTS
The processing operations were performed on an image using Verilog HDL.
The operations include brightness manipulation, contrast manipulation,
image cropping, image rotation, image zooming and the morphological
operations dilation and erosion. The processor designed for image
processing was implemented on a Spartan-3E FPGA (XC3S500E-4FG320C).
Table I compares the processing time required by the GPU implemented on
the FPGA with that of Matlab. The outputs obtained from Matlab after
processing using Verilog HDL are shown in the figures below.

TABLE I. COMPARISON OF PROCESSING TIME

Platform    Processing Time
Matlab      20 ms
GPU         5.313 ns
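
As a software reference against which dilation results can be checked,
the operation from Section F can be sketched in Python. The square
structuring element of side 2*radius+1 with its origin at the center,
the treatment of positions outside the image as absent, and the
function name dilate are assumptions for this example, not details
taken from the paper.

```python
def dilate(image, radius=1):
    """Grayscale dilation with a square structuring element.

    Each output pixel is the largest input value under the structuring
    element centered on it. Positions outside the image are skipped
    (an assumed border policy).
    """
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            largest = 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        largest = max(largest, image[rr][cc])
            output[r][c] = largest
    return output
```

A single white pixel (255) in the middle of a black 5x5 image grows
into a 3x3 white block, and a single black pixel inside a white 3x3
image is removed. Replacing max with min, and the initial value 0 with
255, gives the corresponding erosion.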
Fig.5. Results for Image Cropping

VI. CONCLUSION
In this work, a Graphics Processing Unit (GPU) for image processing has
been designed. The image processing operations implemented here are now
commonly used in computers and mobile devices. Using Verilog HDL for
the design enables immediate implementation on the FPGA board. The
results show that the processing time for the GPU implemented on the
Spartan FPGA is much lower than that of Matlab (5.313 ns versus 20 ms
in Table I). The parallelism of the GPU and the hardware implementation
provide the higher processing speed. Only basic algorithms were used
for image processing here. Future work should use more complex
algorithms to improve image quality and should include more image
processing operations.
REFERENCES.