This application note outlines the process of using the OpenCV library to develop computer vision applications on Zynq-7000 All Programmable SoCs, leveraging Vivado HLS video libraries for high-performance implementations. It details a design flow for migrating OpenCV applications to Zynq devices, allowing for efficient pixel processing and integration into programmable logic. The note also discusses two architectures for video processing—direct streaming and frame-buffer streaming—highlighting the advantages of the latter for flexibility and ease of understanding.



Application Note: Vivado HLS

Accelerating OpenCV Applications with Zynq-7000 All Programmable SoC using Vivado HLS Video Libraries

XAPP1167 (v3.0) April 30, 2015
Authors: Stephen Neuendorffer, Thomas Li, and Devin Wang

Summary
This application note describes how the OpenCV library can be used to develop computer
vision applications on Zynq®-7000 All Programmable SoCs. OpenCV can be used at many
different points in the design process, from algorithm prototyping to in-system execution.
OpenCV code can also migrate to synthesizable C++ code using video libraries that are
delivered with Vivado® High-Level Synthesis (HLS). When integrated into a Zynq SoC design,
the synthesized blocks enable high resolution and frame rate computer vision algorithms to be
implemented.

Introduction
Computer vision is a field that broadly includes many interesting applications, from industrial
monitoring systems that detect improperly manufactured items to automotive systems that can
drive cars. Many of these computer vision systems are implemented or prototyped using
OpenCV, a library which contains optimized implementations of many common computer vision
functions targeting desktop processors and GPUs. Although many functions in the OpenCV
library have been heavily optimized to enable many computer vision applications to run close to
real-time, an optimized embedded implementation is often preferable.
This application note presents a design flow enabling OpenCV programs to be retargeted to
Zynq devices. The design flow leverages HLS technology in the Vivado Design Suite, along
with optimized synthesizable video libraries. The libraries can be used directly, or combined
with application-specific code to build a customized accelerator for a particular application. This
flow can enable many computer vision algorithms to be quickly implemented with both high
performance and low power. The flow also enables a designer to target high data rate pixel
processing tasks to the programmable logic, while lower data rate frame-based processing
tasks remain on the ARM® cores.
As shown in Figure 1, OpenCV can be used at multiple points during the design of a
video processing system. On the left, an algorithm may be designed and implemented
completely using OpenCV function calls, both to input and output images using file access
functions and to process the images. Next, the algorithm may be implemented in an embedded
system (such as the Zynq Base TRD), accessing input and output images using
platform-specific function calls. In this case, the video processing is still implemented using
OpenCV function calls executing on a processor (such as the Cortex™-A9 processor cores in
the Zynq Processing System). Alternatively, the OpenCV function calls can be replaced by
corresponding synthesizable functions from the Xilinx Vivado HLS video library. OpenCV
function calls can then be used to access input and output images and to provide a golden
reference implementation of a video processing algorithm. After synthesis, the processing
block can be integrated into the Zynq Programmable Logic. Depending on the design
implemented in the Programmable Logic, an integrated block may be able to process a video
stream created by a processor, such as data read from a file, or a live real-time video stream
from an external input.

© Copyright 2013—2015 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, Zynq, and other designated brands included herein are trademarks of Xilinx in the
United States and other countries. All other trademarks are the property of their respective owners.


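[Figure 1 diagram, four columns from left to right: (1) OpenCV image read -> OpenCV function chain -> OpenCV image write; (2) video frame read -> OpenCV function chain -> video frame write; (3) OpenCV image read -> OpenCV2AXIvideo -> AXIvideo2Mat -> HLS video library function chain -> Mat2AXIvideo -> AXIvideo2OpenCV -> OpenCV image write, with the AXIvideo2Mat to Mat2AXIvideo portion forming the synthesizable block; (4) video frame read -> synthesized FPGA processing block -> video frame write.]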

Figure 1: Design Flow

The design flow for this application note generally follows the steps below:
1. Develop and execute an OpenCV application on a desktop.
2. Recompile and execute the OpenCV application on the Zynq SoC without modification.
3. Refactor the OpenCV application, using I/O functions to encapsulate an accelerator function.
4. Replace OpenCV function calls with synthesizable video library function calls in the accelerator
function.
5. Generate an accelerator and corresponding API from the accelerator function using Vivado
HLS.
6. Replace calls to the accelerator function with calls to the accelerator API.
7. Recompile and execute the accelerated application.

Reference Design
The reference design files can be downloaded from:
https://secure.xilinx.com/webreg/clickthrough.do?cid=323570
The reference design matrix is shown in Table 1.

Table 1: Reference Design Matrix

Parameter                                            Description
General
  Developer name                                     Thomas Li
  Target devices (stepping level, ES, production,    XC7020-1
    speed grades)
  Source code provided                               Yes
  Source code format                                 C
  Design uses code/IP from existing Xilinx           Yes, based off the Zynq Base TRD
    application note/reference designs, CORE
    Generator software, or third party
Simulation
  Functional simulation performed                    Yes, in C
  Timing simulation performed                        No
  Test bench used for functional and timing          Provided
    simulation
  Test bench format                                  C
  Simulation software/version used                   g++
  SPICE/IBIS simulations                             No
Implementation
  Synthesis software tools/version used              Vivado 2014.4
  Implementation software tools/versions used        Vivado 2014.4
  Static timing analysis performed                   Yes
Hardware Verification
  Hardware verified                                  Yes
  Hardware platform used for verification            ZC702

Video Processing Libraries in Vivado HLS
Vivado HLS contains a number of video libraries, intended to make it easier for you to build a
variety of video processing designs. These libraries are implemented as synthesizable C++ code and
roughly correspond to video processing functions and data structures implemented in OpenCV.
Many of the video concepts and abstractions are very similar to concepts and abstractions in
OpenCV. In particular, many of the functions in the OpenCV imgproc module have
corresponding Vivado HLS library functions.
For instance, one of the most central elements in OpenCV is the cv::Mat class, which is usually
used to represent images in a video processing system. A cv::Mat object is usually declared as
shown in the following example:
cv::Mat image(1080, 1920, CV_8UC3);
This declares a variable image and initializes it to represent an image with 1080 rows and 1920
columns, where each pixel is represented by 3 eight bit unsigned numbers. The synthesizable
library contains a corresponding hls::Mat<> template class that represents a similar concept in
a synthesizable way:
hls::Mat<2047, 2047, HLS_8UC3> image(1080, 1920);
The resulting object is similar, except the maximum size and format of the image are described
using template parameters in addition to constructor arguments. This ensures that Vivado HLS
can determine the size of memory to be used when processing this image and optimize the
resulting circuit for a particular pixel representation. The hls::Mat<> template class also
supports dropping the constructor arguments entirely when the actual size of the images being
processed is the same as the maximum size:
hls::Mat<1080, 1920, HLS_8UC3> image;
Similarly, the OpenCV library provides a mechanism to apply a linear scaling to the value of
each pixel in an image in the cvScale function. This function might be invoked as shown
below:
cv::Mat src(1080, 1920, CV_8UC3);
cv::Mat dst(1080, 1920, CV_8UC3);


cvScale(src, dst, 2.0, 0.0);

This function call scales pixels in the input image src by a factor of 2 with no offset and
generates an output image dst. The corresponding behavior in the synthesizable library is
implemented by calling the hls::Scale template function:
hls::Mat<1080, 1920, HLS_8UC3> src;
hls::Mat<1080, 1920, HLS_8UC3> dst;
hls::Scale(src, dst, 2.0, 0.0);
Note: Although hls::Scale is a template function, its template arguments need not be
specified, since they are inferred from the template arguments in the declarations of src and dst. The
hls::Scale template function does require, however, that the input and output images have the same
template arguments. A complete list of the supported functions can be found in the Vivado Design Suite
User Guide: High-Level Synthesis (UG902) [Ref 4].
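To make the meaning of "linear scaling" concrete, the following plain C++ sketch applies dst = src * scale + shift to 8-bit samples. It is not the OpenCV or HLS library implementation; the names scale_sample and scale_image are hypothetical, and rounding to nearest with saturation to the 0..255 range is an assumption about how 8-bit results are handled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV or the HLS video library):
// computes src * scale + shift for one 8-bit sample, rounds to nearest,
// and clamps the result to the 0..255 range.
inline std::uint8_t scale_sample(std::uint8_t src, double scale, double shift) {
    double v = std::round(src * scale + shift);
    v = std::min(255.0, std::max(0.0, v));
    return static_cast<std::uint8_t>(v);
}

// Applies the same scaling to every sample of a flat pixel buffer.
inline std::vector<std::uint8_t> scale_image(const std::vector<std::uint8_t>& src,
                                             double scale, double shift) {
    std::vector<std::uint8_t> dst(src.size());
    std::transform(src.begin(), src.end(), dst.begin(),
                   [=](std::uint8_t p) { return scale_sample(p, scale, shift); });
    return dst;
}
```

With a scale of 2 and no offset, as in the calls above, a sample of 100 becomes 200, while a sample of 130 clamps at 255 under the saturation assumption.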

Architectures for Video Processing
Video processing designs in Zynq SoCs commonly follow one of two generic
architectures. In the first architecture, referred to as “direct streaming”, pixel data arrives at the
input pins of the Programmable Logic and is transmitted directly to a video processing
component and then directly to a video output. A direct streaming architecture is typically the
simplest and most efficient way to process video, but it requires that the video processing
component be able to process frames strictly in real time.
[Figure 2 block diagram: in the programmable logic, Video Input, Video Processing Component, and Display Controller are chained as AXI4-Stream IP cores; the Processing System (dual-core Cortex-A9 APU, hardened peripherals, AMBA switches, DDR memory controller) attaches through the AXI Interconnect.]

Figure 2: Direct Streaming Architecture for Video Processing

In the second architecture, referred to as “frame-buffer streaming”, pixel data is first stored in
external memory before being processed and stored again in external memory. A video display
controller is then required to output the processed video. A frame-buffer streaming architecture
allows more decoupling between the video rate and the processing speed of the video
component, but it requires enough memory bandwidth to read and write the video frames into
external memory.

[Figure 3 block diagram: in the programmable logic, Video Input, Video Processing Component, and Display Controller are each connected to an AXI VDMA core, which reaches DDR memory in the Processing System through the AXI Interconnect.]

Figure 3: Frame-buffer Architecture for Video Processing

This application note focuses on the frame-buffer streaming architecture, since it provides more
flexibility and makes it easier to understand how video processing on the processor cores can be
accelerated. For highly optimized systems, it is relatively straightforward to construct a direct
streaming architecture from a frame-buffer streaming architecture.
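The memory bandwidth cost of frame-buffer streaming can be estimated with simple arithmetic. The sketch below assumes 1920x1080 video at 60 frames per second and 2 bytes per pixel (for example a 16-bit YUV 4:2:2 format); these figures and the helper name stream_bytes_per_second are illustrative assumptions, not values taken from the reference design.

```cpp
#include <cstdint>

// Bytes per second needed to move one video stream through external memory.
constexpr std::uint64_t stream_bytes_per_second(std::uint64_t cols,
                                                std::uint64_t rows,
                                                std::uint64_t frames_per_second,
                                                std::uint64_t bytes_per_pixel) {
    return cols * rows * frames_per_second * bytes_per_pixel;
}

// Assumed example: 1920x1080 at 60 frames per second, 2 bytes per pixel.
constexpr std::uint64_t kOneStream = stream_bytes_per_second(1920, 1080, 60, 2);

// Frame-buffer streaming touches memory four times per frame:
// video input write, processing read, processing write, display read.
constexpr std::uint64_t kFrameBufferTotal = 4 * kOneStream;
```

Under these assumptions each stream needs about 249 MB/s, so the four memory transfers together need roughly 1 GB/s. A direct streaming design avoids that external memory traffic.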

AXI4 Streaming Video
Video processing components from Xilinx generally use a common AXI4 Streaming protocol to
communicate pixel data. This protocol describes each line of video pixels as an AXI4 packet,
with the last pixel of each line marked with the TLAST signal asserted. In addition, the start of
the video frame is indicated by marking the first pixel of the first line with the USER[0] bit
asserted. For more information about this protocol see [Ref 2].
Although the underlying AXI4 Streaming Video protocol does not require constraints on the
size of lines in an image, most complex video processing computations are greatly simplified
when all of the video lines are the same length. This restriction is almost always satisfied by any
digital video format, except perhaps in a transient which occurs at the beginning of a sequence
of video frames. Dealing with such transients is usually only a problem on the input interface of
a processing block, which needs to correctly handle this transient before transitioning to
processing continuous rectangular frames. The input interface that receives an AXI4 Streaming
Video protocol can ensure that each video frame consists of exactly ROWS * COLS pixels.
Then later blocks in the pipeline can assume that video frames are complete and rectangular.
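The framing rules described above can be sketched in plain C++. The model below is a simplification with one pixel per beat; VideoBeat, encode_frame, and receive_frame are hypothetical names, and the receiver is only an illustration of discarding an initial transient. It is not how the HLS library functions are implemented, and it does not handle lines of the wrong length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of one AXI4-Stream video beat (one pixel per beat).
struct VideoBeat {
    std::uint32_t data;  // pixel value
    bool user;           // USER[0]: set on the first pixel of the frame
    bool last;           // TLAST: set on the last pixel of each line
};

// Encode a rows x cols frame (row-major order) as a sequence of beats.
inline std::vector<VideoBeat> encode_frame(const std::vector<std::uint32_t>& pixels,
                                           std::size_t rows, std::size_t cols) {
    std::vector<VideoBeat> beats;
    beats.reserve(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            beats.push_back({pixels[r * cols + c], r == 0 && c == 0, c + 1 == cols});
        }
    }
    return beats;
}

// Input-side sketch: discard beats until the start-of-frame marker, then
// collect exactly rows * cols pixels. Returns an empty vector if the stream
// ends before a complete frame has been received.
inline std::vector<std::uint32_t> receive_frame(const std::vector<VideoBeat>& beats,
                                                std::size_t rows, std::size_t cols) {
    std::size_t i = 0;
    while (i < beats.size() && !beats[i].user) ++i;  // skip the initial transient
    std::vector<std::uint32_t> frame;
    for (; i < beats.size() && frame.size() < rows * cols; ++i) {
        frame.push_back(beats[i].data);
    }
    if (frame.size() != rows * cols) frame.clear();
    return frame;
}
```

Because the receiver always hands on exactly rows * cols pixels, later stages in such a pipeline can treat every frame as complete and rectangular.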

Video Interface Libraries in Vivado HLS
To insulate the programmer from these interfacing issues, Vivado HLS includes a set of
synthesizable video interface library functions. These functions are shown in Table 2.

Table 2: Vivado HLS Synthesizable Video Functions

Video Library Function    Description
hls::AXIvideo2Mat         Converts data from an AXI4 video stream representation to hls::Mat format.
hls::Mat2AXIvideo         Converts data stored as hls::Mat format to an AXI4 video stream.

In particular, the AXIvideo2Mat function receives a sequence of images using the AXI4
Streaming Video protocol and produces an hls::Mat representation. Similarly, the Mat2AXIvideo
function receives an hls::Mat representation of a sequence of images and encodes it correctly
using the AXI4 Streaming Video protocol.
These functions do not determine the image size from the AXI4 video stream, but use the
image size specified in the hls::Mat constructor arguments. In systems designed to process an
arbitrary size input image with AXI4 Streaming interfaces, the image size must be determined
externally to the video library block. In the Zynq Video TRD, the image size processed by the
accelerator is exposed as AXI4-Lite control registers; however, the software and the rest of the
system are only designed to process 1920x1080 resolution. In more complicated systems, the
Xilinx Video Timing Controller core [Ref 3] could be used to detect the size of a received video
signal.
The video libraries also contain the following non-synthesizable video interface library
functions:

Table 3: Vivado HLS Non-Synthesizable Video Functions

Video Library Functions
hls::cvMat2AXIvideo hls::AXIvideo2cvMat
hls::IplImage2AXIvideo hls::AXIvideo2IplImage
hls::CvMat2AXIvideo hls::AXIvideo2CvMat

These functions are commonly used in conjunction with the synthesizable functions to
implement OpenCV-based testbenches.

Limitations
There are several limitations to the current synthesizable library, which may not be otherwise
obvious. The basic limitation is that OpenCV functions cannot be synthesized directly, and must
be replaced by functions from the synthesizable library. This limitation is primarily because
OpenCV functions typically include dynamic memory allocation, such as during the constructor
of an arbitrarily sized cv::Mat object, which is not synthesizable.
A second limitation is that the hls::Mat<> datatype used to model images is internally
defined as a stream of pixels, using the hls::stream<> datatype, rather than as an array of
pixels in external memory. As a result, random access is not supported on images, and the
cv::Mat::at() method and cvGet2D() function have no corresponding synthesizable functions. Streaming
access also implies that if an image is processed by more than one function, then it must first
be duplicated into two streams, such as by using the hls::Duplicate<> function. Streaming
access also implies that an area of an image cannot be modified without also processing the
unmodified pixels around that area.
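The consequence of streaming access can be illustrated with a toy model in which std::queue stands in for a pixel stream. This is only a sketch of the read-once behavior; PixelStream, duplicate, and sum_and_drain are hypothetical names and do not reflect the hls::stream<> or hls::Duplicate<> implementations.

```cpp
#include <queue>

// Toy stand-in for a pixel stream: each value can be read exactly once.
using PixelStream = std::queue<int>;

// Copy every pixel of src into two output streams. The source stream is
// drained in the process, which is why two consumers need two copies.
inline void duplicate(PixelStream& src, PixelStream& dst1, PixelStream& dst2) {
    while (!src.empty()) {
        int p = src.front();
        src.pop();
        dst1.push(p);
        dst2.push(p);
    }
}

// Example consumer: sums all pixels of a stream, leaving it empty.
inline int sum_and_drain(PixelStream& s) {
    int total = 0;
    while (!s.empty()) {
        total += s.front();
        s.pop();
    }
    return total;
}
```

After one consumer has read a stream there is nothing left for a second consumer, so any image used twice has to be duplicated before the first use.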
Another limitation relates to datatypes. OpenCV functions typically support either integer or
floating-point datatypes. However, floating point is often more expensive and avoided when
targeting programmable logic. In many cases, Vivado HLS supports replacing float and
double types with the Vivado HLS fixed point template classes ap_fixed<> and
ap_ufixed<>. Currently this is not uniformly supported in the synthesizable libraries, since
certain functions (such as hls::Filter2D and hls::ConvertScale) require floating-point
arguments. Additionally, since OpenCV performs floating-point operations, it should not be
expected that the results of the synthesizable library are generally bit-accurate to the
corresponding OpenCV functions, although the intention is generally for the behavior to be
functionally equivalent. In some cases the synthesizable libraries do perform internal
fixed-point optimizations to reduce the use of floating-point operations.
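As a rough illustration of replacing floating-point arithmetic with fixed-point arithmetic, the sketch below scales an 8-bit sample using only integer operations. It uses a hand-written Q8.8 format rather than the Vivado HLS ap_fixed<> classes so that it compiles with any C++ compiler; the type q8_8 and the functions to_q8_8 and scale_fixed are hypothetical.

```cpp
#include <cstdint>

// Q8.8 unsigned fixed point: 8 integer bits and 8 fractional bits
// (an illustrative format chosen for this sketch).
using q8_8 = std::uint16_t;

// Convert a small non-negative factor to Q8.8, rounding to nearest.
inline q8_8 to_q8_8(double x) {
    return static_cast<q8_8>(x * 256.0 + 0.5);
}

// Scale an 8-bit sample by a Q8.8 factor using integer arithmetic only,
// with rounding to nearest and saturation to 255.
inline std::uint8_t scale_fixed(std::uint8_t sample, q8_8 factor) {
    std::uint32_t product = static_cast<std::uint32_t>(sample) * factor;  // Q8.8 result
    std::uint32_t rounded = (product + 128u) >> 8;                        // drop fraction bits
    return static_cast<std::uint8_t>(rounded > 255u ? 255u : rounded);
}
```

A factor such as 1.3 cannot be represented exactly in Q8.8 (it becomes 333/256), so a fixed-point result can differ slightly from a floating-point reference for some inputs. This is one reason bit-accuracy against OpenCV should not be assumed.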
A final limitation is that interface functions are only provided for AXI4 Streaming Video. These
interfaces can be integrated at the system level with the Xilinx Video DMA (VDMA) core and
other video processing IP cores, but cannot be directly connected to AXI4 Slave ports or AXI4
Master ports. One implication of this is that in designs that require external memory frame
buffers (for instance, in order to process several consecutive frames together) the frame buffer
must be implemented externally with several VDMA cores managed by the processor.

Reference Designs
This application note contains several HLS designs. These designs modify the behavior of the Zynq
Base Targeted Reference Design, replacing the image processing filter in the programmable
logic with a filter generated using Vivado HLS and the Vivado HLS synthesizable libraries. The
golden model of the synthesizable filter is implemented using OpenCV libraries, enabling the
behavior of the synthesized code to be verified. The designs also modify the Linux application,
enabling either an OpenCV implementation of the filter or the synthesizable implementation of
the filter to be executed on the Cortex-A9 cores.
The reference designs include:
• demo: a chain of pixel processing functions
• fast-corners: FAST corner detection algorithm
• pass-through: passes the input through unchanged
• simple-median: simple median filter
• simple-posterize: simple posterization
• sobel: Sobel filter with thresholds, similar to the one in the Zynq Base TRD
In this section, three typical reference designs are introduced in progressive order to give
a general idea of how to accelerate an OpenCV application.
The first reference design, "pass-through", is a transparent image filter that passes the
input frame to the output without any modification. It can be used as a filter template in a user's
application. The OpenCV code is simple:
void opencv_image_filter(IplImage *src, IplImage *dst) {
    cvCopy(src, dst);
}
Compared to OpenCV code, the synthesizable code contains a number of #pragma directives
that enable ease of interfacing. These directives expose the input and output streams as AXI4
Streaming interfaces (lines 46-47) and expose the other inputs in the bundled control bus as an
AXI4 Lite Slave interface (lines 49-51). Notice that the offsets of rows and cols are specified to
match the drivers used by the TRD. In addition, the rows and cols inputs are specified as being
stable inputs, since the block is expected to process the same size images repeatedly (lines
53-54). This specification enables additional optimization on these variables, since they need
not be pushed through each level of the pipeline. Lastly dataflow mode is selected (line 57),
enabling concurrent execution of the various processing functions, with pixels being streamed
from one block to another.
44 void image_filter(AXI_STREAM& video_in, AXI_STREAM& video_out, int rows, int cols) {
45     //Create AXI streaming interfaces for the core
46 #pragma HLS INTERFACE axis port=video_in bundle=INPUT_STREAM
47 #pragma HLS INTERFACE axis port=video_out bundle=OUTPUT_STREAM
48
49 #pragma HLS INTERFACE s_axilite port=rows bundle=CONTROL_BUS offset=0x14
50 #pragma HLS INTERFACE s_axilite port=cols bundle=CONTROL_BUS offset=0x1C
51 #pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS
52
53 #pragma HLS INTERFACE ap_stable port=rows
54 #pragma HLS INTERFACE ap_stable port=cols
55
56     YUV_IMAGE img_0(rows, cols);
57 #pragma HLS dataflow
58     hls::AXIvideo2Mat(video_in, img_0);
59     hls::Mat2AXIvideo(img_0, video_out);
60 }
The second reference design, named demo, contains a simple pipeline of functions shown in
Figure 4.
[Figure 4 block diagram: a four-stage pipeline of subtract, scale, erode, and dilate blocks.]

Figure 4: Demo Design Block Diagram

In OpenCV, this application can be implemented using a sequence of library calls, as shown in
the following code example, excerpted from apps/demo/opencv_top.cpp.
void opencv_image_filter(IplImage *src, IplImage *dst) {
    IplImage* tmp = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    cvCopy(src, tmp);
    cvSubS(tmp, cvScalar(50, 50), dst);
    cvScale(dst, tmp, 2, 0);
    cvErode(tmp, dst);
    cvDilate(dst, tmp);
    cvCopy(tmp, dst);
    cvReleaseImage(&tmp);
}
In the synthesizable code, the sequence of OpenCV function calls is replaced by HLS video library
calls, which have a similar interface and equivalent behavior. The synthesizable code is based on
the pass-through design and is straightforward: the #pragma directives are the same, and the
main differences are the additional hls::Mat declarations (IMAGE_C2 is a two-channel
YUV image type defined in top.h) and the processing pipeline built from HLS library
functions.
void image_filter(AXI_STREAM& video_in, AXI_STREAM& video_out, int rows, int cols) {
    //Create AXI streaming interfaces for the core
#pragma HLS INTERFACE axis port=video_in bundle=INPUT_STREAM
#pragma HLS INTERFACE axis port=video_out bundle=OUTPUT_STREAM

#pragma HLS INTERFACE s_axilite port=rows bundle=CONTROL_BUS offset=0x14
#pragma HLS INTERFACE s_axilite port=cols bundle=CONTROL_BUS offset=0x1C
#pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS

#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE ap_stable port=cols

    IMAGE_C2 img_0(rows, cols);
    IMAGE_C2 img_1(rows, cols);
    IMAGE_C2 img_2(rows, cols);
    IMAGE_C2 img_3(rows, cols);
    IMAGE_C2 img_4(rows, cols);
    PIXEL_C2 pix(50, 50);
#pragma HLS dataflow
    hls::AXIvideo2Mat(video_in, img_0);
    hls::SubS(img_0, pix, img_1);    // subtract 50 from each channel of the input
    hls::Scale(img_1, img_2, 2, 0);
    hls::Erode(img_2, img_3);
    hls::Dilate(img_3, img_4);
    hls::Mat2AXIvideo(img_4, video_out);
}
The third reference design, named “fast-corners”, contains a more complex pipeline, shown in
Figure 5.

[Figure 5 block diagram: To Grayscale -> Fast Corners Detection -> Draw Corners]

Figure 5: Fast-Corners Application

In OpenCV, this can be implemented using the following code example, excerpted from
apps/fast-corners/opencv_top.cpp.
49 void opencv_image_filter(IplImage *_src, IplImage *_dst) {
50     Mat src(_src);
51     Mat dst(_dst);
52     cvCopy(_src, _dst);
53     std::vector<Mat> layers;
54     std::vector<KeyPoint> keypoints;
55     split(src, layers);
56     FAST(layers[0], keypoints, 20, true);
57     for (int i = 0; i < keypoints.size(); i++) {
58         rectangle(dst,
59                   Point(keypoints[i].pt.x-1, keypoints[i].pt.y-1),
60                   Point(keypoints[i].pt.x+1, keypoints[i].pt.y+1),
61                   Scalar(255,0),
62                   CV_FILLED);
63     }
64 }
Although it looks simple, this application contains some functions that do not have
corresponding functions in the synthesizable library. The std::vector class is not
synthesizable (line 54). The cv::rectangle function is not currently implemented in the
synthesizable library due to its in-place update (lines 58-62).
One possibility for implementing this application is to partition it, running the green channel
extraction and FAST corner detection in programmable logic, while marking the corners with
rectangles with code running on the processing system. Instead we choose to simplify the code
slightly, while staying within the pixel processing paradigm that can be implemented in the
FPGA. The simplified version of the code is shown in the following code example:
void opencv_image_filter(IplImage *_src, IplImage *_dst) {
    Mat src(_src);
    Mat dst(_dst);
    cvCopy(_src, _dst);
    Mat mask(src.rows, src.cols, CV_8UC1);
    Mat dmask(src.rows, src.cols, CV_8UC1);
    std::vector<Mat> layers;
    std::vector<KeyPoint> keypoints;
    split(src, layers);
    FAST(layers[0], keypoints, 20, true);
    GenMask(mask, keypoints);
    dilate(mask, dmask, getStructuringElement(MORPH_RECT, Size(3,3), Point(1,1)));
    PaintMask(dst, dmask, Scalar(255,0));
}
This code, although written in OpenCV, is structured in a way that is amenable to transformation
into synthesizable code. In particular, this combination generates the keypoints as an image
mask, rather than as a dynamically allocated structure. The PaintMask function takes such a
mask and draws on top of the image. The synthesizable version of the fast corners application
is shown in the following code example, excerpted from apps/fast-corners/top.cpp.
44 void image_filter(AXI_STREAM& video_in, AXI_STREAM& video_out, int rows, int cols) {
45     //Create AXI streaming interfaces for the core
46 #pragma HLS INTERFACE axis port=video_in bundle=INPUT_STREAM
47 #pragma HLS INTERFACE axis port=video_out bundle=OUTPUT_STREAM
48
49 #pragma HLS INTERFACE s_axilite port=rows bundle=CONTROL_BUS offset=0x14
50 #pragma HLS INTERFACE s_axilite port=cols bundle=CONTROL_BUS offset=0x1C
51 #pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS
52
53 #pragma HLS INTERFACE ap_stable port=rows
54 #pragma HLS INTERFACE ap_stable port=cols
55
56     IMAGE_C2 img_0(rows, cols);
57     IMAGE_C2 img_1(rows, cols);
58     IMAGE_C2 img_1_(rows, cols);
59     IMAGE_C1 img_1_Y(rows, cols);
60     IMAGE_C1 img_1_UV(rows, cols);
61     IMAGE_C2 img_2(rows, cols);
62     IMAGE_C1 mask(rows, cols);
63     IMAGE_C1 dmask(rows, cols);
64     PIXEL_C2 color(255,0);
65 #pragma HLS dataflow
66 #pragma HLS stream depth=20000 variable=img_1_.data_stream
67     hls::AXIvideo2Mat(video_in, img_0);
68     hls::Duplicate(img_0, img_1, img_1_);
69     hls::Split(img_1, img_1_Y, img_1_UV);
70     hls::Consume(img_1_UV);
71     hls::FASTX(img_1_Y, mask, 20, true);
72     hls::Dilate(mask, dmask);
73     hls::PaintMask(img_1_, dmask, img_2, color);
74     hls::Mat2AXIvideo(img_2, video_out);
75 }
One important thing to observe about this code is the use of the directive in line 66 to set the
depth of one of the streams. There is a latency of several video lines on the path
from Duplicate through FASTX and Dilate to PaintMask because of the presence of line
buffers. FASTX incurs up to 7 lines of latency and Dilate incurs up to 3 lines of
latency, so the img_1_ stream must buffer at least 1920 * (7+3) = 19200 pixels, which fits
within the configured depth of 20000. In addition, the
hls::FASTX call in line 71 corresponds to the combination of the cv::FAST and GenMask
functions in the modified OpenCV code shown previously.
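The stream depth calculation can be written out as a compile-time check. The sketch below uses the line width and latency figures quoted in the text; the function and constant names are hypothetical and are not part of the reference design.

```cpp
// Pixels that must be buffered on the bypass path while the other path
// works through its line buffers.
constexpr int required_depth(int cols, int lines_of_latency) {
    return cols * lines_of_latency;
}

constexpr int kCols = 1920;              // line width used in the text
constexpr int kFastxLines = 7;           // worst-case FASTX latency from the text
constexpr int kDilateLines = 3;          // worst-case Dilate latency from the text
constexpr int kConfiguredDepth = 20000;  // depth set by the stream directive

// Fails to compile if the configured depth is too small for the pipeline.
static_assert(required_depth(kCols, kFastxLines + kDilateLines) <= kConfiguredDepth,
              "stream depth is too small for the line-buffer latency");
```

A check of this kind would need to be revisited if a design is changed to process wider lines or to add stages with further line-buffer latency on the same path.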

Source Files and Project Directories
The overall structure of the files in the application note mirrors the structure of the Zynq Base
TRD. A prebuilt image for an SD card is provided in the ready_to_test directory, making it easy to
try the design out. This image is slightly modified from the Zynq Base TRD to include a
precompiled version of the OpenCV libraries for ARM in the ramdisk image that is loaded at
boot time. The device tree has also been modified to include the correct configuration of the
generic-uio Linux kernel driver for accessing control registers in the image processing filter.
The apps directory contains the HLS reference designs. To enable easily rebuilding the
designs, a Makefile is also provided with the following targets:
• make all -- build sw and hw, generate sd_image
• make csim -- run C simulation
• make cosim -- run C/RTL co-simulation
• make core -- run high level synthesis and export IP
• make bitstream -- generate bitstream
• make boot -- generate boot image
• make elf -- build software application
• make help -- print help
In each design directory (apps/<design name>), running ‘make all’ from the Linux command
line or the Vivado HLS command prompt under Windows rebuilds the entire design and
generates an SD card image that can be run. New applications can be made by copying one of
the initial apps/ subdirectories, modifying the source code, and running the appropriate
makefile rules. The overall structure of the FPGA design is not modified based on the C code,
so care must be taken to limit modifications to those that do not modify the interface of the
generated RTL. Alternatively, the interface can be modified, as long as the corresponding changes
are made in the FPGA design.

Steps to Accelerate an OpenCV Application
This section walks through how to accelerate an OpenCV application
and test it on the board. The example OpenCV application is a simple 2D image processing
filter: erode. By following this walkthrough, you can accelerate your own OpenCV application
by replacing the image processing algorithm. The same steps also apply to the reference designs
included in the package.
To achieve this goal, several prerequisites are needed:
• Xilinx Zynq-7000 SoC ZC702 Evaluation Kit or Xilinx Zynq-7000 SoC Video and Imaging
Kit
• Monitor with an HDMI™ or DVI port (HDMI/DVI cable needed) that supports 1920x1080
resolution at 60 frames per second
• Linux/Windows host
• XAPP 1167 package (shipped with this app note)
• Vivado Design Suite 2014.4, System Edition
• (Optional) FMC-IMAGEON board to enable HDMI live video input
Most of the following steps will be demonstrated in Linux command line. For Windows, a batch
file that starts the Vivado HLS command prompt with correctly set paths is provided that works
with the same commands. This batch file may need to be modified to reflect the installation path
of the Vivado tools if they are not installed in the default locations.
First, extract the contents of the package and use the extracted directory as the home directory:
$ export VIDEO_HOME=/path/to/the/extracted/package/root
$ cd ${VIDEO_HOME}
$ ls
apps doc hardware ready_to_test software xapp1167_windows.bat

Note: If the FMC-IMAGEON board is used, modify line 95 of apps/common/configure.mk to
WITH_FMC := y to select the correct device tree. The default is WITH_FMC := n.

Step 1. Create new design

The pass-through design can be used as a template for image processing designs. To create the
new erode design, copy the pass-through design directory in the apps directory:
$ cd ${VIDEO_HOME}/apps
$ cp -r pass-through erode
$ cd erode
Edit the source file opencv_top.cpp and replace the cvCopy function with the cvErode function:
void opencv_image_filter(IplImage *src, IplImage *dst) {
    cvErode(src, dst);
}
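To make the operation concrete: when no structuring element is supplied, cvErode applies a 3x3 rectangular element, so each output pixel is the minimum of its 3x3 neighbourhood. The following is a minimal plain C++ sketch of that idea, not the OpenCV or HLS implementation. It assumes a single-channel 8-bit image stored row by row and simply ignores neighbours that fall outside the image; the name erode3x3 is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Erode a single-channel 8-bit image with a 3x3 rectangular element:
// each output pixel is the minimum over its 3x3 neighbourhood.
// Assumption: neighbours outside the image are ignored.
std::vector<std::uint8_t> erode3x3(const std::vector<std::uint8_t>& src,
                                   int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<std::uint8_t> dst(src.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            std::uint8_t m = 255;  // identity value for the minimum
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) {
                        continue;  // skip neighbours outside the image
                    }
                    m = std::min(m, src[static_cast<std::size_t>(rr) * cols + cc]);
                }
            }
            dst[static_cast<std::size_t>(r) * cols + c] = m;
        }
    }
    return dst;
}
```

A single dark pixel therefore grows into a 3x3 dark block in the output, which is the visual effect to look for when inspecting the result images.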
Edit the source file top.cpp and add the erode function from the HLS video library, hls::Erode,
between the two interface functions:
void image_filter(AXI_STREAM& video_in, AXI_STREAM& video_out, int rows, int cols) {
    // Create AXI streaming interfaces for the core
#pragma HLS INTERFACE axis port=video_in bundle=INPUT_STREAM
#pragma HLS INTERFACE axis port=video_out bundle=OUTPUT_STREAM

#pragma HLS INTERFACE s_axilite port=rows bundle=CONTROL_BUS offset=0x14
#pragma HLS INTERFACE s_axilite port=cols bundle=CONTROL_BUS offset=0x1C
#pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS

#pragma HLS INTERFACE ap_stable port=rows
#pragma HLS INTERFACE ap_stable port=cols

    YUV_IMAGE img_0(rows, cols);
    YUV_IMAGE img_1(rows, cols);
#pragma HLS dataflow
    hls::AXIvideo2Mat(video_in, img_0);
    hls::Erode(img_0, img_1);
    hls::Mat2AXIvideo(img_1, video_out);
}
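The s_axilite pragmas above place rows at offset 0x14 and cols at offset 0x1C on CONTROL_BUS. As an illustration of what those offsets mean to software, the sketch below writes both registers and then sets a start bit. It is a sketch under assumptions, not the package's driver code: the control register at offset 0x00 with ap_start in bit 0 is the layout Vivado HLS normally generates for an s_axilite return port, the base pointer is hypothetical, and the function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offsets of the rows and cols registers, taken from the pragmas.
const std::size_t kRowsOffset = 0x14;
const std::size_t kColsOffset = 0x1C;
// Assumption: control register at 0x00 with ap_start in bit 0.
const std::size_t kCtrlOffset = 0x00;
const std::uint32_t kApStart = 0x1;

// Write one 32-bit register at a byte offset from a (hypothetical) base.
inline void write_reg(volatile std::uint32_t* base,
                      std::size_t byte_offset,
                      std::uint32_t value) {
    assert(byte_offset % 4 == 0);  // registers are 32-bit aligned
    base[byte_offset / 4] = value;
}

// Program the frame size, then start the core.
inline void start_filter(volatile std::uint32_t* base,
                         std::uint32_t rows,
                         std::uint32_t cols) {
    write_reg(base, kRowsOffset, rows);
    write_reg(base, kColsOffset, cols);
    write_reg(base, kCtrlOffset, kApStart);
}
```

On the target, base would point at the memory-mapped address of the core's control bus. In a host-side test it can simply be a local array, which makes the register layout easy to check.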
Run C simulation to verify the algorithm:
$ make csim
This builds a test that runs hls::Erode to generate an output image and compares it with the
golden image generated by the OpenCV function cvErode. The message “Test passed!” indicates
that the two images are exactly the same. It is also recommended to view the output images to
verify the result of the erode operation.
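The package's test bench is not reproduced here, but an exact-match check of this kind might look like the following sketch. It assumes both images have already been flattened into byte buffers of pixel data; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the positions at which two equally sized buffers differ.
std::size_t count_mismatches(const std::vector<std::uint8_t>& out,
                             const std::vector<std::uint8_t>& golden) {
    assert(out.size() == golden.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (out[i] != golden[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}

// True only when the buffers have the same size and identical contents.
inline bool images_match(const std::vector<std::uint8_t>& out,
                         const std::vector<std::uint8_t>& golden) {
    return out.size() == golden.size() && count_mismatches(out, golden) == 0;
}
```

Reporting the mismatch count rather than only a pass or fail result makes it easier to tell a border-handling difference from a completely wrong output.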

Step 2. Build OpenCV application for ARM

To cross-build ARM applications on the host, the ARM GNU tools must be installed. The ARM GNU
tools are included with the Xilinx Software Development Kit (SDK). For this design, run the
following make rule to build the ARM application:
$ make elf
Once the build is done, the ARM executable video_cmd is at
apps/erode/software/xsdk/video_cmd/bin/video_cmd. The ARM application has
options to run the OpenCV erode on the processor.


Step 3. Run Vivado HLS to create an IP core

This step uses Vivado HLS to synthesize the video library functions and then create an IP core
for the next step. Run the following command to proceed:
$ make core
Note: To shorten this step, C/RTL co-simulation is omitted here. However, running co-simulation
is always recommended for a Vivado HLS design. You can run co-simulation with the following
command:
$ make cosim

Step 4. Build new system with the accelerator

The following command copies the newly generated IP core to the hardware project and runs the
FPGA implementation flow to generate a bitstream:
$ make bitstream
Next, generate the SD card boot image from the bitstream file, the pre-compiled FSBL
executable, and the pre-compiled U-Boot executable:
$ make boot
The boot image is at boot/BOOT.bin. At this point, both the hardware and the software of the new
design are ready to test on the board. Finally, run the following command to generate the
sd_image directory for on-board testing:
$ make all
Note: Running make all in the design directory goes through Steps 2 to 4. Once it is done, the
ready-to-use SD card image is at ./sd_image.

Step 5. Test on board

Copy all the files and directories in the generated sd_image/ directory to the root directory of
the SD card.

Board setup:
• Connect the monitor to the HDMI out port of the ZC702 board using an HDMI or HDMI/DVI
cable.
• Connect a USB Mini-B cable into the Mini USB port J17 labeled USB UART on the ZC702
board and the USB Type-A cable end into an open USB port on the host PC for UART
communications.
• Connect the power supply to the ZC702 board.
• (Optional) Connect a video source that outputs 1080p60 video to the HDMI in port on the
FMC-IMAGEON board to enable live input.
• (Optional) Connect the Ethernet port on the ZC702 board to the network using an RJ45
Ethernet cable.
Make sure the switches are set as shown in Figure 6, which allows the ZC702 board to boot
from the SD card:


Figure 6: Switches Setting of SW16 on ZC702 Board

Open a terminal program (e.g., TeraTerm) and set Baud Rate = 115200, Data Bits = 8, Parity =
None, Stop Bits = 1, and Flow Control = None.
Insert the SD card that contains the sd_image contents into the SD slot on the ZC702 board.
Switch the board power on, wait for the system to boot, and log in with root/root.
Run the application in command line mode:
root@zynq:~# run_video.sh -cmd

Conclusion

OpenCV is a useful framework for developing computer vision designs. OpenCV applications
can also be used in embedded systems by recompiling them for the ARM architecture and
executing them on Zynq devices. Additionally, by leveraging the synthesizable video libraries in
Vivado HLS, OpenCV applications can be accelerated to process high-definition video in
real time.

Reference

1. www.opencv.org
2. AXI Reference Guide (UG761)
3. Video Timing Controller Product Guide (PG016)
4. Vivado Design Suite User Guide: High-Level Synthesis (UG902)
5. Zynq Base TRD (UG925)
6. Vivado HLS web page: www.xilinx.com/hls

Revision History

The following table shows the revision history for this document.

Date       Version   Description of Revisions
3/20/13    v1.0      Initial Xilinx release.
7/23/13    v2.0      Updated sections to reflect current release.
04/30/15   v3.0      Updated Reference Designs and other sections to reflect the current release.


