UG1233: Xilinx OpenCV User Guide
Revision History
The following table shows the revision history for this document.
Chapter 1: Overview
Basic Features
xfOpenCV Kernel on the reVISION Platform
xfOpenCV Library Contents
Overview
This document describes the Xilinx® xfOpenCV library, which is optimized for FPGA devices. It is
intended for application developers using Zynq®-7000 SoC, Zynq® UltraScale+™ MPSoC, and
PCIe-based (Virtex and U200 ...) devices. The xfOpenCV library is designed to work in the SDx™
development environment and provides a software interface for computer vision functions
accelerated on an FPGA device. xfOpenCV library functions are mostly similar in functionality to
their OpenCV equivalents. Any deviations are documented.
Note: For more information on the xfOpenCV library prerequisites, see Prerequisites. To familiarize
yourself with the steps required to use the xfOpenCV library functions, see Using the xfOpenCV
Library.
Basic Features
All xfOpenCV library functions follow a common format. The following properties hold true for
all the functions.
• All the functions are designed as templates, and all arguments that are images must be
provided as xf::Mat.
• All functions are defined in the xf namespace.
• Some of the major template arguments are:
○ Maximum size of the image to be processed
The xfOpenCV library contains enumerated data types that enable you to configure xf::Mat.
For more details on xf::Mat, see the xf::Mat Image Container Class.
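To make the role of the maximum image size template argument more concrete, the following is a minimal sketch of a container whose maximum dimensions are fixed at compile time. The names SimpleMat and invert, the 8-bit pixel type, and the storage layout are hypothetical stand-ins chosen for illustration; they are not the actual xf::Mat definition or any xfOpenCV API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified image container (NOT the real xf::Mat).
// MAX_ROWS and MAX_COLS are the maximum image size, fixed at compile time.
template <int MAX_ROWS, int MAX_COLS>
struct SimpleMat {
    int rows;                        // actual height, at most MAX_ROWS
    int cols;                        // actual width, at most MAX_COLS
    std::vector<std::uint8_t> data;  // row-major pixel storage

    SimpleMat(int r, int c)
        : rows(r), cols(c),
          data(static_cast<std::size_t>(r) * static_cast<std::size_t>(c), 0) {
        assert(r > 0 && r <= MAX_ROWS);
        assert(c > 0 && c <= MAX_COLS);
    }

    // Bounds-checked access to the pixel at (r, c).
    std::uint8_t& at(int r, int c) {
        assert(r >= 0 && r < rows && c >= 0 && c < cols);
        return data[static_cast<std::size_t>(r) * cols + c];
    }
};

// A function templated on the same maximum size, so that the upper loop
// bounds are known at compile time while the actual size is a run-time value.
template <int MAX_ROWS, int MAX_COLS>
void invert(SimpleMat<MAX_ROWS, MAX_COLS>& src, SimpleMat<MAX_ROWS, MAX_COLS>& dst) {
    assert(src.rows == dst.rows && src.cols == dst.cols);
    for (int r = 0; r < src.rows; ++r)
        for (int c = 0; c < src.cols; ++c)
            dst.at(r, c) = static_cast<std::uint8_t>(255 - src.at(r, c));
}
```

In this sketch, a call such as invert(a, b) on two SimpleMat<1080, 1920> objects processes only the actual rows and columns, while the template arguments bound the largest image the function could ever be asked to handle.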
xfOpenCV Kernel on the reVISION Platform
The following steps describe the general flow of an example design, where both the input and
the output are image files.
The entire code is written as the host code for the pipeline, from which all the calls to xfOpenCV
functions are moved to hardware. Functions from xfOpenCV are used to read and write images
in the memory. The image containers for xfOpenCV library functions are xf::Mat objects. For
more information, see the xf::Mat Image Container Class.
The reVISION platform supports both live and file input-output (I/O) modes. For more details, see
the reVISION Getting Started Guide.
• File I/O mode enables the controller to transfer images from SD Card to the hardware kernel.
The following steps describe the file I/O mode.
1. Processing system (PS) reads the image frame from the SD Card and stores it in the
DRAM.
2. The xfOpenCV kernel reads the image from the DRAM, processes it and stores the output
back in the DRAM memory.
3. The PS reads the output image frame from the DRAM and writes it back to the SD Card.
• Live I/O mode enables streaming frames into the platform, processing frames with the
xfOpenCV kernel, and streaming out the frames through the appropriate interface. The
following steps describe the live I/O mode.
1. Video capture IPs receive a frame and store it in the DRAM.
2. The xfOpenCV kernel fetches the image from the DRAM, processes the image, and stores
the output in the DRAM.
3. Display IPs read the output frame from the DRAM and transmit the frame through the
appropriate display interface.
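The three file I/O steps above can be modelled in software as a sequence of buffer copies around a processing call. The sketch below uses in-memory vectors in place of the SD card and the DRAM, and a binary threshold as a placeholder for the accelerated function. The names Frame, kernelThreshold, and runFileMode are hypothetical and serve only to illustrate the data movement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// Placeholder for the accelerated function: reads the input frame from
// "DRAM", processes it, and writes the result to an output buffer in "DRAM".
// A binary threshold stands in for whatever the real kernel computes.
void kernelThreshold(const Frame& dramIn, Frame& dramOut, std::uint8_t thresh) {
    dramOut.resize(dramIn.size());
    for (std::size_t i = 0; i < dramIn.size(); ++i)
        dramOut[i] = dramIn[i] > thresh ? 255 : 0;
}

// Models the three file I/O steps with in-memory buffers.
Frame runFileMode(const Frame& sdCardInput, std::uint8_t thresh) {
    assert(!sdCardInput.empty());
    Frame dramIn = sdCardInput;                // 1. PS copies the frame from the SD card to DRAM
    Frame dramOut;
    kernelThreshold(dramIn, dramOut, thresh);  // 2. Kernel reads DRAM, processes, writes DRAM
    Frame sdCardOutput = dramOut;              // 3. PS copies the result from DRAM to the SD card
    return sdCardOutput;
}
```

The live I/O mode follows the same pattern, with the capture and display IPs taking the place of the two copies performed by the PS.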
The following figure shows the reVISION platform with the xfOpenCV kernel block:
[Figure X22064-113018: block diagram showing the ARM core, DDR controller, central interconnect, AXIS and AXIMM interfaces, and the xfOpenCV kernel]
Note: For more information on the PS-PL interfaces and PL-DDR interfaces, see the Zynq UltraScale+ Device
Technical Reference Manual (UG1085).
xfOpenCV Library Contents
The xfOpenCV library is structured as shown in the following table.
Folder Details
include Contains the header files required by the library.
include/common Contains the common library infrastructure headers, such as types specific to the library.
include/core Contains the core library functionality headers, such as the math functions.
include/features Contains the feature extraction kernel function definitions. For example, Harris.
include/imgproc Contains all the kernel function definitions, except the ones available in the features folder.
include/video Contains all the kernel function definitions, except the ones available in the features and imgproc folder.
examples Contains the sample test bench code to facilitate running unit tests. The examples/ folder contains the folders with algorithm names. Each algorithm folder contains host files, .json file, and data folder. For more details on how to use the xfOpenCV library, see xfOpenCV Kernel on the reVISION Platform.
examples Contains the sample test bench code for 24 functions, which shows how to use xfOpenCV library in SDAccel™ environment.
HLS_Use_Model Contains examples for using xfOpenCV functions in Standalone Vivado HLS in 2 different modes.
HLS_Use_Model/Standalone_HLS_Example Contains sample code and tcl script for synthesizing xfOpenCV functions as is, in Standalone Vivado HLS tool.
HLS_Use_Model/Standalone_HLS_AXI_Example Contains sample code and tcl script for synthesizing functions with AXI interfaces, in Standalone Vivado HLS tool.
Prerequisites
This section lists the prerequisites for using the xfOpenCV library functions on ZCU104 based
platforms. The methodology holds true for ZC702 and ZC706 reVISION platforms as well.
• Download and install the SDx development environment according to the directions provided
in SDSoC Environments Release Notes, Installation, and Licensing Guide (UG1294). If you use a
terminal to build the project, set the $SYSROOT environment variable to point to the Linux
root file system delivered with the reVISION platform before launching the SDx development
environment on Linux. For example:
export SYSROOT=<local folder>/zcu104_rv_ss/sw/a53_linux/a53_linux/sysroot/aarch64-xilinx-linux
• Download the Zynq® UltraScale+™ MPSoC Embedded Vision Platform zip file and extract its
contents. Create the SDx development environment workspace in the zcu104_rv_ss folder
of the extracted design file hierarchy. For more details, see the reVISION Getting Started
Guide.
• Set up the ZCU104 evaluation board. For more details, see the reVISION Getting Started
Guide.
• Download the xfOpenCV library. This library is made available through GitHub. Run the
following git clone command to clone the xfOpenCV repository to your local disk:
git clone https://github.com/Xilinx/xfopencv.git
This section provides details on using the C++ video processing functions and the
infrastructure present in the HLS video library.
All the functions imported from the HLS video library now take xf::Mat (in sync with the xfOpenCV
library) to represent image data instead of hls::Mat. The main difference between the two is
that hls::Mat uses hls::stream to store the data, whereas xf::Mat uses a pointer. Therefore,
hls::Mat cannot be directly replaced with xf::Mat when migrating.
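The practical consequence of stream storage versus pointer storage can be sketched without any HLS headers. In the example below, std::queue stands in for a stream (each element can be read only once, in order) and a raw pointer stands in for pointer-based storage (any element can be read any number of times). The function names sumStream and sumCorners are hypothetical and are not part of either library.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Stream-style access: pixels are consumed once, in arrival order.
// After this call the stream is empty, so a second pass needs a second copy.
std::uint32_t sumStream(std::queue<std::uint8_t>& stream) {
    std::uint32_t total = 0;
    while (!stream.empty()) {
        total += stream.front();
        stream.pop();
    }
    return total;
}

// Pointer-style access: any pixel can be read at any time, in any order,
// and the data is still available after the call returns.
std::uint32_t sumCorners(const std::uint8_t* data, int rows, int cols) {
    assert(data != nullptr && rows > 0 && cols > 0);
    return static_cast<std::uint32_t>(data[0]) +
           data[cols - 1] +
           data[(rows - 1) * cols] +
           data[rows * cols - 1];
}
```

Code written against a stream typically assumes a single in-order pass, so migrating it to a pointer-based container usually means revisiting how each function reads and writes its data rather than renaming the type.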
The following table summarizes the differences between the member functions of hls::Mat and xf::Mat.
Classes
• Memory Line Buffer: hls::LineBuffer is now xf::LineBuffer. The two are identical, except
that xf::LineBuffer has extra template arguments for inferring different types of RAM
structures for the storage used. The default storage type is “RAM_S2P_BRAM” with
RESHAPE_FACTOR=1. For a complete description, see xf::LineBuffer. The class is located
in the xf_video_mem.h file.
Functions
• OpenCV interface functions: These functions convert image data in OpenCV Mat format to/
from HLS AXI types. The HLS video library had 14 interface functions, of which two
are available in the xfOpenCV library: cvMat2AXIvideo and AXIvideo2cvMat, located in the
“xf_axi.h” file. The rest are deprecated.
• AXI4-Stream I/O Functions: The I/O functions that convert hls::Mat to/from the AXI4-Stream
compatible data type (hls::stream) are hls::AXIvideo2Mat and hls::Mat2AXIvideo. These functions
are now deprecated, and two new functions, xf::AXIvideo2xfMat and xf::xfMat2AXIvideo, have been
added to convert xf::Mat to/from the AXI4-Stream type. To use these functions, the header file
"xf_infra.h" must be included.
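As a rough picture of what a conversion between an image buffer and an AXI4-Stream video stream involves, the sketch below serializes a row-major image into beats that carry a start-of-frame flag on the first pixel and an end-of-line flag on the last pixel of each row, then rebuilds the image from those beats. The AxiBeat structure and the functions imageToStream and streamToImage are hypothetical models written for this illustration; they are not the xf::xfMat2AXIvideo or xf::AXIvideo2xfMat implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical model of one beat on an AXI4-Stream video interface.
struct AxiBeat {
    std::uint8_t data;  // pixel value
    bool user;          // start-of-frame marker, set on the first pixel only
    bool last;          // end-of-line marker, set on the last pixel of each row
};

// Serializes a row-major image into a stream of beats.
void imageToStream(const std::vector<std::uint8_t>& img, int rows, int cols,
                   std::queue<AxiBeat>& out) {
    assert(rows > 0 && cols > 0);
    assert(img.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            AxiBeat beat;
            beat.data = img[static_cast<std::size_t>(r) * cols + c];
            beat.user = (r == 0 && c == 0);
            beat.last = (c == cols - 1);
            out.push(beat);
        }
    }
}

// Rebuilds the image from the stream and checks the sideband markers.
std::vector<std::uint8_t> streamToImage(std::queue<AxiBeat>& in, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    std::vector<std::uint8_t> img;
    img.reserve(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            assert(!in.empty());
            AxiBeat beat = in.front();
            in.pop();
            assert(beat.user == (r == 0 && c == 0));
            assert(beat.last == (c == cols - 1));
            img.push_back(beat.data);
        }
    }
    return img;
}
```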
xf::Window
A template class to represent the 2D window buffer. It has three parameters to specify the
number of rows and columns in the window buffer and the pixel data type.
Class definition
Parameter Descriptions
The following table lists the xf::Window class members and their descriptions.
Parameter Description
Function Description
shift_pixels_left() Shifts the window left, which moves all stored data within the window right and leaves the leftmost column (col = COLS-1) free for inserting new data.
shift_pixels_right() Shifts the window right, which moves all stored data within the window left and leaves the rightmost column (col = 0) free for inserting new data.
shift_pixels_up() Shifts the window up, which moves all stored data within the window down and leaves the top row (row = ROWS-1) free for inserting new data.
shift_pixels_down() Shifts the window down, which moves all stored data within the window up and leaves the bottom row (row = 0) free for inserting new data.
insert_pixel(T value, int row, int col) Inserts a new element value at location (row, column) of the window.
insert_row(T value[COLS], int row) Inserts a set of values in any row of the window.
insert_top_row(T value[COLS]) Inserts a set of values in the top row (row = 0) of the window.
insert_bottom_row(T value[COLS]) Inserts a set of values in the bottom row (row = ROWS-1) of the window.
insert_col(T value[ROWS], int col) Inserts a set of values in any column of the window.
insert_left_col(T value[ROWS]) Inserts a set of values in the left column (col = 0) of the window.
insert_right_col(T value[ROWS]) Inserts a set of values in the right column (col = COLS-1) of the window.
T& getval(int row, int col) Returns the data value in the window at position (row, column).
T& operator ()(int row, int col) Returns the data value in the window at position (row, column).
restore_val() Restores the contents of the window buffer to another array.
window_print() Prints all the data present in the window buffer to the console.
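The way the shift and insert members combine into a sliding window can be sketched with a small stand-alone class. SimpleWindow and lastWindowSum below are hypothetical and only mirror three of the member names listed above. The sketch assumes that shift_pixels_left moves data toward lower column indices, so that column COLS-1 is freed for insert_right_col; the real xf::Window implementation is not reproduced here.

```cpp
#include <cassert>

// Hypothetical simplified window buffer (NOT the real xf::Window).
template <int ROWS, int COLS, typename T>
struct SimpleWindow {
    T val[ROWS][COLS];

    SimpleWindow() {
        for (int r = 0; r < ROWS; ++r)
            for (int c = 0; c < COLS; ++c)
                val[r][c] = T();
    }

    // Assumed semantics: every column moves one index lower,
    // which frees column COLS-1 for new data.
    void shift_pixels_left() {
        for (int r = 0; r < ROWS; ++r)
            for (int c = 0; c < COLS - 1; ++c)
                val[r][c] = val[r][c + 1];
    }

    // Writes a set of values into column COLS-1.
    void insert_right_col(const T value[ROWS]) {
        for (int r = 0; r < ROWS; ++r)
            val[r][COLS - 1] = value[r];
    }

    T& getval(int row, int col) {
        assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
        return val[row][col];
    }
};

// Slides a 3x3 window across a 3-row strip, one column per step, and
// returns the sum of the window contents after the last column is inserted.
template <int STRIP_COLS>
int lastWindowSum(const int (&strip)[3][STRIP_COLS]) {
    SimpleWindow<3, 3, int> win;
    for (int c = 0; c < STRIP_COLS; ++c) {
        const int column[3] = {strip[0][c], strip[1][c], strip[2][c]};
        win.shift_pixels_left();
        win.insert_right_col(column);
    }
    int sum = 0;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            sum += win.getval(r, c);
    return sum;
}
```

Each step costs one shift and one column insert, which is why a window buffer of this kind is a common building block for neighborhood operations such as filters.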
xf::LineBuffer
A template class to represent a 2D line buffer. It has three parameters to specify the number of
rows and columns in the line buffer and the pixel data type.
Class definition
Parameter Descriptions
The following table lists the xf::LineBuffer class members and their descriptions.
Parameter Description
Function Description
shift_pixels_up(int col) Shifts the line buffer contents up; new values are placed in the bottom row (row = ROWS-1).
shift_pixels_down(int col) Shifts the line buffer contents down; new values are placed in the top row (row = 0).
insert_bottom_row(T value, int col) Inserts a new value in the bottom row (row = ROWS-1) of the line buffer.
insert_top_row(T value, int col) Inserts a new value in the top row (row = 0) of the line buffer.
get_col(T value[ROWS], int col) Gets a column of values from the line buffer.
T& getval(int row, int col) Returns the data value in the line buffer at position (row, column).
T& operator ()(int row, int col) Returns the data value in the line buffer at position (row, column).
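A line buffer of this kind is typically fed one pixel at a time so that each column retains the most recent ROWS pixels seen at that column position. The sketch below models that behavior with a stand-alone class. SimpleLineBuffer and feedPixel are hypothetical; they borrow three member names from the table above but do not reproduce the xf::LineBuffer implementation or its RAM-related template arguments.

```cpp
#include <cassert>

// Hypothetical simplified line buffer (NOT the real xf::LineBuffer).
template <int ROWS, int COLS, typename T>
struct SimpleLineBuffer {
    T val[ROWS][COLS];

    SimpleLineBuffer() {
        for (int r = 0; r < ROWS; ++r)
            for (int c = 0; c < COLS; ++c)
                val[r][c] = T();
    }

    // Shifts one column up by one row; row ROWS-1 is then free for new data.
    void shift_pixels_up(int col) {
        assert(col >= 0 && col < COLS);
        for (int r = 0; r < ROWS - 1; ++r)
            val[r][col] = val[r + 1][col];
    }

    // Writes a new value into the bottom row of the given column.
    void insert_bottom_row(T value, int col) {
        assert(col >= 0 && col < COLS);
        val[ROWS - 1][col] = value;
    }

    // Copies one column of the buffer into the caller's array.
    void get_col(T value[ROWS], int col) {
        assert(col >= 0 && col < COLS);
        for (int r = 0; r < ROWS; ++r)
            value[r] = val[r][col];
    }
};

// Feeds one incoming pixel: the column is shifted up and the new
// pixel is placed in the bottom row.
template <int ROWS, int COLS, typename T>
void feedPixel(SimpleLineBuffer<ROWS, COLS, T>& buf, T value, int col) {
    buf.shift_pixels_up(col);
    buf.insert_bottom_row(value, col);
}
```

After several image rows have been fed in raster order, get_col returns the vertical neighborhood for a column, which can then be pushed into a window buffer.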
addS
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T>
void AddS(Mat<ROWS, COLS, SRC_T>& src, Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC = 1>
void addS(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
AddWeighted
HLS video library:
template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T, typename P_T>
void AddWeighted(Mat<ROWS, COLS, SRC1_T>& src1, P_T alpha, Mat<ROWS, COLS, SRC2_T>& src2, P_T beta, P_T gamma, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC = 1>
void addWeighted(xf::Mat<SRC_T, ROWS, COLS, NPC>& src1, float alpha, xf::Mat<SRC_T, ROWS, COLS, NPC>& src2, float beta, float gama, xf::Mat<DST_T, ROWS, COLS, NPC>& dst)
Cmp
HLS video library:
template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T>
void Cmp(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst, int cmp_op)
xfOpenCV library:
template<int CMP_OP, int SRC_T, int ROWS, int COLS, int NPC = 1>
void compare(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src2, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
CmpS
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename P_T, int DST_T>
void CmpS(Mat<ROWS, COLS, SRC_T>& src, P_T value, Mat<ROWS, COLS, DST_T>& dst, int cmp_op)
xfOpenCV library:
template<int CMP_OP, int SRC_T, int ROWS, int COLS, int NPC = 1>
void compare(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
Max
HLS video library:
template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T>
void Max(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void Max(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src2, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
MaxS
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T>
void MaxS(Mat<ROWS, COLS, SRC_T>& src, _T value, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void max(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
Min
HLS video library:
template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T>
void Min(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void Min(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src2, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
MinS
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T>
void MinS(Mat<ROWS, COLS, SRC_T>& src, _T value, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void min(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
PaintMask
HLS video library:
template<int SRC_T, int MASK_T, int ROWS, int COLS>
void PaintMask(Mat<ROWS, COLS, SRC_T>& _src, Mat<ROWS, COLS, MASK_T>& _mask, Mat<ROWS, COLS, SRC_T>& _dst, Scalar<HLS_MAT_CN(SRC_T), HLS_TNAME(SRC_T)> _color)
xfOpenCV library:
template<int SRC_T, int MASK_T, int ROWS, int COLS, int NPC = 1>
void paintmask(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, xf::Mat<MASK_T, ROWS, COLS, NPC>& in_mask, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst_mat, unsigned char _color[XF_CHANNELS(SRC_T,NPC)])
Reduce
HLS video library:
template<typename INTER_SUM_T, int ROWS, int COLS, int SRC_T, int DST_ROWS, int DST_COLS, int DST_T>
void Reduce(Mat<ROWS, COLS, SRC_T>& src, Mat<DST_ROWS, DST_COLS, DST_T>& dst, int dim, int op = HLS_REDUCE_SUM)
xfOpenCV library:
template<int REDUCE_OP, int SRC_T, int DST_T, int ROWS, int COLS, int ONE_D_HEIGHT, int ONE_D_WIDTH, int NPC = 1>
void reduce(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, xf::Mat<DST_T, ONE_D_HEIGHT, ONE_D_WIDTH, 1>& _dst_mat, unsigned char dim)
Zero
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T>
void Zero(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void zero(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
Sum
HLS video library:
template<typename DST_T, int ROWS, int COLS, int SRC_T>
Scalar<HLS_MAT_CN(SRC_T), DST_T> Sum(Mat<ROWS, COLS, SRC_T>& src)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void sum(xf::Mat<SRC_T, ROWS, COLS, NPC>& src1, double sum[XF_CHANNELS(SRC_T,NPC)])
SubS
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T>
void SubS(Mat<ROWS, COLS, SRC_T>& src, Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC = 1>
void SubS(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
SubRS
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T>
void SubRS(Mat<ROWS, COLS, SRC_T>& src, Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC = 1>
void SubRS(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
Set
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T>
void Set(Mat<ROWS, COLS, SRC_T>& src, Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void set(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
Absdiff
HLS video library:
template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T>
void AbsDiff(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void absdiff(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src2, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
And
HLS video library:
template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T>
void And(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void bitwise_and(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src2, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
Dilate
HLS video library:
template<int Shape_type, int ITERATIONS, int SRC_T, int DST_T, typename KN_T, int IMG_HEIGHT, int IMG_WIDTH, int K_HEIGHT, int K_WIDTH>
void Dilate(Mat<IMG_HEIGHT, IMG_WIDTH, SRC_T>& _src, Mat<IMG_HEIGHT, IMG_WIDTH, DST_T>& _dst, Window<K_HEIGHT, K_WIDTH, KN_T>& _kernel)
xfOpenCV library:
template<int BORDER_TYPE, int TYPE, int ROWS, int COLS, int K_SHAPE, int K_ROWS, int K_COLS, int ITERATIONS, int NPC = 1>
void dilate(xf::Mat<TYPE, ROWS, COLS, NPC>& _src, xf::Mat<TYPE, ROWS, COLS, NPC>& _dst, unsigned char _kernel[K_ROWS*K_COLS])
Duplicate
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T>
void Duplicate(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst1, Mat<ROWS, COLS, DST_T>& dst2)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC>
void duplicateMat(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst2)
EqualizeHist
HLS video library:
template<int SRC_T, int DST_T, int ROW, int COL>
void EqualizeHist(Mat<ROW, COL, SRC_T>& _src, Mat<ROW, COL, DST_T>& _dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void equalizeHist(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst)
erode
HLS video library:
template<int Shape_type, int ITERATIONS, int SRC_T, int DST_T, typename KN_T, int IMG_HEIGHT, int IMG_WIDTH, int K_HEIGHT, int K_WIDTH>
void Erode(Mat<IMG_HEIGHT, IMG_WIDTH, SRC_T>& _src, Mat<IMG_HEIGHT, IMG_WIDTH, DST_T>& _dst, Window<K_HEIGHT, K_WIDTH, KN_T>& _kernel)
xfOpenCV library:
template<int BORDER_TYPE, int TYPE, int ROWS, int COLS, int K_SHAPE, int K_ROWS, int K_COLS, int ITERATIONS, int NPC = 1>
void erode(xf::Mat<TYPE, ROWS, COLS, NPC>& _src, xf::Mat<TYPE, ROWS, COLS, NPC>& _dst, unsigned char _kernel[K_ROWS*K_COLS])
FASTX
HLS video library:
template<int SRC_T, int ROWS, int COLS>
void FASTX(Mat<ROWS, COLS, SRC_T>& _src, Mat<ROWS, COLS, HLS_8UC1>& _mask, HLS_TNAME(SRC_T) _threshold, bool _nomax_supression)
xfOpenCV library:
template<int NMS, int SRC_T, int ROWS, int COLS, int NPC = 1>
void fast(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst_mat, unsigned char _threshold)
Filter2D
HLS video library:
template<int SRC_T, int DST_T, typename KN_T, typename POINT_T, int IMG_HEIGHT, int IMG_WIDTH, int K_HEIGHT, int K_WIDTH>
void Filter2D(Mat<IMG_HEIGHT, IMG_WIDTH, SRC_T>& _src, Mat<IMG_HEIGHT, IMG_WIDTH, DST_T>& _dst, Window<K_HEIGHT, K_WIDTH, KN_T>& _kernel, Point_<POINT_T> anchor)
xfOpenCV library:
template<int BORDER_TYPE, int FILTER_WIDTH, int FILTER_HEIGHT, int SRC_T, int DST_T, int ROWS, int COLS, int NPC>
void filter2D(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, xf::Mat<DST_T, ROWS, COLS, NPC>& _dst_mat, short int filter[FILTER_HEIGHT*FILTER_WIDTH], unsigned char _shift)
GaussianBlur
HLS video library:
template<int KH, int KW, typename BORDERMODE, int SRC_T, int DST_T, int ROWS, int COLS>
void GaussianBlur(Mat<ROWS, COLS, SRC_T>& _src, Mat<ROWS, COLS, DST_T>& _dst, double sigmaX = 0, double sigmaY = 0)
xfOpenCV library:
template<int FILTER_SIZE, int BORDER_TYPE, int SRC_T, int ROWS, int COLS, int NPC = 1>
void GaussianBlur(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst, float sigma)
Harris
HLS video library:
template<int blockSize, int Ksize, typename KT, int SRC_T, int DST_T, int ROWS, int COLS>
void Harris(Mat<ROWS, COLS, SRC_T>& _src, Mat<ROWS, COLS, DST_T>& _dst, KT k, int threshold)
xfOpenCV library:
template<int FILTERSIZE, int BLOCKWIDTH, int NMSRADIUS, int SRC_T, int ROWS, int COLS, int NPC = 1, bool USE_URAM = false>
void cornerHarris(xf::Mat<SRC_T, ROWS, COLS, NPC>& src, xf::Mat<SRC_T, ROWS, COLS, NPC>& dst, uint16_t threshold, uint16_t k)
CornerHarris
HLS video library:
template<int blockSize, int Ksize, typename KT, int SRC_T, int DST_T, int ROWS, int COLS>
void CornerHarris(Mat<ROWS, COLS, SRC_T>& _src, Mat<ROWS, COLS, DST_T>& _dst, KT k)
xfOpenCV library:
template<int FILTERSIZE, int BLOCKWIDTH, int NMSRADIUS, int SRC_T, int ROWS, int COLS, int NPC = 1, bool USE_URAM = false>
void cornerHarris(xf::Mat<SRC_T, ROWS, COLS, NPC>& src, xf::Mat<SRC_T, ROWS, COLS, NPC>& dst, uint16_t threshold, uint16_t k)
HoughLines2
HLS video library:
template<unsigned int theta, unsigned int rho, typename AT, typename RT, int SRC_T, int ROW, int COL, unsigned int linesMax>
void HoughLines2(Mat<ROW, COL, SRC_T>& _src, Polar_<AT, RT> (&_lines)[linesMax], unsigned int threshold)
xfOpenCV library:
template<unsigned int RHO, unsigned int THETA, int MAXLINES, int DIAG, int MINTHETA, int MAXTHETA, int SRC_T, int ROWS, int COLS, int NPC>
void HoughLines(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, float outputrho[MAXLINES], float outputtheta[MAXLINES], short threshold, short linesmax)
Integral
HLS video library:
template<int SRC_T, int DST_T, int ROWS, int COLS>
void Integral(Mat<ROWS, COLS, SRC_T>& _src, Mat<ROWS+1, COLS+1, DST_T>& _sum)
xfOpenCV library:
template<int SRC_TYPE, int DST_TYPE, int ROWS, int COLS, int NPC>
void integral(xf::Mat<SRC_TYPE, ROWS, COLS, NPC>& _src_mat, xf::Mat<DST_TYPE, ROWS, COLS, NPC>& _dst_mat)
Merge
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T>
void Merge(Mat<ROWS, COLS, SRC_T>& src0, Mat<ROWS, COLS, SRC_T>& src1, Mat<ROWS, COLS, SRC_T>& src2, Mat<ROWS, COLS, SRC_T>& src3, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC = 1>
void merge(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src2, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src3, xf::Mat<SRC_T, ROWS, COLS, NPC>& _src4, xf::Mat<DST_T, ROWS, COLS, NPC>& _dst)
MinMaxLoc
HLS video library:
template<int ROWS, int COLS, int SRC_T, typename P_T>
void MinMaxLoc(Mat<ROWS, COLS, SRC_T>& src, P_T* min_val, P_T* max_val, Point& min_loc, Point& max_loc)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 0>
void minMaxLoc(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src, int32_t* min_value, int32_t* max_value, uint16_t* _minlocx, uint16_t* _minlocy, uint16_t* _maxlocx, uint16_t* _maxlocy)
Mul
HLS video library:
template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T>
void Mul(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC = 1>
void multiply(xf::Mat<SRC_T, ROWS, COLS, NPC>& src1, xf::Mat<SRC_T, ROWS, COLS, NPC>& src2, xf::Mat<SRC_T, ROWS, COLS, NPC>& dst, float scale)
Not
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T>
void Not(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void bitwise_not(xf::Mat<SRC_T, ROWS, COLS, NPC>& src, xf::Mat<SRC_T, ROWS, COLS, NPC>& dst)
Range
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T, typename P_T>
void Range(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst, P_T start, P_T end)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void inRange(xf::Mat<SRC_T, ROWS, COLS, NPC>& src, unsigned char lower_thresh, unsigned char upper_thresh, xf::Mat<SRC_T, ROWS, COLS, NPC>& dst)
Resize
HLS video library:
template<int SRC_T, int ROWS, int COLS, int DROWS, int DCOLS>
void Resize(Mat<ROWS, COLS, SRC_T>& _src, Mat<DROWS, DCOLS, SRC_T>& _dst, int interpolation = HLS_INTER_LINEAR)
xfOpenCV library:
template<int INTERPOLATION_TYPE, int TYPE, int SRC_ROWS, int SRC_COLS, int DST_ROWS, int DST_COLS, int NPC, int MAX_DOWN_SCALE>
void resize(xf::Mat<TYPE, SRC_ROWS, SRC_COLS, NPC>& _src, xf::Mat<TYPE, DST_ROWS, DST_COLS, NPC>& _dst)
sobel
HLS video library:
template<int XORDER, int YORDER, int SIZE, int SRC_T, int DST_T, int ROWS, int COLS, int DROWS, int DCOLS>
void Sobel(Mat<ROWS, COLS, SRC_T>& _src, Mat<DROWS, DCOLS, DST_T>& _dst)
xfOpenCV library:
template<int BORDER_TYPE, int FILTER_TYPE, int SRC_T, int DST_T, int ROWS, int COLS, int NPC = 1, bool USE_URAM = false>
void Sobel(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, xf::Mat<DST_T, ROWS, COLS, NPC>& _dst_matx, xf::Mat<DST_T, ROWS, COLS, NPC>& _dst_maty)
split
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T>
void Split(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst0, Mat<ROWS, COLS, DST_T>& dst1, Mat<ROWS, COLS, DST_T>& dst2, Mat<ROWS, COLS, DST_T>& dst3)
xfOpenCV library:
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC = 1>
void extractChannel(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, xf::Mat<DST_T, ROWS, COLS, NPC>& _dst_mat, uint16_t _channel)
Threshold
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T>
void Threshold(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst, HLS_TNAME(SRC_T) thresh, HLS_TNAME(DST_T) maxval, int thresh_type)
xfOpenCV library:
template<int THRESHOLD_TYPE, int SRC_T, int ROWS, int COLS, int NPC = 1>
void Threshold(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src_mat, xf::Mat<SRC_T, ROWS, COLS, NPC>& _dst_mat, short int thresh, short int maxval)
Scale
HLS video library:
template<int ROWS, int COLS, int SRC_T, int DST_T, typename P_T>
void Scale(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst, P_T scale = 1.0, P_T shift = 0.0)
xfOpenCV library:
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC = 1>
void scale(xf::Mat<SRC_T, ROWS, COLS, NPC>& src1, xf::Mat<DST_T, ROWS, COLS, NPC>& dst, float scale, float shift)
InitUndistortRectifyMapInverse
HLS video library:
template<typename CMT, typename DT, typename ICMT, int ROWS, int COLS, int MAP1_T, int MAP2_T, int N>
void InitUndistortRectifyMapInverse(Window<3, 3, CMT> cameraMatrix, DT (&distCoeffs)[N], Window<3, 3, ICMT> ir, Mat<ROWS, COLS, MAP1_T>& map1, Mat<ROWS, COLS, MAP2_T>& map2, int noRotation = false)
xfOpenCV library:
template<int CM_SIZE, int DC_SIZE, int MAP_T, int ROWS, int COLS, int NPC>
void InitUndistortRectifyMapInverse(ap_fixed<32,12>* cameraMatrix, ap_fixed<32,12>* distCoeffs, ap_fixed<32,12>* ir, xf::Mat<MAP_T, ROWS, COLS, NPC>& _mapx_mat, xf::Mat<MAP_T, ROWS, COLS, NPC>& _mapy_mat, int _cm_size, int _dc_size)
Avg, mean, AvgStddev
HLS video library:
template<typename DST_T, int ROWS, int COLS, int SRC_T>
DST_T Mean(Mat<ROWS, COLS, SRC_T>& src)
xfOpenCV library:
template<int SRC_T, int ROWS, int COLS, int NPC = 1>
void meanStdDev(xf::Mat<SRC_T, ROWS, COLS, NPC>& _src, unsigned short* _mean, unsigned short* _stddev)
Note: All the functions except Reduce can process N pixels per clock, where N is a power of 2.
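One way to picture processing N pixels per clock is to pack N neighboring pixels into a single wider word, so that a pass over the image needs N times fewer loop iterations. The sketch below packs 8-bit pixels into 64-bit words. The function names packPixels and iterations, the limit of 8 pixels per word, and the byte order (first pixel in the least significant byte) are assumptions made for this illustration and do not describe the internal data layout of xf::Mat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs NPC consecutive 8-bit pixels into one 64-bit word.
// Assumed layout: the first pixel occupies the least significant byte.
template <int NPC>
std::vector<std::uint64_t> packPixels(const std::vector<std::uint8_t>& pixels) {
    static_assert(NPC > 0 && (NPC & (NPC - 1)) == 0, "NPC must be a power of 2");
    static_assert(NPC <= 8, "this sketch packs at most 8 pixels into 64 bits");
    assert(pixels.size() % NPC == 0);
    std::vector<std::uint64_t> words(pixels.size() / NPC, 0);
    for (std::size_t i = 0; i < pixels.size(); ++i)
        words[i / NPC] |= static_cast<std::uint64_t>(pixels[i]) << (8 * (i % NPC));
    return words;
}

// Number of loop iterations needed to traverse an image when npc pixels
// are handled per iteration (the width is assumed to be a multiple of npc).
constexpr int iterations(int rows, int cols, int npc) {
    return rows * (cols / npc);
}
```

For a 1920x1080 frame, handling 8 pixels per iteration reduces the trip count from 2,073,600 to 259,200 in this model.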
Note: The instructions in this section assume that you have downloaded and installed all the required
packages. For more information, see the Prerequisites.
The include folder contains all the components necessary to build a computer vision or image
processing pipeline using the library. The common and core folders contain the infrastructure
that the library functions need for basic functions, the Mat class, and macros. The library functions
are categorized into three folders, features, video, and imgproc, based on the operation they
perform. The names of the folders are self-explanatory.
To work with the library functions, you need to include the path to the include folder in the SDx
project. After you add the include folder's path to the compiler settings, you can include the
relevant header files for the library functions you will be working with. For example, if you
would like to work with the Harris Corner Detector and the Bilateral Filter, you must use the
following lines in the host code:
After the headers are included, you can work with the library functions as described in
Chapter 5: xfOpenCV Library API Reference, using the examples in the examples folder as
reference.
The following table gives the name of the header file, including the folder name, that contains
each library function.
xf::accumulate imgproc/xf_accumulate_image.hpp
xf::accumulateSquare imgproc/xf_accumulate_squared.hpp
xf::accumulateWeighted imgproc/xf_accumulate_weighted.hpp
xf::absdiff, xf::add, xf::subtract, xf::bitwise_and, xf::bitwise_or, xf::bitwise_not, xf::bitwise_xor, xf::multiply, xf::Max, xf::Min, xf::compare, xf::zero, xf::addS, xf::SubS, xf::SubRS, xf::compareS, xf::MaxS, xf::MinS, xf::set core/xf_arithm.hpp
xf::addWeighted imgproc/xf_add_weighted.hpp
xf::bilateralFilter imgproc/xf_histogram.hpp
xf::boxFilter imgproc/xf_box_filter.hpp
xf::boundingbox imgproc/xf_boundingbox.hpp
xf::Canny imgproc/xf_canny.hpp
xf::Colordetect imgproc/xf_colorthresholding.hpp, imgproc/xf_bgr2hsv.hpp, imgproc/xf_erosion.hpp, imgproc/xf_dilation.hpp
xf::merge imgproc/xf_channel_combine.hpp
xf::extractChannel imgproc/xf_channel_extract.hpp
xf::convertTo imgproc/xf_convert_bitdepth.hpp
xf::crop imgproc/xf_crop.hpp
xf::filter2D imgproc/xf_custom_convolution.hpp
xf::nv122iyuv, xf::nv122rgba, xf::nv122yuv4, xf::nv212iyuv, xf::nv212rgba, xf::nv212yuv4, xf::rgba2yuv4, xf::rgba2iyuv, xf::rgba2nv12, xf::rgba2nv21, xf::uyvy2iyuv, xf::uyvy2nv12, xf::uyvy2rgba, xf::yuyv2iyuv, xf::yuyv2nv12, xf::yuyv2rgba, xf::rgb2iyuv, xf::rgb2nv12, xf::rgb2nv21, xf::rgb2yuv4, xf::rgb2uyvy, xf::rgb2yuyv, xf::rgb2bgr, xf::bgr2uyvy, xf::bgr2yuyv, xf::bgr2rgb, xf::bgr2nv12, xf::bgr2nv21, xf::iyuv2nv12, xf::iyuv2rgba, xf::iyuv2rgb, xf::iyuv2yuv4, xf::nv122uyvy, xf::nv122yuyv, xf::nv122nv21, xf::nv212rgb, xf::nv212bgr, xf::nv212uyvy, xf::nv212yuyv, xf::nv212nv12, xf::uyvy2rgb, xf::uyvy2bgr, xf::uyvy2yuyv, xf::yuyv2rgb, xf::yuyv2bgr, xf::yuyv2uyvy, xf::rgb2gray, xf::bgr2gray, xf::gray2rgb, xf::gray2bgr, xf::rgb2xyz, xf::bgr2xyz... imgproc/xf_cvt_color.hpp
xf::dilate imgproc/xf_dilation.hpp
xf::demosaicing imgproc/xf_demosaicing.hpp
xf::erode imgproc/xf_erosion.hpp
xf::fast features/xf_fast.hpp
xf::GaussianBlur imgproc/xf_gaussian_filter.hpp
xf::cornerHarris features/xf_harris.hpp
xf::calcHist imgproc/xf_histogram.hpp
xf::equalizeHist imgproc/xf_hist_equalize.hpp
xf::HOGDescriptor imgproc/xf_hog_descriptor.hpp
xf::HoughLines imgproc/xf_houghlines.hpp
xf::inRange imgproc/xf_inrange.hpp
xf::integralImage imgproc/xf_integral_image.hpp
xf::densePyrOpticalFlow video/xf_pyr_dense_optical_flow.hpp
xf::DenseNonPyrLKOpticalFlow video/xf_dense_npyr_optical_flow.hpp
xf::LUT imgproc/xf_lut.hpp
xf::KalmanFilter video/xf_kalmanfilter.hpp
xf::magnitude core/xf_magnitude.hpp
xf::MeanShift imgproc/xf_mean_shift.hpp
xf::meanStdDev core/xf_mean_stddev.hpp
xf::medianBlur imgproc/xf_median_blur.hpp
xf::minMaxLoc core/xf_min_max_loc.hpp
xf::OtsuThreshold imgproc/xf_otsuthreshold.hpp
xf::phase core/xf_phase.hpp
xf::paintmask imgproc/xf_paintmask.hpp
xf::pyrDown imgproc/xf_pyr_down.hpp
xf::pyrUp imgproc/xf_pyr_up.hpp
xf::reduce imgproc/xf_reduce.hpp
xf::remap imgproc/xf_remap.hpp
xf::resize imgproc/xf_resize.hpp
xf::scale imgproc/xf_scale.hpp
xf::Scharr imgproc/xf_scharr.hpp
xf::SemiGlobalBM imgproc/xf_sgbm.hpp
xf::Sobel imgproc/xf_sobel.hpp
xf::StereoPipeline imgproc/xf_stereo_pipeline.hpp
xf::sum imgproc/xf_sum.hpp
xf::StereoBM imgproc/xf_stereoBM.hpp
xf::SVM imgproc/xf_svm.hpp
xf::Threshold imgproc/xf_threshold.hpp
xf::warpTransform imgproc/xf_warp_transform.hpp
The different ways to use the xfOpenCV library examples are listed below:
3. To add a library to a project, from the SDx IDE, click Xilinx and select SDx Libraries.
4. Select Xilinx xfOpenCV Library and click Add to project. The drop-down menu lists the
projects to which the library can be added.
All the headers in the include/ folder of the xfOpenCV library are copied into the
local project directory as <project_dir>/libs/xfopencv/include. All the settings
required to run the library are also applied when this action is completed.
1. Open a terminal.
2. When building for the reVISION platform, set the environment variable SYSROOT to <the path to
platform folder>/sw/a53_linux/a53_linux/sysroot/aarch64-xilinx-linux.
3. In the Makefile, change the platform variable to point to the downloaded platform folder. Ensure
that the folder name of the downloaded platform is unchanged.
4. When building for the reVISION platform, change the IDIRS and LDIRS variables in the Makefile as
follows:
IDIRS = -I. -I${SYSROOT}/usr/include -I ../../include
LDIRS = --sysroot=${SYSROOT} -L=/lib -L=/usr/lib -Wl,-rpath-link=${SYSROOT}/lib,-rpath-link=${SYSROOT}/usr/lib
5. Change the directory to the location where you want to build the example.
cd <path to example>
8. Type the make command in the terminal. The sd_card folder is created and can be found in
the <path to example> folder.
Note: Skip steps 2, 4, and 6 when building for non-reVISION platforms.
1. Launch the SDx development environment using the desktop icon or the Start menu.
The Workspace Launcher dialog appears.
2. Click Browse to enter a workspace folder used to store your projects (you can use workspace
folders to organize your work), then click OK to dismiss the Workspace Launcher dialog.
Note: Before launching the SDx IDE on Linux, ensure that you use the same shell that you have used to
set the $SYSROOT environment variable. This is usually the file path to the Linux root file system.
The SDx development environment window opens with the Welcome tab visible when you
create a new workspace. The Welcome tab can be closed by clicking the X icon or minimized
if you do not wish to use it.
3. Select File → New → Xilinx SDx Project from the SDx development environment menu bar.
The New Project dialog box opens.
4. Specify the name of the project. For example, Bilateral.
5. Click Next.
The Choose Hardware Platform page appears.
6. From the Choose Hardware Platform page, click the Add Custom Platform button.
7. Browse to the directory where you extracted the reVISION platform files. Ensure that you
select the zcu104_rv_ss folder.
8. From the Choose Hardware Platform page, select zcu104_rv_ss (custom).
9. Click Next.
The Templates page appears, containing source code examples for the selected platform.
10. From the list of application templates, select bilateral - File I/O and click Finish.
11. In the SDx Project Settings window, click the Active build configurations drop-down to
select the active configuration or create a build configuration.
The standard build configurations are Debug and Release. To get the best runtime
performance, switch to use the Release build configuration as it uses a higher compiler
optimization setting than the Debug build configuration.
12. Set the Data motion network clock frequency (MHz) to the required frequency, on the SDx
Project Settings page.
13. In the Project Explorer view, right-click the project and select Build Project, or press
Ctrl+B, to build the project.
14. Copy the contents of the newly created sd_card folder to the SD card. The sd_card folder
contains all the files required to run designs on the ZCU104 board.
15. Insert the SD card in the ZCU104 board card slot and switch it ON.
Note: A serial port emulator (Teraterm/ minicom) is required to interface the user commands to the
board.
16. Upon successful boot, run the following commands in the Teraterm terminal (serial port
emulator):
#cd /media/card
#remount
Note: The instructions in this section assume that you have downloaded and installed all the required
packages. For more information, see the Prerequisites.
Use the following steps to import the xfOpenCV library into an SDx project and execute it on a
custom platform:
1. Launch the SDx development environment using the desktop icon or the Start menu.
The Workspace Launcher dialog appears.
2. Click Browse to enter a workspace folder used to store your projects (you can use workspace
folders to organize your work), then click OK to dismiss the Workspace Launcher dialog.
The SDx development environment window opens with the Welcome tab visible when you
create a new workspace. The Welcome tab can be closed by clicking the X icon or minimized
if you do not wish to use it.
3. Select File → New → Xilinx SDx Project from the SDx development environment menu bar.
The New Project dialog box opens.
4. Specify the name of the project. For example, Test.
5. Click Next.
The Choose Hardware Platform page appears.
6. From the Choose Hardware Platform page, select a suitable platform. For example, zcu102.
7. Click Next.
The Choose Software Platform and Target CPU page appears.
8. From the Choose Software Platform and Target CPU page, select an appropriate software
platform and the target CPU. For example, select A9 from the CPU dropdown list for ZC702
and ZC706 reVISION platforms.
9. Click Next. The Templates page appears, containing source code examples for the selected
platform.
10. From the list of application templates, select Empty Application and click Finish.
The New Project dialog box closes. A new project with the specified configuration is created.
The SDx Project Settings view appears. Notice the progress bar in the lower right border of
the view. Wait a few moments for the C/C++ Indexer to finish.
11. The standard build configurations are Debug and Release. To get the best run-time
performance, switch to use the Release build configuration as it uses a higher compiler
optimization setting than the Debug build configuration.
12. Set the Data motion network clock frequency (MHz) to the required frequency, on the SDx
Project Settings page.
13. Select the Generate bitstream and Generate SD card image check boxes.
14. Right-click on the newly created project in the Project Explorer view.
15. From the context menu that appears, select C/C++ Build Settings.
The Properties for <project> dialog box appears.
16. Click the Tool Settings tab.
17. Expand the SDS++ Compiler → Directories tree.
19. In the same page, under SDS++ Compiler → Inferred Options → Software Platform, specify
"-hls-target 1" in the Software Platform Inferred Flags.
20. Click Apply.
21. Expand the SDS++ Linker → Libraries tree.
22. Click the icon and add the following libraries to the Libraries(-l) list. These libraries are
required by OpenCV.
• opencv_core
• opencv_imgproc
• opencv_imgcodecs
• opencv_features2d
• opencv_calib3d
• opencv_flann
• opencv_video
• opencv_videoio
23. Click the icon and add <opencv_Location>/lib folder location to the Libraries
search path (-L) list.
Note: The OpenCV library is not provided by Xilinx for custom platforms. You are required to provide
the library. Use the reVISION platform in order to use the OpenCV library provided by Xilinx.
30. Select the folder that corresponds to the library that you desire to import. For example,
accumulate.
31. Right-click the library function in the Project Explorer view and select Toggle HW/SW to
move the function to the hardware.
32. In the Project Explorer view, right-click the project and select Build Project, or press
Ctrl+B, to build the project.
The build process may take anywhere from a few minutes to several hours, depending on the
power of the host machine and the complexity of the design. By far, the most time is spent
processing the routines that have been tagged for realization in hardware.
33. Copy the contents of the newly created .\<workspace>\<function>\Release\sd_card
folder to the SD card. The sd_card folder contains all the files required to run
designs on a board.
34. Insert the SD card in the board card slot and switch it ON.
Note: A serial port emulator (Teraterm/ minicom) is required to interface the user commands to the
board.
35. Upon successful boot, navigate to the ./mnt folder and run the following command at the
prompt:
#cd /mnt
Note: It is assumed that the OpenCV libraries are a part of the root filesystem. If not, add the location
of the OpenCV libraries to LD_LIBRARY_PATH using the $ export LD_LIBRARY_PATH=<location
of OpenCV libraries>/lib command.
36. Run the .elf executable file. For more information, see the Using the xfOpenCV Library
Functions on Hardware.
Prerequisites
1. Valid installation of SDx™ 2019.1 or later version and the corresponding licenses.
2. Install the xfOpenCV libraries, if you intend to use libraries compiled differently than what is
provided in SDx.
3. Install the card for which the platform is supported in SDx 2019.1 or later versions.
4. Xilinx® Runtime (XRT) must be installed. XRT provides software interface to Xilinx FPGAs.
5. libOpenCL.so must be installed if not present along with the platform.
1. Loading the kernel binary on the FPGA – xcl::import_binary_file() loads the bitstream and
programs the FPGA to enable required processing of data.
2. Setting up memory buffers for data transfer – Data needs to be sent and read from the DDR
memory on the hardware. cl::Buffers are created to allocate required memory for transferring
data to and from the hardware.
3. Transfer data to and from the hardware – enqueueWriteBuffer() and enqueueReadBuffer() are
used to transfer the data to and from the hardware at the required time.
4. Execute kernel on the FPGA – There are functions to execute kernels on the FPGA. A single
kernel or multiple kernels can be executed, and multiple kernels can run asynchronously or
synchronously with each other. A commonly used command is enqueueTask().
5. Profiling the performance of kernel execution – The host code in OpenCL also enables
measurement of the execution time of a kernel on the FPGA. The function used in our
examples for profiling is getProfilingInfo().
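The profiling step reduces to subtracting two device timestamps. OpenCL reports the CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END counters in nanoseconds, so the difference should be converted to floating point before scaling to milliseconds. The helper below is a minimal sketch of that arithmetic only: `elapsed_ms` is a hypothetical name, and a plain 64-bit integer stands in for `cl_ulong` so that the snippet builds without an OpenCL installation.

```cpp
#include <cstdint>

// Hypothetical helper (not part of OpenCL or xfOpenCV): converts a pair of
// profiling timestamps, given in nanoseconds, into elapsed milliseconds.
// std::uint64_t stands in for cl_ulong so no OpenCL headers are required.
inline double elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns) {
    // Convert to double before dividing so sub-millisecond precision is kept.
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}
```

In host code, the two arguments would be the values filled in by the getProfilingInfo() calls on the kernel event.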
The SDAccel flow (OpenCL) requires kernel interfaces to be memory pointers whose width is a
power of 2. Glue logic is therefore required to convert memory pointers to the xf::Mat class
data type and vice versa when interacting with xfOpenCV kernels. Wrappers containing this
glue logic are built over the kernels. The examples below provide a methodology for handling
the different kernel types (stream and memory mapped); the xfOpenCV kernels are located at
<Github repo>/include.
extern "C"
{
void func_top (ap_uint *gmem_in, ap_uint *gmem_out, ...) {
xf::Mat<…> in_mat(…), out_mat(…);
#pragma HLS stream variable=in_mat.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
// Adapter: read the input array into an xf::Mat
xf::Array2xfMat<…> (gmem_in, in_mat);
xf::xfopencv-func<…> (in_mat, out_mat…);
// Adapter: write the output xf::Mat back to the array (source Mat first, then pointer)
xf::xfMat2Array<…> (out_mat, gmem_out);
}
}
The above illustration assumes that the data in xf::Mat is streamed in and streamed out.
Instead of a single xfOpenCV function, you can also build a pipeline of multiple functions.
For stream based kernels whose inputs have different sizes, multiple instances of the
adapter functions are necessary, as in the following example:
extern "C" {
void func_top (ap_uint *gmem_in1, ap_uint *gmem_in2, ap_uint *gmem_in3,
ap_uint *gmem_out, ...) {
xf::Mat<...,HEIGHT,WIDTH,…> in_mat1(…), out_mat(…);
xf::Mat<...,HEIGHT/4,WIDTH,…> in_mat2(…), in_mat3(…);
#pragma HLS stream variable=in_mat1.data depth=2
#pragma HLS stream variable=in_mat2.data depth=2
#pragma HLS stream variable=in_mat3.data depth=2
#pragma HLS stream variable=out_mat.data depth=2
#pragma HLS dataflow
// One accel_utils instance per distinct input size
xf::accel_utils obj_a, obj_b;
obj_a.Array2xfMat<…,HEIGHT,WIDTH,…> (gmem_in1, in_mat1);
obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in2, in_mat2);
obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in3, in_mat3);
xf::xfopencv-func(in_mat1, in_mat2, in_mat3, out_mat…);
xf::xfMat2Array<…> (out_mat, gmem_out);
}
}
For the stream based implementations, the data must be fetched from the input AXI and must be
pushed to xfMat as required by the xfcv kernels for that particular configuration. Likewise, the
same operations must be performed for the output of the xfcv kernel. To perform this, two utility
functions are provided, xf::Array2xfMat() and xf::xfMat2Array().
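Conceptually, the two adapter functions move pixels between wide memory words and the per-pixel view that a kernel works on. The following plain C++ sketch illustrates that idea for one case only: eight 8-bit pixels in a 64-bit word. It is not the library implementation. The names `unpack_word` and `pack_word` are hypothetical, and the layout (pixel 0 in the least significant byte) is an assumption of this sketch.

```cpp
#include <array>
#include <cstdint>

// Assumed layout: pixel 0 occupies bits [7:0] of the word, pixel 7 bits [63:56].
constexpr int kPixelsPerWord = 8;

// Splits one 64-bit memory word into eight 8-bit pixels.
inline std::array<std::uint8_t, kPixelsPerWord> unpack_word(std::uint64_t word) {
    std::array<std::uint8_t, kPixelsPerWord> pixels{};
    for (int i = 0; i < kPixelsPerWord; ++i) {
        pixels[i] = static_cast<std::uint8_t>((word >> (8 * i)) & 0xFFu);
    }
    return pixels;
}

// Packs eight 8-bit pixels back into one 64-bit memory word.
inline std::uint64_t pack_word(const std::array<std::uint8_t, kPixelsPerWord>& pixels) {
    std::uint64_t word = 0;
    for (int i = 0; i < kPixelsPerWord; ++i) {
        word |= static_cast<std::uint64_t>(pixels[i]) << (8 * i);
    }
    return word;
}
```

Packing the result of `unpack_word` reproduces the original word, which mirrors the round trip from input pointer to xf::Mat and back to the output pointer.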
Array2xfMat
This function converts the input array to xf::Mat. The xfOpenCV kernel requires its input
to be of type xf::Mat. This function reads from the array pointer and writes into the xf::Mat
according to the configuration (bit-depth, channels, pixel-parallelism) with which the xf::Mat
was created.
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void Array2xfMat(ap_uint< PTR_WIDTH > *srcPtr,
xf::Mat<MAT_T,ROWS,COLS,NPC>& dstMat)
Parameter Description
PTR_WIDTH Data width of the input pointer. The value must be a power of 2,
from 8 to 512.
MAT_T Input Mat type. Example XF_8UC1, XF_16UC1, XF_8UC3 and
XF_8UC4
ROWS Maximum height of image
COLS Maximum width of image
Parameter Description
xfMat2Array
This function converts the input xf::Mat to an output array. The output of an xfOpenCV kernel
function is an xf::Mat, which must be converted to an output pointer.
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void xfMat2Array(xf::Mat<MAT_T,ROWS,COLS,NPC>& srcMat, ap_uint< PTR_WIDTH >
*dstPtr)
Parameter Description
PTR_WIDTH Data width of the output pointer. The value must be a power
of 2, from 8 to 512.
MAT_T Input Mat type. Example XF_8UC1, XF_16UC1, XF_8UC3 and
XF_8UC4
ROWS Maximum height of image
COLS Maximum width of image
NPC Number of pixels computed in parallel. Example XF_NPPC1,
XF_NPPC8
dstPtr Output pointer. Type of the pointer based on the
PTR_WIDTH.
srcMat Input image of type xf::Mat
Minimum pointer widths for different configurations are shown in the following table:
Table 14: Minimum and maximum pointer widths for different mat types
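As a rough guide to how a minimum pointer width can be derived, the sketch below picks the smallest power of 2 (between 8 and 512, the range stated for PTR_WIDTH) that holds one packed word of bits per channel × channels × pixels per cycle. This rule is an assumption made for illustration, not a statement of the library's table, and `min_ptr_width` is a hypothetical helper.

```cpp
// Hypothetical helper: smallest power-of-2 pointer width, from 8 to 512 bits,
// that can hold one packed word. Returns 0 when the word exceeds 512 bits.
// The rounding rule is an assumption of this sketch.
inline int min_ptr_width(int bits_per_channel, int channels, int npc) {
    const int needed = bits_per_channel * channels * npc;
    int width = 8;
    while (width < needed) {
        width *= 2;
    }
    return (width <= 512) ? width : 0;
}
```

For example, an 8-bit, 3-channel pixel needs 24 bits, which rounds up to a 32-bit pointer under this rule.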
Memory Mapped Kernels
In memory mapped kernels such as crop, Mean-shift tracking, and bounding box, the input is
read from a particular block of memory, as required by the algorithm. Streaming interfaces
require the image to be read in raster scan order, which is not the case for memory mapped
kernels. The methodology to handle this case is as follows:
extern "C"
{
void func_top (ap_uint *gmem_in, ap_uint *gmem_out, ...) {
xf::Mat<…> in_mat(…,gmem_in), out_mat(…,gmem_out);
xf::kernel<…> (in_mat, out_mat…);
}
}
The gmem pointers must be mapped to the xf::Mat objects during object creation, and then
the memory mapped kernels are called with these mats at the interface. The pointer width
must be the same as the width required by xf::xfopencv-func, unlike the streaming
method, where any larger pointer width (up to 512 bits) is allowed.
Makefile
In the current use model, only a makefile based flow is provided to build applications with
xfOpenCV on SDAccel. Examples for makefile are provided in the samples section of GitHub.
Host code
The following is the host code for the Canny edge detection example. The host code sets up the
OpenCL platform with the FPGA for processing the required data. In the case of the xfOpenCV
example, the data is an image. Reading and writing of images are enabled using calls to
functions from xfOpenCV.
// Kernel 1: Canny
std::string binaryFile=xcl::find_binary_file(device_name,"krnl_canny");
cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
devices.resize(1);
cl::Program program(context, devices, bins);
cl::Kernel krnl(program,"canny_accel");
// profiling
event_sp.getProfilingInfo(CL_PROFILING_COMMAND_START,&start);
event_sp.getProfilingInfo(CL_PROFILING_COMMAND_END,&end);
diff_prof = end-start;
std::cout<<(diff_prof/1000000)<<"ms"<<std::endl;
// Profiling Objects
cl_ulong startedge= 0;
cl_ulong endedge = 0;
double diff_prof_edge = 0.0f;
cl::Event event_sp_edge;
// profiling
event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_START,&startedge);
event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_END,&endedge);
diff_prof_edge = endedge-startedge;
std::cout<<(diff_prof_edge/1000000)<<"ms"<<std::endl;
xf::Array2xfMat<INPUT_PTR_WIDTH,XF_8UC1,HEIGHT,WIDTH,INTYPE>(img_inp,in_mat);
xf::Canny<FILTER_WIDTH,NORM_TYPE,XF_8UC1,XF_2UC1,HEIGHT,WIDTH,INTYPE,XF_NPPC32,XF_USE_URAM>(
in_mat,dst_mat,low_threshold,high_threshold);
xf::xfMat2Array<OUTPUT_PTR_WIDTH,XF_2UC1,HEIGHT,WIDTH,XF_NPPC32>(dst_mat,img_out);
}
}
// memory mapped kernel
#include "xf_canny_config.h"
extern "C" {
void edgetracing_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp,
ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows, int cols)
{
#pragma HLS INTERFACE m_axi port=img_inp offset=slave bundle=gmem3
#pragma HLS INTERFACE m_axi port=img_out offset=slave bundle=gmem4
}
}
Software Emulation
Software emulation is equivalent to running a C-simulation of the kernel. Compilation time
is minimal, so software emulation is recommended as the first step in testing the kernel.
Following are the steps to build and run for the software emulation:
Hardware Emulation
Hardware emulation runs the test on the RTL generated after synthesis of the C/C++ code.
Because the simulation is done on RTL, it takes longer to complete than software
emulation. Following are the steps to build and run for the hardware emulation:
Building for hardware takes some time, since the C/C++ code must be converted to RTL and run
through synthesis and implementation before a bitstream is created. As a prerequisite, the
drivers must be installed for the DSA for which the example was built. Following are the
steps to run the kernel on hardware:
$ source /opt/xilinx/xrt/setup.sh
$ export XILINX_XRT=/opt/xilinx/xrt
$ cd <path to the executable and the corresponding xclbin>
$ ./<executable> <args>
Chapter 4
Make the following changes to ensure that the use model functions properly in
Vivado HLS 2019.1:
1. Use of appropriate compile-time options - When using the xfOpenCV functions in HLS, the
-D__SDSVHLS__ and -std=c++0x options need to be provided at the time of compilation.
2. Specifying interface pragmas to the interface level arguments - For the functions with top level
interface arguments as pointers (with more than one read/write access), the m_axi Interface
pragma must be specified. For example,
void lut_accel(xf::Mat<TYPE, HEIGHT, WIDTH, NPC1> &imgInput,
xf::Mat<TYPE, HEIGHT, WIDTH, NPC1> &imgOutput, unsigned char *lut_ptr)
{
#pragma HLS INTERFACE m_axi depth=256 port=lut_ptr offset=direct bundle=lut_ptr
xf::LUT< TYPE, HEIGHT, WIDTH, NPC1> (imgInput,imgOutput,lut_ptr);
}
1. In the Vivado® HLS tcl script file, update the cflags in all the add_files sections.
2. Append the path to the xfOpenCV/include directory, as it contains all the header files
required by the library.
3. Add the -D__SDSVHLS__ and -std=c++0x compiler flags.
Note: When using Vivado HLS in the Windows operating system, provide the -std=c++0x flag only for C-
Sim and Co-Sim. Do not include the flag when performing synthesis.
For example:
GUI Mode
Use the following steps to operate the HLS Standalone Mode using GUI:
Note: When using Vivado HLS in the Windows operating system, make sure to provide the
-std=c++0x flag only for C-Sim and Co-Sim. Do not include the flag when performing synthesis.
16. Select Synthesis and repeat the above step for all the displayed files.
17. Click OK.
18. Run the C Simulation, select Clean Build and specify the required input arguments.
19. Click OK.
20. All the generated output files/images will be present in the solution1->csim->build.
21. Run C Synthesis.
22. Run Co-simulation by specifying the proper input arguments.
23. The status of co-simulation can be observed on the console.
AXIvideo2xfMat
The AXIvideo2xfMat function receives a sequence of images using the AXI4 Streaming Video
protocol and produces an xf::Mat representation.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
xfMat2AXIvideo
The xfMat2AXIvideo function receives an xf::Mat representation of a sequence of images and
encodes it correctly using the AXI4 Streaming video protocol.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Note: The NPC value must be the same across all the functions in a data flow. A mismatch
results in a compilation error in HLS.
cvMat2AXIvideoxf
The cvMat2AXIvideoxf function receives an image as a cv::Mat representation and produces an
AXI4 streaming video of the image.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
AXIvideo2cvMatxf
The AXIvideo2cvMatxf function receives an image as AXI4 streaming video and produces a
cv::Mat representation of the image.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Chapter 5
Note: The xf::Mat image container class is similar to the cv::Mat class of the OpenCV library.
Class Definition
public:
unsigned char allocatedFlag; // flag to mark memory allocation in this class
int rows, cols, size; // actual image size
#ifdef __SDSVHLS__
typedef XF_TNAME(T,NPC) DATATYPE;
#else // When not being built for V-HLS
typedef struct {
XF_CTUNAME(T,NPC) chnl[XF_NPIXPERCYCLE(NPC)][XF_CHANNELS(T,NPC)];
} __attribute__ ((packed)) DATATYPE;
#endif
template<int DST_T>
void convertTo (Mat<DST_T, ROWS, COLS, NPC> &dst, int otype, double
alpha=1, double beta=0);
};
Parameter Descriptions
Parameter Description
The following table lists the member functions and their descriptions:
Mat() This default constructor initializes the Mat object sizes, using the template parameters ROWS and COLS.
Mat(int _rows, int _cols) This constructor initializes the Mat object using arguments _rows and _cols.
Mat(const xf::Mat &_src) This constructor helps clone a Mat object to another. New memory is allocated for the newly created object.
Mat(int _rows, int _cols, void *_data) This constructor initializes the Mat object using arguments _rows, _cols, and _data. When this constructor is used, the *data member of the Mat object points to the memory allocated for the _data argument. No new memory is allocated for the *data member.
convertTo(Mat<DST_T, ROWS, COLS, NPC> &dst, int otype, double alpha=1, double beta=0) Refer to xf::convertTo.
copyTo(* fromData) Copies the data from Data pointer into physically contiguous memory allocated inside the constructor.
copyFrom() Returns the pointer to the first location of the *data member.
read(int index) Reads out a value from a given location and returns it as a packed (for multi-pixel/clock) value.
read_float(int index) Reads out a value from a given location and returns it as a float value.
write(int index, XF_TNAME(T,NPC) val) Writes a packed (for multi-pixel/clock) value into the given location.
write_float(int index, float val) Writes a float value into the given location.
type() Returns the type of the image.
depth() Returns the depth of the image.
channels() Returns the number of channels of the image.
~Mat() This is the default destructor of the Mat object.
Template parameters of the xf::Mat class are used to set the depth of the pixel, number of
channels in the image, number of pixels packed per word, maximum number of rows and columns
of the image. The following table lists the template parameters and their descriptions:
Parameters Description
TYPE Type of the pixel data. For example, XF_8UC1 stands for 8-bit unsigned and one channel pixel.
More types can be found in include/common/xf_params.h.
Pixel-Level Parallelism
• Single-pixel processing
• Processing eight pixels in parallel
The following table describes the options available for specifying the level of parallelism required
in a particular function:
Option Description
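There are two macros that are defined to work with parallelism.
• The XF_NPIXPERCYCLE(flags) macro resolves to the number of pixels processed per cycle.
○ XF_NPIXPERCYCLE(XF_NPPC1) resolves to 1
○ XF_NPIXPERCYCLE(XF_NPPC2) resolves to 2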
○ XF_NPIXPERCYCLE(XF_NPPC4) resolves to 4
○ XF_NPIXPERCYCLE(XF_NPPC8) resolves to 8
• The XF_BITSHIFT(flags) macro resolves to the number of times to shift the image size to
the right to arrive at the final data transfer size for parallel processing.
○ XF_BITSHIFT(XF_NPPC1) resolves to 0
○ XF_BITSHIFT(XF_NPPC2) resolves to 1
○ XF_BITSHIFT(XF_NPPC4) resolves to 2
○ XF_BITSHIFT(XF_NPPC8) resolves to 3
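The relationship between the two macros can be sketched in plain C++. The enum and functions below are hypothetical stand-ins that reproduce the values listed above; they are not the library's definitions. The sketch also shows how the shift yields the number of packed words to transfer, assuming the image width is a multiple of the pixels per cycle.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the XF_NPPC* options; not the library's definitions.
enum class Nppc { k1 = 1, k2 = 2, k4 = 4, k8 = 8 };

// Mirrors what XF_NPIXPERCYCLE is described as resolving to.
inline int pixels_per_cycle(Nppc n) { return static_cast<int>(n); }

// Mirrors what XF_BITSHIFT is described as resolving to: log2(pixels per cycle).
inline int bit_shift(Nppc n) {
    int shift = 0;
    for (int p = pixels_per_cycle(n); p > 1; p >>= 1) {
        ++shift;
    }
    return shift;
}

// Number of packed words needed for a rows x cols image.
// Assumes cols is a multiple of the pixels per cycle.
inline std::size_t transfer_count(std::size_t rows, std::size_t cols, Nppc n) {
    return (rows * cols) >> bit_shift(n);
}
```

For a 1080x1920 image processed eight pixels at a time, `transfer_count` gives 1080 × 1920 / 8 packed words.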
Pixel Types
Parameter types differ, depending on the combination of the depth of pixels and the number
of channels in the image. The generic nomenclature of the parameter is as follows:
XF_<Number of bits per pixel><signed (S), unsigned (U), or float (F)>C<number of channels>
For example, for an 8-bit, unsigned, 1-channel pixel, the data type is XF_8UC1.
The following table lists the available data types for the xf::Mat class:
Option Number of bits per Pixel Unsigned/ Signed/ Float Type Number of Channels
XF_8UC1 8 Unsigned 1
XF_16UC1 16 Unsigned 1
XF_16SC1 16 Signed 1
XF_32UC1 32 Unsigned 1
XF_32FC1 32 Float 1
XF_32SC1 32 Signed 1
XF_8UC2 8 Unsigned 2
XF_8UC4 8 Unsigned 4
XF_8UC3 8 Unsigned 3
XF_2UC1 2 Unsigned 1
Based on the number of pixels to process per clock cycle and the type parameter, there are
different possible data types. The xfOpenCV library uses these datatypes for internal processing
and inside the xf::Mat class. The following are a few supported types:
• XF_TNAME(TYPE,NPPC) resolves to the data type of the data member of the xf::Mat
object. For instance, XF_TNAME(XF_8UC1,XF_NPPC8) resolves to ap_uint<64>.
Note: ap_uint<>, ap_int<>, ap_fixed<>, and ap_ufixed<> types belong to the high-level synthesis
(HLS) library. For more information, see the Vivado Design Suite User Guide: High-Level Synthesis (UG902).
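The ap_uint<64> result for XF_TNAME(XF_8UC1,XF_NPPC8) follows from 8 bits × 1 channel × 8 pixels. The sketch below reproduces only that width arithmetic in plain C++. `PixelType` and `packed_word_bits` are hypothetical names, and the library may choose a different container for some types (for example, signed or float data), so treat the result as the raw packed width rather than the exact type.

```cpp
// Hypothetical description of a pixel type: bits per channel and channel count.
struct PixelType {
    int bits_per_channel;
    int channels;
};

// Width in bits of one packed word holding `npc` pixels of type `t`.
inline int packed_word_bits(PixelType t, int npc) {
    return t.bits_per_channel * t.channels * npc;
}
```

With `PixelType{8, 1}` and eight pixels per cycle, the function returns 64, matching the ap_uint<64> example above.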
Sample Illustration
The following code illustrates the configurations that are required to build the Gaussian filter on
an image, using the SDSoC™ tool for the Zynq® UltraScale™ platform.
Note: In a real-time application where the video is streamed in, it is recommended that the
frame buffer location is held in an xf::Mat and processed using the library function. The
resulting location pointer is then passed to the display IPs.
xf_config_params.h
#define FILTER_SIZE_3 1
#define FILTER_SIZE_5 0
#define FILTER_SIZE_7 0
#define RO 0
#define NO 1
#if NO
#define NPC1 XF_NPPC1
#endif
#if RO
#define NPC1 XF_NPPC8
#endif
xf_gaussian_filter_tb.cpp
imgInput.copyTo(in_gray.data);
gaussian_filter_accel(imgInput,imgOutput,sigma);
xf_gaussian_filter_accel.cpp
#include "xf_gaussian_filter_config.h"
void gaussian_filter_accel(xf::Mat<XF_8UC1,HEIGHT,WIDTH,NPC1>
&imgInput,xf::Mat<XF_8UC1,HEIGHT,WIDTH,NPC1>&imgOutput,float sigma)
{
xf::GaussianBlur<FILTER_WIDTH, XF_BORDER_CONSTANT, XF_8UC1, HEIGHT,
WIDTH, NPC1>(imgInput, imgOutput, sigma);
}
xf_gaussian_filter.hpp
The design fetches data from external memory (with the help of SDSoC data movers) and
transfers it to the function in 8-bit or 64-bit packets, based on the configured mode. Assuming
8 bits per pixel, 8 pixels can be packed into 64 bits. Therefore, 8 pixels are available to be
processed in parallel.
Enable the FILTER_SIZE_3 and the NO macros in the xf_config_params.h file. The
FILTER_SIZE_3 macro sets the filter size to 3x3, and the #define NO 1 macro enables
1-pixel parallelism.
Note: For more information on the pragmas used for hardware accelerator functions in SDSoC, see SDSoC
Environment User Guide (UG1027).
Note: In an HLS standalone mode like Cosim, use cv::imread followed by copyTo function, instead of
xf::imread.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
PTYPE Input pixel type. Value should be in accordance with the ‘type’ argument’s value.
ROWS Maximum height of the image to be read
COLS Maximum width of the image to be read
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
filename Name of the file to be loaded
type Flag that depicts the type of image. The values are:
xf::imwrite
The function xf::imwrite saves the image to the specified file from the given xf::Mat. The image
format is chosen based on the file name extension. This function internally uses cv::imwrite for
the processing. Therefore, all the limitations of cv::imwrite are also applicable to xf::imwrite.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
PTYPE Input pixel type. Supported types are: XF_8UC1, XF_16UC1, XF_8UC4, and XF_16UC4
ROWS Maximum height of the image to be read
COLS Maximum width of the image to be read
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
img_name Name of the file with the extension
img xf::Mat array to be saved
xf::absDiff
The function xf::absDiff computes the absolute difference between the corresponding pixels of an
xf::Mat and a cv::Mat, and returns the difference values in a cv::Mat.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
xf::convertTo
The xf::convertTo function performs bit depth conversion on each individual pixel of the given
input image. This method converts the source pixel values to the target data type with
appropriate casting.
dst(x,y)= cast<target-data-type>(α(src(x,y)+β))
Note: The output and input Mat cannot be the same. That is, the converted image cannot be stored in the
Mat of the input image.
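To make the formula concrete, the sketch below applies it exactly as written above to a single 16-bit unsigned pixel converted to 8-bit unsigned. Rounding to nearest and saturating to the 0 to 255 range are assumptions of this sketch; the text above only says "appropriate casting". `convert_16u_to_8u` is a hypothetical name, not a library function.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Applies dst = cast<uint8_t>(alpha * (src + beta)), following the formula as written.
// Assumptions of this sketch: round to nearest, then saturate to [0, 255].
inline std::uint8_t convert_16u_to_8u(std::uint16_t src, double alpha, double beta) {
    const double scaled = alpha * (static_cast<double>(src) + beta);
    const double clamped = std::min(255.0, std::max(0.0, std::round(scaled)));
    return static_cast<std::uint8_t>(clamped);
}
```

With alpha = 1 and beta = 0, values up to 255 pass through unchanged and larger values clamp to 255 under the saturation assumption.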
API Syntax
Parameter Descriptions
Parameter Description
DST_T Output pixel type. Possible values are XF_8UC1, XF_16UC1, XF_16SC1, and XF_32SC1.
ROWS Maximum height of image to be read
COLS Maximum width of image to be read
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1, XF_NPPC4, and
XF_NPPC8 for 1-pixel, 4-pixel, and 8-pixel parallel operations respectively. XF_32SC1 and
XF_NPPC8 combination is not supported.
dst Converted xf Mat
ctype Conversion type : Possible values are listed here.
//Down-convert:
• XF_CONVERT_16U_TO_8U
• XF_CONVERT_16S_TO_8U
• XF_CONVERT_32S_TO_8U
• XF_CONVERT_32S_TO_16U
• XF_CONVERT_32S_TO_16S
//Up-convert:
• XF_CONVERT_8U_TO_16U
• XF_CONVERT_8U_TO_16S
• XF_CONVERT_8U_TO_32S
• XF_CONVERT_16U_TO_32S
• XF_CONVERT_16S_TO_32S
Parameter Description
Absolute Difference Bit Depth Conversion Bilateral Filter Canny Edge Detection
Accumulate Channel Combine Box Filter FAST Corner Detection
Accumulate Squared Channel Extract Custom Convolution Harris Corner Detection
Accumulate Weighted Color Conversion Dilate Histogram Computation
Atan2 Histogram Equalization Erode Dense Pyramidal LK Optical
Flow
Bitwise AND, Bitwise NOT, Look Up Table Gaussian Filter Dense Non-Pyramidal LK
Bitwise OR, Bitwise XOR Optical Flow
Gradient Magnitude Remap Sobel Filter MinMax Location
Gradient Phase Resolution Conversion Median Blur Filter Thresholding
(Resize)
Integral Image convertScaleAbs Scharr Filter SVM
Inverse (Reciprocal) Demosaicing Otsu Threshold
Pixel-Wise Addition Crop Mean Shift Tracking
Pixel-Wise Multiplication Reduce HOG
Pixel-Wise Subtraction BoundingBox Stereo Local Block Matching
Square Root WarpTransform
Mean and Standard Pyramid Up
Deviation
AddS, Compare, CompareS, Pyramid Down
Max, MaxS, Min, MinS, Set,
SubRS, SubS, Zero
Sum Delay
Addweighted Duplicate
Color Thresholding
BGR2HSV
InitUndistortRectifyMapInverse
HoughLines
Semi Global Method for
Stereo Disparity Estimation
Paintmask
InRange
Kalman Filter
Notes:
1. The maximum resolution supported for all the functions is 4K, except Houghlines and HOG (RB mode).
Note: Resolution Conversion (Resize) in 8 pixel per cycle mode, Dense Pyramidal LK Optical Flow, and
Dense Non-Pyramidal LK Optical Flow functions are not supported on the Zynq-7000 SoC ZC702 devices,
due to the higher resource utilization.
Note: The number of pixels per clock depends on the maximum bus width a device can support.
For example, the Zynq-7000 SoC has a 64-bit interface, so for the pixel type 16UC1, a maximum of
four pixels per clock (XF_NPPC4) is possible.
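The bus-width limit can be checked with a few lines of arithmetic. The helper below is a hypothetical sketch that returns the largest power-of-2 pixel count whose packed width fits in the bus. It does not check whether a given function actually offers that parallelism option.

```cpp
// Hypothetical helper: largest power-of-2 number of pixels per clock whose
// packed width fits in the bus. Returns 0 for a non-positive pixel width.
inline int max_pixels_per_clock(int bus_width_bits, int bits_per_pixel) {
    if (bits_per_pixel <= 0) {
        return 0;
    }
    int npc = 1;
    while (npc * 2 * bits_per_pixel <= bus_width_bits) {
        npc *= 2;
    }
    return npc;
}
```

For a 64-bit interface and 16-bit pixels the result is 4, which agrees with the XF_NPPC4 example in the note.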
Absolute Difference
The absdiff function finds the pixel wise absolute difference between two input images and
returns an output image. The input and the output images must be the XF_8UC1 type.
Where,
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and Output pixel type. Only 8-bit, unsigned, 1 and 3
channels are supported (XF_8UC1 and XF_8UC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be
multiple of 8, for 8-pixel operation.
NPC Number of pixels to be processed per cycle; possible
options are XF_NPPC1 and XF_NPPC8 for 1 pixel and 8 pixel
operations respectively.
src1 Input image
src2 Input image
dst Output image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
1 pixel | 300 | 0 | 0 | 62 | 67 | 17
8 pixel | 150 | 0 | 0 | 67 | 234 | 39
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency (ms)
There is no deviation from OpenCV, except that the absdiff function supports only 8-bit pixels.
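As a point of comparison for the hardware output, the operation can be modeled in software. This is a minimal sketch that assumes a flat array of 8-bit pixels; absdiffRef is an illustrative name, not an xfOpenCV function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference sketch, not the hardware kernel:
// dst[i] = |src1[i] - src2[i]| for 8-bit unsigned pixels.
std::vector<uint8_t> absdiffRef(const std::vector<uint8_t>& src1,
                                const std::vector<uint8_t>& src2) {
    assert(src1.size() == src2.size());
    std::vector<uint8_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        // Subtract the smaller value from the larger one to stay in 0..255.
        dst[i] = src1[i] > src2[i] ? static_cast<uint8_t>(src1[i] - src2[i])
                                   : static_cast<uint8_t>(src2[i] - src1[i]);
    }
    return dst;
}
```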
Accumulate
The accumulate function adds an image (src1) to the accumulator image (src2), and generates
the accumulated result image (dst).
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void accumulate (
xf::Mat<SRC_T, ROWS, COLS, NPC> src1,
xf::Mat<SRC_T, ROWS, COLS, NPC> src2,
xf::Mat<DST_T, ROWS, COLS, NPC> dst )
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
DST_T Output pixel type. Only 16-bit, unsigned, 1 and 3 channels are supported (XF_16UC1 and
XF_16UC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Recommend using a multiple of 8, for an 8-
pixel operation.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8
for 1 pixel and 8 pixel operations respectively.
src1 Input image
src2 Input image
dst Output image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48E | FF | LUT | CLB
1 pixel | 300 | 0 | 0 | 62 | 55 | 12
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a 4K 3 channel image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48E | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency (ms)
In OpenCV, the accumulated image is stored in the second input image. The src2 image acts as
both input and output, as shown below:
src2(x, y) = src1(x, y) + src2(x, y)
Whereas, in the xfOpenCV implementation, the accumulated image is stored separately, as
shown below:
dst(x, y) = src1(x, y) + src2(x, y)
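The separate-output form can be modeled in a few lines. This is a sketch under the assumption of 8-bit inputs and a 16-bit output, as listed in the parameter table; accumulateRef is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference sketch: dst = src1 + src2, with the 8-bit inputs widened
// to a 16-bit result so that the sum (at most 510) cannot overflow.
std::vector<uint16_t> accumulateRef(const std::vector<uint8_t>& src1,
                                    const std::vector<uint8_t>& src2) {
    assert(src1.size() == src2.size());
    std::vector<uint16_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        dst[i] = static_cast<uint16_t>(src1[i] + src2[i]);
    }
    return dst;
}
```

Unlike the OpenCV call, neither input is modified, so the caller decides whether dst feeds back into the next call.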
Accumulate Squared
The accumulateSquare function adds the square of an image (src1) to the accumulator image
(src2) and generates the accumulated result (dst).
The accumulated result is a separate argument in the function, instead of having src2 as the
accumulated result. In this implementation, having a bi-directional accumulator is not possible as
the function makes use of streams.
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void accumulateSquare (
xf::Mat<SRC_T, ROWS, COLS, NPC> src1,
xf::Mat<SRC_T, ROWS, COLS, NPC> src2,
xf::Mat<DST_T, ROWS, COLS, NPC> dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and XF_8UC3)
DST_T Output pixel type. Only 16-bit, unsigned, 1 and 3 channels are supported (XF_16UC1 and
XF_16UC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
src1 Input image
src2 Input image
dst Output image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48E | FF | LUT | CLB
1 pixel | 300 | 0 | 1 | 71 | 52 | 14
8 pixel | 150 | 0 | 8 | 401 | 247 | 48
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K 3 channel
image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48E | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
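Latency Estimate
Operating Mode | Max Latency (ms)
In OpenCV, the accumulated squared image is stored in the second input image. The src2 image
acts as input as well as output:
src2(x, y) = src1(x, y)^2 + src2(x, y)
In the xfOpenCV implementation, the result is written to the separate dst argument:
dst(x, y) = src1(x, y)^2 + src2(x, y)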
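A short software model makes the output range easy to check. The sketch assumes 8-bit inputs and a 16-bit output as in the parameter table; accumulateSquareRef is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference sketch: dst = src1 * src1 + src2.
// Worst case is 255 * 255 + 255 = 65280, which still fits in 16 bits.
std::vector<uint16_t> accumulateSquareRef(const std::vector<uint8_t>& src1,
                                          const std::vector<uint8_t>& src2) {
    assert(src1.size() == src2.size());
    std::vector<uint16_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        dst[i] = static_cast<uint16_t>(src1[i] * src1[i] + src2[i]);
    }
    return dst;
}
```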
Accumulate Weighted
The accumulateWeighted function computes the weighted sum of the input image (src1) and
the accumulator image (src2) and generates the result in dst.
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void accumulateWeighted (
xf::Mat<SRC_T, ROWS, COLS, NPC> src1,
xf::Mat<SRC_T, ROWS, COLS, NPC> src2,
xf::Mat<DST_T, ROWS, COLS, NPC> dst,
float alpha )
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and XF_8UC3)
DST_T Output pixel type. Only 16-bit, unsigned, 1 and 3 channels are supported (XF_16UC1 and
XF_16UC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Recommend multiples of 8, for an 8-pixel
operation.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
src1 Input image
src2 Input image
dst Output image
alpha Weight applied to input image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K 3
Channel image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency (ms)
The resultant image in OpenCV is stored in the second input image. The src2 image acts as input
as well as output, as shown below:
src2(x, y) = alpha * src1(x, y) + (1 - alpha) * src2(x, y)
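With a separate output, the same weighted sum can be modeled as below. This is a sketch only: accumulateWeightedRef is an illustrative name, and the truncating conversion to 16 bits is an assumption, since the rounding behavior of the kernel is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference sketch: dst = alpha * src1 + (1 - alpha) * src2.
// The float result is truncated toward zero (assumption, see lead-in).
std::vector<uint16_t> accumulateWeightedRef(const std::vector<uint8_t>& src1,
                                            const std::vector<uint8_t>& src2,
                                            float alpha) {
    assert(src1.size() == src2.size());
    assert(alpha >= 0.0f && alpha <= 1.0f);
    std::vector<uint16_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        const float v = alpha * static_cast<float>(src1[i]) +
                        (1.0f - alpha) * static_cast<float>(src2[i]);
        dst[i] = static_cast<uint16_t>(v);
    }
    return dst;
}
```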
AddS
The addS function adds the given scalar value scl to each pixel of the input image src and
stores the result in dst.
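Adding a scalar to an 8-bit pixel can overflow, so the result depends on the overflow policy. The sketch below assumes that POLICY_TYPE chooses between saturating and truncating behavior; the parameter description is missing from this section, so treat that as an assumption. The Policy enum and addScalar are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the POLICY_TYPE template argument.
enum class Policy { Saturate, Truncate };

// Add a scalar to one 8-bit pixel under the chosen overflow policy.
inline uint8_t addScalar(uint8_t pixel, uint8_t scl, Policy policy) {
    const int sum = pixel + scl;  // 0..510, computed without overflow
    if (policy == Policy::Saturate) {
        return static_cast<uint8_t>(std::min(sum, 255));  // clamp at 255
    }
    return static_cast<uint8_t>(sum & 0xFF);  // keep the low 8 bits
}
```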
API Syntax
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC =1>
void addS(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char
_scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the AddS function in both the
resource optimized (8 pixel) mode and normal mode, as generated using Vivado HLS 2019.1
version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name | 1 pixel per clock operation (300 MHz) | 8 pixel per clock operation (150 MHz)
BRAM_18K | 0 | 0
DSP48E | 0 | 0
FF | 100 | 101
LUT | 52 | 185
CLB | 20 | 45
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode | Operating Frequency (MHz) | Latency (ms)
Addweighted
The addWeighted function calculates a weighted sum of two input images, src1 and src2, and
generates the result in dst.
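Per pixel, the weighted sum uses the alpha, beta and gamma parameters described below. The sketch returns the unrounded float value, because the conversion to the output pixel type is not specified in this section; addWeightedPixel is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Software reference sketch for one pixel:
// result = src1 * alpha + src2 * beta + gamma (returned unrounded).
inline float addWeightedPixel(uint8_t src1, float alpha,
                              uint8_t src2, float beta, float gamma) {
    return static_cast<float>(src1) * alpha +
           static_cast<float>(src2) * beta + gamma;
}
```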
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void addWeighted(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, float alpha,
xf::Mat<SRC_T, ROWS, COLS, NPC> & _src2, float beta, float gamma,
xf::Mat<DST_T, ROWS, COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for
1 pixel and 8 pixel operations respectively.
_src1 First input image
alpha Weight applied on first image
_src2 Second input image
beta Weight applied on second image
gamma Scalar added to each sum
_dst Output image
Resource Utilization
The following table summarizes the resource utilization of the Addweighted function in Resource
optimized (8 pixel) mode and normal mode, as generated in Vivado HLS 2019.1 version tool for
the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name | 1 pixel per clock operation (300 MHz) | 8 pixel per clock operation (150 MHz)
BRAM_18K | 0 | 0
DSP48E | 11 | 25
FF | 903 | 680
LUT | 851 | 1077
CLB | 187 | 229
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode | Operating Frequency (MHz) | Latency (ms)
Bilateral Filter
In general, a smoothing filter blurs the whole image, which also degrades its edges. To
preserve the edges while smoothing, a bilateral filter can be used. Like the Gaussian filter,
the bilateral filter considers the neighboring pixels with weights assigned to each of them.
These weights have two components, the first of which is the same weighting used by the
Gaussian filter. The second component takes into account the difference in intensity
between the neighboring pixels and the evaluated one.
The Gaussian filter is given by:
\[ G_\sigma = e^{-\frac{x^2}{2\sigma^2}} \]
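The two weight components can be sketched as a product of two Gaussians, one over the spatial distance and one over the intensity difference. The sketch assumes the standard kernel exp(-x^2 / (2 * sigma^2)) and omits normalization over the window; gaussian and bilateralWeight are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// Gaussian kernel exp(-x^2 / (2 * sigma^2)) (standard form, assumed here).
inline float gaussian(float x, float sigma) {
    return std::exp(-(x * x) / (2.0f * sigma * sigma));
}

// Unnormalized bilateral weight of one neighbor: the spatial component
// multiplied by the intensity (color) component.
inline float bilateralWeight(float spatialDist, float intensityDiff,
                             float sigmaSpace, float sigmaColor) {
    assert(sigmaSpace > 0.0f && sigmaColor > 0.0f);
    return gaussian(spatialDist, sigmaSpace) *
           gaussian(intensityDiff, sigmaColor);
}
```

A neighbor across an edge has a large intensity difference, so its weight collapses toward zero even when it is spatially close. This is what keeps edges sharp.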
API Syntax
template<int FILTER_SIZE, int BORDER_TYPE, int TYPE, int ROWS, int COLS,
int NPC=1>
void bilateralFilter (
xf::Mat<TYPE, ROWS, COLS, NPC> src,
xf::Mat<TYPE, ROWS, COLS, NPC> dst,
float sigma_space, float sigma_color )
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
NPC Number of pixels to be processed per cycle; this function supports only XF_NPPC1
or 1 pixel per cycle operations.
src Input image
dst Output image
sigma_space Standard deviation of filter in spatial domain
sigma_color Standard deviation of filter used in color space
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode | Filter Size | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a 4K 3 channel image.
Utilization Estimate
Operating Mode | Filter Size | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
as generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode | Filter Size | Max Latency at 300 MHz (ms)
Bit Depth Conversion
API Syntax
template <int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void convertTo(xf::Mat<SRC_T, ROWS, COLS, NPC> &_src_mat, xf::Mat<DST_T,
ROWS, COLS, NPC> &_dst_mat, ap_uint<4> _convert_type, int _shift)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Possible Conversions
The following table summarizes supported conversions. The rows are possible input image bit
depths and the columns are corresponding possible output image bit depths (U=unsigned,
S=signed).
Resource Utilization
The following table summarizes the resource utilization of the convertTo function, generated
using Vivado HLS 2019.1 tool for the Xilinx® Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency
Bitwise AND
The bitwise_and function performs the bitwise AND operation for each pixel between two
input images, and returns an output image, as shown below:
I_out(x, y) = I_in1(x, y) & I_in2(x, y)
Where,
• I_out(x, y) is the intensity of output image at (x, y) position
• I_in1(x, y) is the intensity of first input image at (x, y) position
• I_in2(x, y) is the intensity of second input image at (x, y) position
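The per-pixel operation is a plain bitwise AND of corresponding pixels. A minimal software model over flat 8-bit buffers, with the illustrative name bitwiseAndRef, might look like this; the OR, XOR and NOT variants differ only in the operator applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference sketch: dst[i] = src1[i] & src2[i].
std::vector<uint8_t> bitwiseAndRef(const std::vector<uint8_t>& src1,
                                   const std::vector<uint8_t>& src2) {
    assert(src1.size() == src2.size());
    std::vector<uint8_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        dst[i] = static_cast<uint8_t>(src1[i] & src2[i]);
    }
    return dst;
}
```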
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and output pixel type. Supports 1 channel and 3 channels (XF_8UC1 and XF_8UC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8 pixel mode)
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations, respectively.
src1 Input image
src2 Input image
dst Output image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
1 pixel | 300 | 0 | 0 | 62 | 44 | 10
8 pixel | 150 | 0 | 0 | 59 | 72 | 13
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K
3 channel image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency (ms)
Bitwise NOT
The bitwise_not function performs the pixel-wise bitwise NOT operation for the pixels in the
input image, and returns an output image, as shown below:
I_out(x, y) = ~I_in(x, y)
Where,
• I_out(x, y) is the intensity of output image at (x, y) position
• I_in(x, y) is the intensity of input image at (x, y) position
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and output pixel type. Supports 1 channel and 3 channels (XF_8UC1 and XF_8UC3).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations, respectively.
src Input image
dst Output image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
1 pixel | 300 | 0 | 0 | 97 | 78 | 20
8 pixel | 150 | 0 | 0 | 88 | 97 | 21
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K
3 channel image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency (ms)
Bitwise OR
The bitwise_or function performs the pixel-wise bitwise OR operation between two input
images, and returns an output image, as shown below:
I_out(x, y) = I_in1(x, y) | I_in2(x, y)
Where,
• I_out(x, y) is the intensity of output image at (x, y) position
• I_in1(x, y) is the intensity of first input image at (x, y) position
• I_in2(x, y) is the intensity of second input image at (x, y) position
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and output pixel type. Supports 1 channel and 3 channels (XF_8UC1 and XF_8UC3).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be multiple of 8, for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
src1 Input image
src2 Input image
dst Output image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
1 pixel | 300 | 0 | 0 | 62 | 44 | 10
8 pixel | 150 | 0 | 0 | 59 | 72 | 13
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K
3 channel image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency (ms)
Bitwise XOR
The bitwise_xor function performs the pixel-wise bitwise XOR operation between two input
images, and returns an output image, as shown below:
I_out(x, y) = I_in1(x, y) ^ I_in2(x, y)
Where,
• I_out(x, y) is the intensity of output image at (x, y) position
• I_in1(x, y) is the intensity of first input image at (x, y) position
• I_in2(x, y) is the intensity of second input image at (x, y) position
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and output pixel type. Supports 1 channel and 3 channels (XF_8UC1 and XF_8UC3).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be multiple of 8, for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and
XF_NPPC8 for 1 pixel and 8 pixel operations respectively.
src1 Input image
src2 Input image
dst Output image
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image:
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
1 pixel | 300 | 0 | 0 | 62 | 44 | 10
8 pixel | 150 | 0 | 0 | 59 | 72 | 13
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K
3 channel image.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image:
Latency Estimate
Operating Mode | Max Latency (ms)
Box Filter
The boxFilter function performs box filtering on the input image. Box filter acts as a low-pass
filter and performs blurring over the image. The boxFilter function or the box blur is a spatial
domain linear filter in which each pixel in the resulting image has a value equal to the average
value of the neighboring pixels in the image.
\[ K_{box} = \frac{1}{(ksize \times ksize)} \begin{bmatrix} 1 & \cdots & 1 \\ 1 & \cdots & 1 \\ 1 & \cdots & 1 \end{bmatrix} \]
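For a 3x3 window, the averaging kernel can be modeled as follows. This sketch assumes a row-major 8-bit image, a constant border value of 0, and truncating integer division; the rounding used by the hardware kernel is not stated here. boxFilter3x3Ref is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Software reference sketch of a 3x3 box filter. Pixels outside the image
// count as 0 (constant border); the window sum is divided by ksize * ksize = 9.
std::vector<uint8_t> boxFilter3x3Ref(const std::vector<uint8_t>& src,
                                     int rows, int cols) {
    assert(static_cast<int>(src.size()) == rows * cols);
    std::vector<uint8_t> dst(src.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            int sum = 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                        sum += src[rr * cols + cc];
                    }
                }
            }
            dst[r * cols + c] = static_cast<uint8_t>(sum / 9);  // truncates
        }
    }
    return dst;
}
```

On a uniform image, interior pixels keep their value, while border pixels darken because part of the window falls on the zero border.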
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
FILTER_SIZE Filter size. Filter size of 3(XF_FILTER_3X3), 5(XF_FILTER_5X5) and 7(XF_FILTER_7X7) are
supported
BORDER_TYPE Border Type supported is XF_BORDER_CONSTANT
SRC_T Input and output pixel type. 8-bit, unsigned, 16-bit unsigned and 16-bit signed, 1 channel is
supported (XF_8UC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8
for 1 pixel and 8 pixel operations respectively.
USE_URAM Enable to map storage structures to UltraRAM
_src_mat Input image
_dst_mat Output image
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode | Filter Size | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
The following table summarizes the resource utilization of the kernel in different configurations,
generated using the SDx™ 2019.1 tool for the xczu7ev-ffvc1156-2-e FPGA, to process a
grayscale 4K (3840x2160) image with UltraRAM enabled.
Table 78: boxFilter Function Resource Utilization Summary with UltraRAM enabled
Utilization Estimate
Operating Mode | Filter Size | Operating Frequency (MHz) | BRAM_18K | URAM | DSP_48Es | FF | LUT
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image:
Latency Estimate
Operating Mode | Filter Size | Operating Frequency (MHz) | Max Latency (ms)
BoundingBox
The boundingbox function highlights the region of interest (ROI) from the input image using
below equations.
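One way to picture the highlighting step is as drawing a one-pixel outline around the ROI, similar to what OpenCV's rectangle function does with thickness 1. The sketch below is an assumption about the visual effect, not the kernel's implementation; Box and drawBoxOutline are illustrative names, and the image is a flat single-channel buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a rectangle: top-left corner plus size.
struct Box { int x, y, width, height; };

// Paint the one-pixel border of roi into a row-major single-channel image.
// Parts of the box that fall outside the image are skipped.
void drawBoxOutline(std::vector<uint8_t>& img, int rows, int cols,
                    const Box& roi, uint8_t color) {
    assert(static_cast<int>(img.size()) == rows * cols);
    for (int r = roi.y; r < roi.y + roi.height; ++r) {
        for (int c = roi.x; c < roi.x + roi.width; ++c) {
            if (r < 0 || r >= rows || c < 0 || c >= cols) continue;
            const bool onEdge = r == roi.y || r == roi.y + roi.height - 1 ||
                                c == roi.x || c == roi.x + roi.width - 1;
            if (onEdge) img[r * cols + c] = color;
        }
    }
}
```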
API Syntax
template<int SRC_T, int ROWS, int COLS, int MAX_BOXES=1, int NPC=1>
void boundingbox(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat, xf::Rect_<int>
*roi , xf::Scalar<4,unsigned char > *color, int num_box)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel Type. Only 8-bit, unsigned, 1 channel and 3 channel is supported
(XF_8UC1,XF_8UC3).
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance of the kernel in 1-pixel mode as generated
using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to process a
grayscale 4K (2160x3840) image for highlighting 3 different boundaries (480x640, 100x200,
300x300).
Latency Estimate
Operating Mode | Max Latency (ms)
xfOpenCV Reference:
void rectangle(Mat& img, Rect rec, const Scalar& color, int thickness=1,
int lineType=8, int shift=0 )
Canny Edge Detection
In this algorithm, the noise in the image is first reduced by applying a Gaussian mask. The
Gaussian mask used here is the average mask of size 3x3. Thereafter, gradients along the x and y
directions are computed using the Sobel gradient function. The gradients are used to compute
the magnitude and phase of the pixels. The phase is quantized and the pixels are binned
accordingly. Non-maximal suppression is applied on the pixels to remove the weaker edges.
Edge tracing is applied on the remaining pixels to draw the edges on the image. In this
implementation, the Canny stages up to non-maximal suppression are in one kernel and the edge
linking module is in another kernel. After non-maximal suppression, the output is represented as
2 bits per pixel.
The output is packed as 8-bit (four 2-bit pixels) in 1 pixel per cycle operation and packed as
16-bit (eight 2-bit pixels) in 8 pixel per cycle operation. For the edge linking module, the input is
64-bit, such that 32 pixels of 2 bits each are packed into one 64-bit word. The edge tracing is
applied on the pixels and returns the edges in the image.
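The packing of 2-bit pixels into wider words can be sketched as below. The bit order (first pixel in the least significant bits) is an assumption, and packTwoBit and unpackTwoBit are illustrative names rather than library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack up to 32 two-bit pixel codes into one 64-bit word.
// Assumption: pixel i occupies bits [2*i + 1 : 2*i].
inline uint64_t packTwoBit(const std::vector<uint8_t>& codes) {
    assert(codes.size() <= 32);
    uint64_t word = 0;
    for (std::size_t i = 0; i < codes.size(); ++i) {
        word |= static_cast<uint64_t>(codes[i] & 0x3u) << (2 * i);
    }
    return word;
}

// Read back the two-bit code stored at the given pixel index.
inline uint8_t unpackTwoBit(uint64_t word, unsigned index) {
    assert(index < 32);
    return static_cast<uint8_t>((word >> (2 * index)) & 0x3u);
}
```

Four codes fill one byte, eight fill 16 bits, and 32 fill the 64-bit word consumed by the edge linking module.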
API Syntax
Parameter Descriptions
The following table describes the xf::Canny template and function parameters:
Parameter Description
The following table describes the EdgeTracing template and function parameters:
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of xf::Canny and EdgeTracing in
different configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a grayscale HD (1080x1920) image with a filter size of 3.
Resource Utilization
Name | 1 pixel L1NORM,FS:3 (300 MHz) | 1 pixel L2NORM,FS:3 (300 MHz) | 8 pixel L1NORM,FS:3 (150 MHz) | 8 pixel L2NORM,FS:3 (150 MHz) | Edge Linking (300 MHz) | Edge Linking (150 MHz)
BRAM_18K | 22 | 18 | 36 | 32 | 84 | 84
DSP48E | 2 | 4 | 16 | 32 | 3 | 3
FF | 3027 | 3507 | 4899 | 6208 | 17600 | 14356
LUT | 2626 | 3170 | 6518 | 9560 | 15764 | 14274
CLB | 606 | 708 | 1264 | 1871 | 2955 | 3241
The following table summarizes the resource utilization of xf::Canny and EdgeTracing in
different configurations, generated using SDx 2019.1 tool for the xczu7ev-ffvc1156-2-e FPGA,
to process a grayscale 4K image with a filter size of 3.
Table 86: xf::Canny and EdgeTracing Function Resource Utilization Summary with UltraRAM Enable
Resource Utilization
Name | 1 pixel L1NORM,FS:3 (300 MHz) | 1 pixel L2NORM,FS:3 (300 MHz) | 8 pixel L1NORM,FS:3 (150 MHz) | 8 pixel L2NORM,FS:3 (150 MHz) | Edge Linking (300 MHz) | Edge Linking (150 MHz)
BRAM_18K | 10 | 8 | 3 | 3 | 4 | 4
URAM | 1 | 1 | 15 | 13 | 8 | 8
DSP48E | 2 | 4 | 16 | 32 | 8 | 8
FF | 3184 | 3749 | 5006 | 7174 | 5581 | 7054
LUT | 2511 | 2950 | 6695 | 9906 | 4092 | 6380
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image for L1NORM with a filter size of 3, including the edge linking
module.
Latency Estimate
Operating Mode | Operating Frequency (MHz) | Latency (ms)
In the OpenCV Canny function, the Gaussian blur is not applied as a pre-processing step.
Channel Combine
The merge function merges single channel images into a multi-channel image. The number of
channels to be merged must be four.
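Merging four 8-bit planes produces one 32-bit pixel-interleaved value per position. The sketch below assumes that the first input lands in the least significant byte; the actual channel order in the packed word is not stated in this section. mergeRef is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference sketch: interleave four 8-bit planes into 32-bit pixels.
// Assumption: c0 occupies bits 7:0, c1 bits 15:8, c2 bits 23:16, c3 bits 31:24.
std::vector<uint32_t> mergeRef(const std::vector<uint8_t>& c0,
                               const std::vector<uint8_t>& c1,
                               const std::vector<uint8_t>& c2,
                               const std::vector<uint8_t>& c3) {
    assert(c0.size() == c1.size() && c1.size() == c2.size() &&
           c2.size() == c3.size());
    std::vector<uint32_t> dst(c0.size());
    for (std::size_t i = 0; i < c0.size(); ++i) {
        dst[i] = static_cast<uint32_t>(c0[i]) |
                 (static_cast<uint32_t>(c1[i]) << 8) |
                 (static_cast<uint32_t>(c2[i]) << 16) |
                 (static_cast<uint32_t>(c3[i]) << 24);
    }
    return dst;
}
```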
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void merge(xf::Mat<SRC_T, ROWS, COLS, NPC> &_src1, xf::Mat<SRC_T, ROWS,
COLS, NPC> &_src2, xf::Mat<SRC_T, ROWS, COLS, NPC> &_src3, xf::Mat<SRC_T,
ROWS, COLS, NPC> &_src4, xf::Mat<DST_T, ROWS, COLS, NPC> &_dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1)
DST_T Output pixel type. Only 8-bit, unsigned, 4 channel is supported (XF_8UC4)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; the only supported option is XF_NPPC1 (1 pixel operation).
_src1 Input single-channel image
_src2 Input single-channel image
_src3 Input single-channel image
_src4 Input single-channel image
_dst Output multi-channel image
Resource Utilization
The following table summarizes the resource utilization of the merge function, generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process 4 single-channel
HD (1080x1920) images.
Utilization Estimate
Operating Mode | Operating Frequency (MHz) | BRAM_18K | DSP_48Es | FF | LUT | CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process 4 single channel HD
(1080x1920) images.
Latency Estimate
Operating Mode | Max Latency
Channel Extract
The extractChannel function splits a multi-channel array (32-bit pixel-interleaved data) into
several single-channel arrays and returns a single channel. The channel to be extracted is
specified by using the channel argument.
Unknown XF_EXTRACT_CH_0
Unknown XF_EXTRACT_CH_1
Unknown XF_EXTRACT_CH_2
Unknown XF_EXTRACT_CH_3
RED XF_EXTRACT_CH_R
GREEN XF_EXTRACT_CH_G
BLUE XF_EXTRACT_CH_B
ALPHA XF_EXTRACT_CH_A
LUMA XF_EXTRACT_CH_Y
Cb/U XF_EXTRACT_CH_U
Cr/V/Value XF_EXTRACT_CH_V
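Extracting a channel from 32-bit pixel-interleaved data amounts to selecting one byte of each pixel. The sketch below uses a plain channel index from 0 to 3 and assumes that channel 0 sits in the least significant byte; how the enumerated values above map to byte positions is defined in xf_params.h and is not reproduced here. extractChannelRef is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software reference sketch: pull one 8-bit channel out of packed 32-bit pixels.
// Assumption: channel k occupies bits [8*k + 7 : 8*k] of each pixel.
std::vector<uint8_t> extractChannelRef(const std::vector<uint32_t>& src,
                                       unsigned channel) {
    assert(channel < 4);
    std::vector<uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = static_cast<uint8_t>((src[i] >> (8 * channel)) & 0xFFu);
    }
    return dst;
}
```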
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void extractChannel(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,
xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_mat, uint16_t _channel)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 4 channel is supported (XF_8UC4)
DST_T Output pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1)
ROWS Maximum height of input and output image
COLS Maximum width of input and output image. Must be multiple of 8 for 8 pixel mode
NPC Number of pixels to be processed per cycle; the only supported option is XF_NPPC1 (1 pixel operation).
_src_mat Input multi-channel image
_dst_mat Output single channel image
_channel Channel to be extracted (See xf_channel_extract_e enumerated type in file xf_params.h for possible
values.)
Resource Utilization
The following table summarizes the resource utilization of the extractChannel function,
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4
channel HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a 4 channel HD
(1080x1920) image.
Latency Estimate
Operating Mode | Max Latency (ms)
Color Conversion
The color conversion functions convert one image format to another image format, for the
combinations listed in the following table. The rows represent the input formats and the columns
represent the output formats. Supported conversions are discussed in the following sections.
Each non-empty cell names the section that describes the conversion.
I/O Formats | RGBA | NV12 | NV21 | IYUV | UYVY | YUYV | YUV4 | RGB | BGR
RGBA | N/A | RGBA to NV12 | RGBA to NV21 | RGBA/RGB to IYUV | | | RGBA/RGB to YUV4 | |
NV12 | NV12 to RGBA | N/A | NV12 to NV21/NV21 to NV12 | NV12 to IYUV | NV12/NV21 to UYVY/YUYV | NV12/NV21 to UYVY/YUYV | NV12 to YUV4 | NV12/NV21 to RGB/BGR | NV12/NV21 to RGB/BGR
NV21 | NV21 to RGBA | NV12 to NV21/NV21 to NV12 | N/A | NV21 to IYUV | NV12/NV21 to UYVY/YUYV | NV12/NV21 to UYVY/YUYV | NV21 to YUV4 | NV12/NV21 to RGB/BGR | NV12/NV21 to RGB/BGR
IYUV | IYUV to RGBA/RGB | IYUV to NV12 | | N/A | | | IYUV to YUV4 | IYUV to RGBA/RGB |
UYVY | UYVY to RGBA | UYVY to NV12 | | UYVY to IYUV | N/A | | | |
YUYV | YUYV to RGBA | YUYV to NV12 | | YUYV to IYUV | | N/A | | |
YUV4 | | | | | | | N/A | |
RGB | | RGB/BGR to NV12/NV21 | RGB/BGR to NV12/NV21 | RGBA/RGB to IYUV | RGB/BGR to UYVY/YUYV | RGB/BGR to UYVY/YUYV | RGBA/RGB to YUV4 | | BGR to RGB / RGB to BGR
BGR | | RGB/BGR to NV12/NV21 | RGB/BGR to NV12/NV21 | | RGB/BGR to UYVY/YUYV | RGB/BGR to UYVY/YUYV | | BGR to RGB / RGB to BGR |
Other conversions
Source: http://www.fourcc.org/fccyvrgb.php
RGBA/RGB to YUV4
The rgba2yuv4 function converts a 4-channel RGBA image to YUV444 format and the
rgb2yuv4 function converts a 3-channel RGB image to YUV444 format. The function outputs Y,
U, and V streams separately.
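The following is a minimal software sketch of the per-pixel arithmetic behind an RGB to YUV444 conversion, producing three separate planes. It is not the library kernel: the function name rgb_to_yuv444_model is hypothetical, and the coefficients are the widely used BT.601 studio-range integer approximation, which is an assumption here. The library's actual coefficients and rounding may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Yuv444Planes {
    std::vector<std::uint8_t> y, u, v;  // one sample per pixel in each plane
};

// Hypothetical reference model. `rgb` holds interleaved R,G,B bytes.
// Coefficients: BT.601 studio-range integer approximation (assumption).
Yuv444Planes rgb_to_yuv444_model(const std::vector<std::uint8_t>& rgb) {
    Yuv444Planes out;
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
        const int r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];
        // The +16 / +128 offsets are folded in before the shift (as 16*256
        // and 128*256) so the shifted value is never negative.
        const int y = ( 66 * r + 129 * g +  25 * b + 128 +  16 * 256) >> 8;
        const int u = (-38 * r -  74 * g + 112 * b + 128 + 128 * 256) >> 8;
        const int v = (112 * r -  94 * g -  18 * b + 128 + 128 * 256) >> 8;
        out.y.push_back(static_cast<std::uint8_t>(y));
        out.u.push_back(static_cast<std::uint8_t>(u));
        out.v.push_back(static_cast<std::uint8_t>(v));
    }
    return out;
}
```

Each output plane has the same number of samples as the input has pixels, which matches the (ROWS, COLS) size of the Y, U, and V outputs.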
API Syntax
template <int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void rgba2yuv4(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _y_image, xf::Mat<DST_T, ROWS, COLS, NPC> & _u_image,
xf::Mat<DST_T, ROWS, COLS, NPC> & _v_image)
template <int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void rgb2yuv4(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _y_image, xf::Mat<DST_T, ROWS, COLS, NPC> & _u_image,
xf::Mat<DST_T, ROWS, COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 4(RGBA) and 3(RGB)-channel are supported (XF_8UC4 and
XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input RGBA/RGB image of size (ROWS, COLS).
_y_image Output Y image of size (ROWS, COLS).
_u_image Output U image of size (ROWS, COLS).
_v_image Output V image of size (ROWS, COLS).
Resource Utilization
The following table summarizes the resource utilization of RGBA/RGB to YUV4 for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of RGBA/RGB to YUV4 for different
configurations, as generated using the Vivado HLS 2019.1 version for the Xczu9eg-ffvb1156-1-i-
es1, to process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGBA/RGB to IYUV
The rgba2iyuv function converts a 4-channel RGBA image to IYUV (4:2:0) format and the
rgb2iyuv function converts a 3-channel RGB image to IYUV (4:2:0) format. Both functions
output the Y, U, and V planes separately. IYUV holds subsampled data: Y is sampled for every
RGBA/RGB pixel, and U and V are sampled once for every 2 rows and 2 columns (2x2) of pixels.
The U and V planes each hold (rows/2)*(columns/2) samples. Because consecutive chroma rows are
cascaded into a single row, each plane is stored with size (rows/4)*columns.
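The plane-size arithmetic can be checked with a small sketch. The type and function names below are hypothetical and only illustrate the bookkeeping; the equality holds when the image height is a multiple of 4 and the width is even.

```cpp
struct PlaneShape { int rows; int cols; };

// Logical shape of one IYUV chroma plane: one sample per 2x2 pixel block.
constexpr PlaneShape iyuv_chroma_logical(int rows, int cols) {
    return PlaneShape{rows / 2, cols / 2};
}

// Shape after two consecutive chroma rows are cascaded into one full-width
// row, matching the ROWS/4 x COLS plane arguments in the API syntax.
constexpr PlaneShape iyuv_chroma_packed(int rows, int cols) {
    return PlaneShape{rows / 4, cols};
}

constexpr int sample_count(PlaneShape s) { return s.rows * s.cols; }

// HD example: 540 x 960 logical samples are stored as 270 x 1920.
static_assert(sample_count(iyuv_chroma_logical(1080, 1920)) ==
              sample_count(iyuv_chroma_packed(1080, 1920)),
              "both layouts hold the same number of samples");
```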
API Syntax
template <int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void rgba2iyuv(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _y_image, xf::Mat<DST_T, ROWS/4, COLS, NPC> & _u_image,
xf::Mat<DST_T, ROWS/4, COLS, NPC> & _v_image)
template <int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void rgb2iyuv(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _y_image, xf::Mat<DST_T, ROWS/4, COLS, NPC> & _u_image,
xf::Mat<DST_T, ROWS/4, COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 4(RGBA) and 3(RGB)-channel are supported (XF_8UC4 and
XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input RGBA/RGB image of size (ROWS, COLS).
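_y_image Output Y image of size (ROWS, COLS).
_u_image Output U image of size (ROWS/4, COLS).
_v_image Output V image of size (ROWS/4, COLS).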
Resource Utilization
The following table summarizes the resource utilization of RGBA/RGB to IYUV for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of RGBA/RGB to IYUV for different
configurations, as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGBA to NV12
The rgba2nv12 function converts a 4-channel RGBA image to NV12 (4:2:0) format. The
function outputs the Y plane and the interleaved UV plane separately. NV12 holds subsampled
data: Y is sampled for every RGBA pixel, and U and V are sampled once for every 2 rows and
2 columns (2x2) of pixels. The UV plane is of size (rows/2)*(columns/2), as the U and V values
are interleaved.
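To illustrate what an interleaved chroma plane looks like in memory, the sketch below merges separate U and V sample lists into the byte order U0 V0 U1 V1 and so on. The helper name interleave_uv_model is hypothetical; this is a software model for illustration, not the library kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: interleave per-block U and V samples into
// the byte sequence U0 V0 U1 V1 ... of an NV12-style chroma plane.
std::vector<std::uint8_t> interleave_uv_model(const std::vector<std::uint8_t>& u,
                                              const std::vector<std::uint8_t>& v) {
    const std::size_t n = u.size() < v.size() ? u.size() : v.size();
    std::vector<std::uint8_t> uv;
    uv.reserve(2 * n);
    for (std::size_t i = 0; i < n; ++i) {
        uv.push_back(u[i]);  // first byte of the pair
        uv.push_back(v[i]);  // second byte of the pair
    }
    return uv;
}
```

Passing the V samples as the first argument and the U samples as the second would give the V0 U0 V1 U1 order of an NV21-style plane.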
API Syntax
template <int SRC_T, int Y_T, int UV_T, int ROWS, int COLS, int NPC=1>
void rgba2nv12(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<Y_T, ROWS,
COLS, NPC> & _y, xf::Mat<UV_T, ROWS/2, COLS/2, NPC> & _uv)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of RGBA to NV12 for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of RGBA to NV12 for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGBA to NV21
The rgba2nv21 function converts a 4-channel RGBA image to NV21 (4:2:0) format. The
function outputs the Y plane and the interleaved VU plane separately. NV21 holds subsampled
data: Y is sampled for every RGBA pixel, and U and V are sampled once for every 2 rows and
2 columns (2x2) of RGBA pixels. The VU plane is of size (rows/2)*(columns/2), as the V and U
values are interleaved.
API Syntax
template <int SRC_T, int Y_T, int UV_T, int ROWS, int COLS, int NPC=1>
void rgba2nv21(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<Y_T, ROWS,
COLS, NPC> & _y, xf::Mat<UV_T, ROWS/2, COLS/2, NPC> & _uv)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 4-channel is supported (XF_8UC4).
Y_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Output pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input RGBA image of size (ROWS, COLS).
_y Output Y image of size (ROWS, COLS).
_uv Output UV image of size (ROWS/2, COLS/2).
Resource Utilization
The following table summarizes the resource utilization of RGBA to NV21 for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of RGBA to NV21 for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
YUYV to RGBA
The yuyv2rgba function converts a single-channel YUYV (YUV 4:2:2) image format to a
4-channel RGBA image. YUYV is a sub-sampled format; one set of YUYV values gives 2 RGBA pixel
values. YUYV is represented in 16-bit values, whereas RGBA is represented in 32-bit values.
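The sketch below shows how one set of YUYV values, carried in two consecutive 16-bit words, splits into two luma samples and one shared U and V pair. The names are hypothetical, and the byte order within each word (luma in the low byte, chroma in the high byte) is an assumption made for illustration.

```cpp
#include <cstdint>

struct YuyvPair {
    std::uint8_t y0, y1;  // luma of the two pixels
    std::uint8_t u, v;    // chroma shared by both pixels
};

// Hypothetical helper: split two consecutive 16-bit YUYV words.
// Assumes luma is in the low byte and chroma in the high byte of each word,
// with U carried by the first word and V by the second.
YuyvPair unpack_yuyv_pair(std::uint16_t w0, std::uint16_t w1) {
    YuyvPair p;
    p.y0 = static_cast<std::uint8_t>(w0 & 0xFFu);
    p.u  = static_cast<std::uint8_t>(w0 >> 8);
    p.y1 = static_cast<std::uint8_t>(w1 & 0xFFu);
    p.v  = static_cast<std::uint8_t>(w1 >> 8);
    return p;
}
```

Each of the two output pixels is then computed from its own Y value together with the shared U and V values.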
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void yuyv2rgba(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 4-channel is supported (XF_8UC4).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input image of size (ROWS, COLS).
_dst Output image of size (ROWS, COLS).
Resource Utilization
The following table summarizes the resource utilization of YUYV to RGBA for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of YUYV to RGBA for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
YUYV to NV12
The yuyv2nv12 function converts a single-channel YUYV (YUV 4:2:2) image format to NV12
(YUV 4:2:0) format. YUYV is a sub-sampled format; one set of YUYV values gives 2 Y values and
one U and one V value.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
Y_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Output UV image pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
NPC_UV Number of UV image Pixels to be processed per cycle; possible options are XF_NPPC1 and
XF_NPPC8 for 1 pixel and 8 pixel operations respectively.
_src Input image of size (ROWS, COLS).
_y_image Output Y plane of size (ROWS, COLS).
_uv_image Output UV plane of size (ROWS/2, COLS/2).
Resource Utilization
The following table summarizes the resource utilization of YUYV to NV12 for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of YUYV to NV12 for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
YUYV to IYUV
The yuyv2iyuv function converts a single-channel YUYV (YUV 4:2:2) image format to
IYUV (4:2:0) format. Outputs of the function are separate Y, U, and V planes. YUYV is a
sub-sampled format; one set of YUYV values gives 2 Y values and one U and one V value. The U
and V values of the odd rows are dropped, as U and V values are sampled once for 2 rows and
2 columns in the IYUV (4:2:0) format.
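The row-dropping step can be sketched in software as follows. The function name drop_odd_chroma_rows and the row-major input layout are assumptions made for illustration; this is not the library kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: `chroma422` holds one U (or V) sample per
// pixel pair for every image row, row-major, `rows` x `half_cols`.
// Keeping only the even rows yields the chroma of a 4:2:0 image.
std::vector<std::uint8_t> drop_odd_chroma_rows(
        const std::vector<std::uint8_t>& chroma422,
        std::size_t rows, std::size_t half_cols) {
    std::vector<std::uint8_t> chroma420;
    chroma420.reserve(((rows + 1) / 2) * half_cols);
    for (std::size_t r = 0; r < rows; r += 2) {  // rows 0, 2, 4, ...
        for (std::size_t c = 0; c < half_cols; ++c) {
            chroma420.push_back(chroma422[r * half_cols + c]);
        }
    }
    return chroma420;
}
```

For a 4-row input, only the samples of rows 0 and 2 survive, which halves the vertical chroma resolution.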
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void yuyv2iyuv(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _y_image, xf::Mat<DST_T, ROWS/4, COLS, NPC> & _u_image,
xf::Mat<DST_T, ROWS/4, COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input image of size (ROWS, COLS).
_y_image Output Y plane of size (ROWS, COLS).
_u_image Output U plane of size (ROWS/4, COLS).
_v_image Output V plane of size (ROWS/4, COLS).
Resource Utilization
The following table summarizes the resource utilization of YUYV to IYUV for different
configurations, generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of YUYV to IYUV for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
UYVY to IYUV
The uyvy2iyuv function converts a UYVY (YUV 4:2:2) single-channel image to the IYUV
format. The outputs of the function are separate Y, U, and V planes. UYVY is a sub-sampled
format; one set of UYVY values gives two Y values and one U and one V value.
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void uyvy2iyuv(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _y_image,xf::Mat<DST_T, ROWS/4, COLS, NPC> & _u_image,
xf::Mat<DST_T, ROWS/4, COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input image of size (ROWS, COLS).
_y_image Output Y plane of size (ROWS, COLS).
_u_image Output U plane of size (ROWS/4, COLS).
_v_image Output V plane of size (ROWS/4, COLS).
Resource Utilization
The following table summarizes the resource utilization of UYVY to IYUV for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of UYVY to IYUV for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
UYVY to RGBA
The uyvy2rgba function converts a UYVY (YUV 4:2:2) single-channel image to a 4-channel
RGBA image. UYVY is a sub-sampled format; one set of UYVY values gives 2 RGBA pixel values.
UYVY is represented in 16-bit values, whereas RGBA is represented in 32-bit values.
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void uyvy2rgba(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 4-channel is supported (XF_8UC4).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input image of size (ROWS, COLS).
_dst Output image of size (ROWS, COLS).
Resource Utilization
The following table summarizes the resource utilization of UYVY to RGBA for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of UYVY to RGBA for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
UYVY to NV12
The uyvy2nv12 function converts a UYVY (YUV 4:2:2) single-channel image to NV12 format.
The outputs are separate Y and UV planes. UYVY is a sub-sampled format; one set of UYVY values
gives 2 Y values and one U and one V value.
API Syntax
template<int SRC_T, int Y_T, int UV_T, int ROWS, int COLS, int NPC=1, int
NPC_UV=1>
void uyvy2nv12(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src,xf::Mat<Y_T, ROWS,
COLS, NPC> & _y_image,xf::Mat<UV_T, ROWS/2, COLS/2, NPC_UV> & _uv_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
Y_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Output UV image pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
NPC_UV Number of UV image pixels to be processed per cycle; possible options are XF_NPPC1 and
XF_NPPC4 for 1 pixel and 4 pixel operations respectively.
_src Input image of size (ROWS, COLS).
_y_image Output Y plane of size (ROWS, COLS).
_uv_image Output UV plane of size (ROWS/2, COLS/2).
Resource Utilization
The following table summarizes the resource utilization of UYVY to NV12 for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of UYVY to NV12 for different configurations,
as generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
IYUV to RGBA/RGB
The iyuv2rgba function converts a single-channel IYUV (YUV 4:2:0) image to a 4-channel RGBA
image and the iyuv2rgb function converts a single-channel IYUV (YUV 4:2:0) image to a 3-channel
RGB image. The inputs to the functions are separate Y, U, and V planes. IYUV is a sub-sampled
format; U and V values are sampled once for 2 rows and 2 columns of the RGBA/RGB pixels. In the
U and V planes, the data of consecutive rows of size (columns/2) is combined to form a single
row of size (columns).
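The combined-row layout determines where the chroma sample for a given pixel is found. The sketch below is a hypothetical helper, not a library function; it assumes that the even logical chroma row occupies the first half of each stored row and the odd logical chroma row occupies the second half.

```cpp
#include <cstddef>

struct PackedIndex { std::size_t row; std::size_t col; };

// Hypothetical helper: locate the chroma sample for pixel (r, c) inside a
// U or V plane stored as (rows/4) x cols, where two logical chroma rows of
// cols/2 samples share one stored row (even logical row first, an assumption).
PackedIndex iyuv_chroma_index(std::size_t r, std::size_t c, std::size_t cols) {
    const std::size_t cr = r / 2;  // logical chroma row
    const std::size_t cc = c / 2;  // logical chroma column
    return PackedIndex{cr / 2, (cr % 2) * (cols / 2) + cc};
}
```

For a 1920-wide image, pixels in rows 0 and 1 read from the first 960 samples of stored row 0, and pixels in rows 2 and 3 read from the last 960 samples of the same stored row.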
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void iyuv2rgba(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<SRC_T,
ROWS/4, COLS, NPC> & src_u,xf::Mat<SRC_T, ROWS/4, COLS, NPC> & src_v,
xf::Mat<DST_T, ROWS, COLS, NPC> & _dst0)
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void iyuv2rgb(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<SRC_T,
ROWS/4, COLS, NPC> & src_u,xf::Mat<SRC_T, ROWS/4, COLS, NPC> & src_v,
xf::Mat<DST_T, ROWS, COLS, NPC> & _dst0)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 4(RGBA) and 3(RGB)-channel are supported (XF_8UC4
and XF_8UC3).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
src_y Input Y plane of size (ROWS, COLS).
src_u Input U plane of size (ROWS/4, COLS).
src_v Input V plane of size (ROWS/4, COLS).
_dst0 Output RGBA/RGB image of size (ROWS, COLS).
Resource Utilization
The following table summarizes the resource utilization of IYUV to RGBA/RGB for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of IYUV to RGBA/RGB for different
configurations, as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-
ffvb1156-1-i-es1, to process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
IYUV to NV12
The iyuv2nv12 function converts a single-channel IYUV image to NV12 format. The inputs are
separate U and V planes. The Y plane needs no processing, as both formats have the same
Y plane. U and V values are rearranged from plane interleaved to pixel interleaved.
API Syntax
template<int SRC_T, int UV_T, int ROWS, int COLS, int NPC =1, int NPC_UV=1>
void iyuv2nv12(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<SRC_T,
ROWS/4, COLS, NPC> & src_u,xf::Mat<SRC_T, ROWS/4, COLS, NPC> &
src_v,xf::Mat<SRC_T, ROWS, COLS, NPC> & _y_image, xf::Mat<UV_T, ROWS/2,
COLS/2, NPC_UV> & _uv_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Output pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
NPC_UV Number of UV Pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC4
for 1 pixel and 4-pixel operations respectively.
src_y Input Y plane of size (ROWS, COLS).
src_u Input U plane of size (ROWS/4, COLS).
src_v Input V plane of size (ROWS/4, COLS).
_y_image Output Y plane of size (ROWS, COLS).
_uv_image Output UV plane of size (ROWS/2, COLS/2).
Resource Utilization
The following table summarizes the resource utilization of IYUV to NV12 for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of IYUV to NV12 for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
IYUV to YUV4
The iyuv2yuv4 function converts a single-channel IYUV image to YUV444 format. The Y plane is
the same for both formats. The inputs are separate U and V planes of the IYUV image and the
outputs are separate U and V planes of the YUV4 image. IYUV stores subsampled U and V values,
whereas the YUV444 format stores U and V values for every pixel. The same U and V values are
duplicated across 2 rows and 2 columns (2x2) of pixels to get the required data in the YUV444
format.
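The 2x2 duplication can be sketched in software as follows. The function name upsample_chroma_2x2 is hypothetical, and the sketch assumes even image dimensions and a row-major chroma buffer of (rows/2)*(cols/2) samples; it is an illustration, not the library kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: expand a (rows/2) x (cols/2) chroma plane to
// rows x cols by repeating every sample over a 2x2 block of pixels.
std::vector<std::uint8_t> upsample_chroma_2x2(
        const std::vector<std::uint8_t>& chroma,
        std::size_t rows, std::size_t cols) {
    std::vector<std::uint8_t> full(rows * cols);
    const std::size_t half_cols = cols / 2;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Pixels (r, c), (r, c^1), (r^1, c), (r^1, c^1) share one sample.
            full[r * cols + c] = chroma[(r / 2) * half_cols + (c / 2)];
        }
    }
    return full;
}
```

A 2x4 image with chroma samples {7, 9} therefore expands to two identical rows of 7, 7, 9, 9.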
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
Resource Utilization
The following table summarizes the resource utilization of IYUV to YUV4 for different
configurations, generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of IYUV to YUV4 for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
NV12 to IYUV
The nv122iyuv function converts NV12 format to IYUV format. The function takes the
interleaved UV plane as input and outputs separate U and V planes. The Y plane needs no
processing, as both formats have the same Y plane. U and V values are rearranged
from pixel interleaved to plane interleaved.
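The rearrangement from pixel interleaved to plane interleaved can be sketched in software as follows. The names are hypothetical and the sketch is a reference model for illustration, not the library kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct UvPlanes {
    std::vector<std::uint8_t> u, v;  // separate chroma planes
};

// Hypothetical reference model: split the byte sequence U0 V0 U1 V1 ... of an
// interleaved chroma plane into a U plane and a V plane.
UvPlanes deinterleave_uv_model(const std::vector<std::uint8_t>& uv) {
    UvPlanes out;
    out.u.reserve(uv.size() / 2);
    out.v.reserve(uv.size() / 2);
    for (std::size_t i = 0; i + 1 < uv.size(); i += 2) {
        out.u.push_back(uv[i]);      // first byte of each pair
        out.v.push_back(uv[i + 1]);  // second byte of each pair
    }
    return out;
}
```

The total number of chroma samples is unchanged; only their order in memory differs between the two formats.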
API Syntax
template<int SRC_T, int UV_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void nv122iyuv(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<UV_T,
ROWS/2, COLS/2, NPC_UV> & src_uv,xf::Mat<SRC_T, ROWS, COLS, NPC> &
_y_image,xf::Mat<SRC_T, ROWS/4, COLS, NPC> & _u_image,xf::Mat<SRC_T,
ROWS/4, COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Input pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8 pixel mode).
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
NPC_UV Number of UV image Pixels to be processed per cycle; possible options are XF_NPPC1 and
XF_NPPC4 for 1 pixel and 4-pixel operations respectively.
src_y Input Y plane of size (ROWS, COLS).
src_uv Input UV plane of size (ROWS/2, COLS/2).
_y_image Output Y plane of size (ROWS, COLS).
_u_image Output U plane of size (ROWS/4, COLS).
_v_image Output V plane of size (ROWS/4, COLS).
Resource Utilization
The following table summarizes the resource utilization of NV12 to IYUV for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of NV12 to IYUV for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
NV12 to RGBA
The nv122rgba function converts NV12 image format to a 4-channel RGBA image. The inputs
to the function are separate Y and UV planes. NV12 holds sub-sampled data: the Y plane is
sampled at unit rate, and there is one U and one V value for every 2x2 block of Y values. To
generate the RGBA data, each U and V value is duplicated across its 2x2 block.
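The sketch below models how one RGBA pixel can be produced from an NV12 image: the pixel's own Y value is combined with the U,V pair shared by its 2x2 block. It is not the library kernel. The function names are hypothetical, the coefficients are the widely used BT.601 studio-range integer approximation (an assumption; the library's coefficients and rounding may differ), and setting alpha to 255 is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgba { std::uint8_t r, g, b, a; };

// Divide a fixed-point sum by 256 and clamp the result to 0..255.
static std::uint8_t scale_and_clamp(int x) {
    if (x < 0) return 0;  // negative sums clamp to 0
    x >>= 8;
    return static_cast<std::uint8_t>(x > 255 ? 255 : x);
}

// Hypothetical reference model: convert pixel (r, c) of an NV12 image.
// `uv` stores U,V byte pairs, one pair per 2x2 pixel block, so the four
// pixels of a block reuse the same pair.
Rgba nv12_pixel_to_rgba(const std::vector<std::uint8_t>& y,
                        const std::vector<std::uint8_t>& uv,
                        std::size_t cols, std::size_t r, std::size_t c) {
    const std::size_t pair = (r / 2) * (cols / 2) + (c / 2);
    const int lum = 298 * (static_cast<int>(y[r * cols + c]) - 16);
    const int d = static_cast<int>(uv[2 * pair]) - 128;      // U - 128
    const int e = static_cast<int>(uv[2 * pair + 1]) - 128;  // V - 128
    Rgba out;
    out.r = scale_and_clamp(lum + 409 * e + 128);
    out.g = scale_and_clamp(lum - 100 * d - 208 * e + 128);
    out.b = scale_and_clamp(lum + 516 * d + 128);
    out.a = 255;  // alpha value is an assumption
    return out;
}
```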
API Syntax
template<int SRC_T, int UV_T, int DST_T, int ROWS, int COLS, int NPC=1>
void nv122rgba(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y,xf::Mat<UV_T,
ROWS/2, COLS/2, NPC> & src_uv,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst0)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Input pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
DST_T Output pixel type. Only 8-bit, unsigned, 4-channel is supported (XF_8UC4).
Resource Utilization
The following table summarizes the resource utilization of NV12 to RGBA for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of NV12 to RGBA for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
NV12 to YUV4
The nv122yuv4 function converts an NV12 image format to YUV444 format. The function
outputs separate U and V planes. The Y plane is the same for both image formats. The UV values
are duplicated 2x2 times to form the U plane and the V plane of the YUV444 image format.
API Syntax
template<int SRC_T,int UV_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void nv122yuv4(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<UV_T,
ROWS/2, COLS/2, NPC_UV> & src_uv,xf::Mat<SRC_T, ROWS, COLS, NPC> &
_y_image, xf::Mat<SRC_T, ROWS, COLS, NPC> & _u_image,xf::Mat<SRC_T, ROWS,
COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Input pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8 pixel mode).
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
NPC_UV Number of UV image Pixels to be processed per cycle; possible options are XF_NPPC1 and
XF_NPPC4 for 1 pixel and 4-pixel operations respectively.
src_y Input Y plane of size (ROWS, COLS).
src_uv Input UV plane of size (ROWS/2, COLS/2).
_y_image Output Y plane of size (ROWS, COLS).
_u_image Output U plane of size (ROWS, COLS).
_v_image Output V plane of size (ROWS, COLS).
Resource Utilization
The following table summarizes the resource utilization of NV12 to YUV4 for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Operating Mode    Operating Frequency (MHz)    Utilization Estimate: BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of NV12 to YUV4 for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
NV21 to IYUV
The nv212iyuv function converts an NV21 image to the IYUV image format. The input to
the function is the interleaved VU plane only and the outputs are separate U and V planes. The Y
plane needs no processing because it is the same in both formats. The U and V values are
rearranged from pixel-interleaved to plane-interleaved order.
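The rearrangement from pixel-interleaved to planar chroma can be sketched in plain C++ as follows. This is an illustrative model rather than the library implementation: the names PlanarChroma and splitNv21Chroma are hypothetical, and the VU plane is assumed to be a byte stream in NV21 order (V first, then U). Each output plane receives (ROWS/2) x (COLS/2) samples, the same sample count as a (ROWS/4, COLS) container.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the separated chroma planes.
struct PlanarChroma {
    std::vector<uint8_t> u;
    std::vector<uint8_t> v;
};

// vu holds interleaved pairs in NV21 order: V0, U0, V1, U1, ...
PlanarChroma splitNv21Chroma(const std::vector<uint8_t>& vu) {
    PlanarChroma out;
    const size_t pairs = vu.size() / 2;
    out.u.reserve(pairs);
    out.v.reserve(pairs);
    for (size_t i = 0; i < pairs; ++i) {
        out.v.push_back(vu[2 * i]);      // V comes first in NV21
        out.u.push_back(vu[2 * i + 1]);  // U follows
    }
    return out;
}
```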
API Syntax
template<int SRC_T, int UV_T, int ROWS, int COLS, int NPC=1,int NPC_UV=1>
void nv212iyuv(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<UV_T,
ROWS/2, COLS/2, NPC_UV> & src_uv,xf::Mat<SRC_T, ROWS, COLS, NPC> &
_y_image, xf::Mat<SRC_T, ROWS/4, COLS, NPC> & _u_image,xf::Mat<SRC_T,
ROWS/4, COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Input pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8, for 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
NPC_UV Number of UV image Pixels to be processed per cycle; possible options are XF_NPPC1 and
XF_NPPC4 for 1 pixel and 4-pixel operations respectively.
src_y Input Y plane of size (ROWS, COLS).
src_uv Input UV plane of size (ROWS/2, COLS/2).
_y_image Output Y plane of size (ROWS, COLS).
_u_image Output U plane of size (ROWS/4, COLS).
_v_image Output V plane of size (ROWS/4, COLS).
Resource Utilization
The following table summarizes the resource utilization of NV21 to IYUV for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of NV21 to IYUV for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
NV21 to RGBA
The nv212rgba function converts an NV21 image to a 4-channel RGBA image. The inputs
to the function are separate Y and VU planes. NV21 holds sub-sampled data: the Y plane is sampled
at unit rate, and there is one U and one V value for every 2x2 block of Y values. To generate the
RGBA data, each U and V value is duplicated across its 2x2 block.
API Syntax
template<int SRC_T, int UV_T, int DST_T, int ROWS, int COLS, int NPC=1>
void nv212rgba(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<UV_T,
ROWS/2, COLS/2, NPC> & src_uv,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst0)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Input pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
DST_T Output pixel type. Only 8-bit, unsigned, 4-channel is supported (XF_8UC4).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be a multiple of 8 in 8 pixel mode.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
src_y Input Y plane of size (ROWS, COLS).
src_uv Input UV plane of size (ROWS/2, COLS/2).
_dst0 Output RGBA image of size (ROWS, COLS).
Resource Utilization
The following table summarizes the resource utilization of NV21 to RGBA for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of NV21 to RGBA for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
NV21 to YUV4
The nv212yuv4 function converts an image in the NV21 format to the YUV444 format. The
function outputs separate U and V planes. The Y plane is the same in both formats. Each value in
the VU plane is duplicated across a 2x2 block to fill the full-resolution U and V planes of the YUV444 format.
API Syntax
template<int SRC_T, int UV_T, int ROWS, int COLS, int NPC=1,int NPC_UV=1>
void nv212yuv4(xf::Mat<SRC_T, ROWS, COLS, NPC> & src_y, xf::Mat<UV_T,
ROWS/2, COLS/2, NPC_UV> & src_uv, xf::Mat<SRC_T, ROWS, COLS, NPC> &
_y_image, xf::Mat<SRC_T, ROWS, COLS, NPC> & _u_image, xf::Mat<SRC_T, ROWS,
COLS, NPC> & _v_image)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Input pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8 pixel mode).
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
NPC_UV Number of UV image Pixels to be processed per cycle; possible options are XF_NPPC1 and
XF_NPPC4 for 1 pixel and 4-pixel operations respectively.
src_y Input Y plane of size (ROWS, COLS).
src_uv Input UV plane of size (ROWS/2, COLS/2).
_y_image Output Y plane of size (ROWS, COLS).
_u_image Output U plane of size (ROWS, COLS).
_v_image Output V plane of size (ROWS, COLS).
Resource Utilization
The following table summarizes the resource utilization of NV21 to YUV4 for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of NV21 to YUV4 for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGB to GRAY
The rgb2gray function converts a 3-channel RGB image to GRAY format.
Y= 0.299*R+0.587*G+0.114*B
Where,
• Y = Gray pixel
• R= Red channel
• G= Green channel
• B= Blue channel
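As a worked illustration of the weighted sum above, the following plain C++ sketch computes one gray pixel. The function name rgbToGray is hypothetical, and rounding to the nearest integer is an assumption of this sketch; the hardware kernel may use fixed-point weights or truncation instead.

```cpp
#include <cmath>
#include <cstdint>

// Reference model of Y = 0.299*R + 0.587*G + 0.114*B for one 8-bit pixel.
// Rounding to nearest is an assumption, not confirmed kernel behavior.
uint8_t rgbToGray(uint8_t r, uint8_t g, uint8_t b) {
    const double y = 0.299 * r + 0.587 * g + 0.114 * b;
    // The weights sum to 1, so y stays within [0, 255] for 8-bit inputs.
    return static_cast<uint8_t>(std::lround(y));
}
```

For example, a pure green pixel (0, 255, 0) gives 0.587 * 255 = 149.685, which rounds to 150.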
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void rgb2gray(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image.
NPC Number of pixels to be processed per cycle.
_src RGB input image
_dst GRAY output image
Resource Utilization
The following table summarizes the resource utilization of RGB to GRAY for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of RGB to GRAY for different configurations, as
generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
BGR to GRAY
The bgr2gray function converts a 3-channel BGR image to GRAY format.
Y= 0.299*R+0.587*G+0.114*B
Where,
• Y = Gray pixel
• R= Red channel
• G= Green channel
• B= Blue channel
API Syntax
template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1>
void bgr2gray(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS,
COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle.
_src BGR input image
_dst GRAY output image
Resource Utilization
The following table summarizes the resource utilization of BGR to GRAY for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT
Performance Estimate
The following table summarizes the performance of BGR to GRAY for different configurations, as
generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
GRAY to RGB
The gray2rgb function converts a gray intensity image to RGB color format.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle.
_src GRAY input image.
_dst RGB output image.
Resource Utilization
The following table summarizes the resource utilization of gray2rgb for different configurations,
as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT
Performance Estimate
The following table summarizes the performance of gray2rgb for different configurations, as
generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
GRAY to BGR
The gray2bgr function converts a gray intensity image to BGR color format.
• Y = Gray pixel
• R= Red channel
• G= Green channel
• B= Blue channel
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle.
_src GRAY input image.
_dst BGR output image.
Resource Utilization
The following table summarizes the resource utilization of gray2bgr for different configurations,
as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of gray2bgr for different configurations, as
generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
HLS to RGB/BGR
The hls2(rgb/bgr) function converts HLS color space to 3-channel RGB/BGR image.
\[ C = (1 - |2L - 1|) \times S_{HSL} \]
\[ H' = \frac{H}{60^\circ} \]
\[ X = C \times (1 - |H' \bmod 2 - 1|) \]
\[ m = L - \frac{C}{2} \]
\[ (R, G, B) = (R_1 + m,\; G_1 + m,\; B_1 + m) \]
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle.
_src HLS input image.
_dst RGB/BGR output image.
Resource Utilization
The following table summarizes the resource utilization of HLS2RGB/BGR for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT
Performance Estimate
The following table summarizes the performance of HLS2RGB/BGR for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGB to XYZ
The rgb2xyz function converts a 3-channel RGB image to XYZ color space.
⎡X ⎤ ⎡0.412453 0.357580 0.180423⎤ ⎡R ⎤
⎢Y ⎥ = ⎢0.212671 0.715160 0.072169⎥ . ⎢G⎥
⎣Z ⎦ ⎣0.019334 0.119193 0.950227⎦ ⎣B ⎦
• R= Red channel
• G= Green channel
• B= Blue channel
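The matrix product above can be modelled per pixel in plain C++. This is a sketch only: the name rgbToXyz is hypothetical, and rounding to nearest followed by saturation to the 8-bit range is an assumption about how results are stored in an XF_8UC3 output.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Reference model of the RGB to XYZ matrix product for one 8-bit pixel.
// Rounding and saturation to [0, 255] are assumptions of this sketch.
std::array<uint8_t, 3> rgbToXyz(uint8_t r, uint8_t g, uint8_t b) {
    static const double kM[3][3] = {
        {0.412453, 0.357580, 0.180423},
        {0.212671, 0.715160, 0.072169},
        {0.019334, 0.119193, 0.950227}};
    const double rgb[3] = {static_cast<double>(r), static_cast<double>(g),
                           static_cast<double>(b)};
    std::array<uint8_t, 3> xyz{};
    for (int i = 0; i < 3; ++i) {
        double acc = 0.0;
        for (int j = 0; j < 3; ++j) {
            acc += kM[i][j] * rgb[j];  // row i of the matrix times (R, G, B)
        }
        const long rounded = std::lround(acc);
        xyz[i] = static_cast<uint8_t>(std::min(255L, std::max(0L, rounded)));
    }
    return xyz;
}
```

The third row of the matrix sums to more than 1, so the Z value of a white pixel exceeds 255 before saturation; this is why the sketch clamps the result.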
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle.
_src RGB input image.
_dst XYZ output image.
Resource Utilization
The following table summarizes the resource utilization of RGB to XYZ for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT
Performance Estimate
The following table summarizes the performance of RGB to XYZ for different configurations, as
generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
BGR to XYZ
The bgr2xyz function converts a 3-channel BGR image to XYZ color space.
⎡X ⎤ ⎡0.412453 0.357580 0.180423⎤ ⎡B ⎤
⎢Y ⎥ = ⎢0.212671 0.715160 0.072169⎥ . ⎢G⎥
⎣Z ⎦ ⎣0.019334 0.119193 0.950227⎦ ⎣R ⎦
• R= Red channel
• G= Green channel
• B= Blue channel
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be a multiple of 8.
COLS Maximum width of input and output image. Must be a multiple of 8.
NPC Number of pixels to be processed per cycle.
_src BGR input image.
_dst XYZ output image.
Resource Utilization
The following table summarizes the resource utilization of BGR to XYZ for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT
Performance Estimate
The following table summarizes the performance of BGR to XYZ for different configurations, as
generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGB/BGR to YCrCb
The (rgb/bgr)2ycrcb function converts a 3-channel RGB/BGR image to YCrCb color space.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3)
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3)
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle
_src RGB/BGR input image
_dst YCrCb output image
Resource Utilization
The following table summarizes the resource utilization of RGB/BGR2YCrCb for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Performance Estimate
Latency Estimate
Operating Mode
Max Latency (ms)
RGB/BGR to HSV
The (rgb/bgr)2hsv function converts a 3-channel RGB/BGR image to HSV color space.
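\[ V = \max(R, G, B) \]
\[ S = \begin{cases} \dfrac{V - \min(R, G, B)}{V} & \text{if } V \neq 0 \\ 0 & \text{otherwise} \end{cases} \]
\[ H = \begin{cases} \dfrac{60\,(G - B)}{V - \min(R, G, B)} & \text{if } V = R \\ 120 + \dfrac{60\,(B - R)}{V - \min(R, G, B)} & \text{if } V = G \\ 240 + \dfrac{60\,(R - G)}{V - \min(R, G, B)} & \text{if } V = B \end{cases} \]
\[ \text{delta} = \begin{cases} 128 & \text{for 8-bit images} \\ 32768 & \text{for 16-bit images} \\ 0.5 & \text{for floating point images} \end{cases} \]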
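The V, S, and H definitions for the RGB to HSV conversion can be checked with a small floating-point model in plain C++. The names Hsv and rgbToHsv are hypothetical. The sketch assumes inputs normalized to [0, 1] and H expressed in degrees; returning H = 0 for gray pixels and wrapping negative hues by adding 360 are assumptions of this sketch, as is any scaling needed to fit H, S, and V into 8-bit outputs, which is not modelled here.

```cpp
#include <algorithm>

// Hypothetical result type: h in degrees [0, 360), s and v in [0, 1].
struct Hsv {
    double h;
    double s;
    double v;
};

// r, g, b are assumed to be normalized to [0, 1].
Hsv rgbToHsv(double r, double g, double b) {
    const double v = std::max({r, g, b});
    const double mn = std::min({r, g, b});
    const double range = v - mn;  // V - min(R, G, B)
    Hsv out{0.0, 0.0, v};
    if (v != 0.0) {
        out.s = range / v;
    }
    if (range == 0.0) {
        return out;  // gray pixel: hue is undefined, 0 is returned by assumption
    }
    if (v == r) {
        out.h = 60.0 * (g - b) / range;
    } else if (v == g) {
        out.h = 120.0 + 60.0 * (b - r) / range;
    } else {
        out.h = 240.0 + 60.0 * (r - g) / range;
    }
    if (out.h < 0.0) {
        out.h += 360.0;  // assumption: wrap negative hues into [0, 360)
    }
    return out;
}
```

For magenta (1, 0, 1) the first case applies and gives H = -60, which the wrap step maps to 300 degrees.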
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle
_src RGB/BGR input image
_dst HSV output image
Resource Utilization
The following table summarizes the resource utilization of RGB/BGR2HSV for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of RGB/BGR2HSV for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGB/BGR to HLS
The (rgb/bgr)2hls function converts a 3-channel RGB/BGR image to HLS color space.
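\[ V_{max} = \max(R, G, B) \]
\[ V_{min} = \min(R, G, B) \]
\[ L = \frac{V_{max} + V_{min}}{2} \]
\[ S = \begin{cases} \dfrac{V_{max} - V_{min}}{V_{max} + V_{min}} & \text{if } L < 0.5 \\ \dfrac{V_{max} - V_{min}}{2 - (V_{max} + V_{min})} & \text{if } L \geq 0.5 \end{cases} \]
\[ H = \begin{cases} \dfrac{60\,(G - B)}{S} & \text{if } V_{max} = R \\ 120 + \dfrac{60\,(B - R)}{S} & \text{if } V_{max} = G \\ 240 + \dfrac{60\,(R - G)}{S} & \text{if } V_{max} = B \end{cases} \]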
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle.
_src RGB/BGR input image.
Resource Utilization
The following table summarizes the resource utilization of RGB/BGR2HLS for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT
Performance Estimate
The following table summarizes the performance of RGB/BGR2HLS for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
YCrCb to RGB/BGR
The ycrcb2(rgb/bgr) function converts YCrCb color space to 3-channel RGB/BGR image.
The output channels are computed as follows, where delta is 128 for 8-bit images:
• R = Y + 1.403*(Cr - delta)
• G = Y - 0.714*(Cr - delta) - 0.344*(Cb - delta)
• B = Y + 1.773*(Cb - delta)
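The YCrCb to RGB arithmetic can be sketched per pixel in plain C++. The names saturate8 and ycrcbToRgb are hypothetical. The sketch assumes delta = 128 (the value for 8-bit images), takes the B term as a product, 1.773*(Cb - delta), in line with the R and G terms, and assumes rounding to nearest with saturation to [0, 255].

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Round to nearest and clamp to the 8-bit range (assumption of this sketch).
static uint8_t saturate8(double x) {
    const long rounded = std::lround(x);
    return static_cast<uint8_t>(std::min(255L, std::max(0L, rounded)));
}

// Returns {R, G, B} for one 8-bit YCrCb pixel, with delta assumed to be 128.
std::array<uint8_t, 3> ycrcbToRgb(uint8_t y, uint8_t cr, uint8_t cb) {
    const double delta = 128.0;
    const double crd = cr - delta;
    const double cbd = cb - delta;
    const double r = y + 1.403 * crd;
    const double g = y - 0.714 * crd - 0.344 * cbd;
    const double b = y + 1.773 * cbd;
    return {saturate8(r), saturate8(g), saturate8(b)};
}
```

When Cr and Cb both equal delta, the chroma terms vanish and all three outputs equal Y, so a mid-gray input (128, 128, 128) maps to (128, 128, 128).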
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be a multiple of 8.
COLS Maximum width of input and output image. Must be a multiple of 8.
NPC Number of pixels to be processed per cycle.
_src YCrCb input image.
_dst RGB/BGR output image.
Resource Utilization
The following table summarizes the resource utilization of YCrCb2RGB/BGR for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Performance Estimate
Latency Estimate
Operating Mode
Max Latency (ms)
HSV to RGB/BGR
The hsv2(rgb/bgr) function converts HSV color space to 3-channel RGB/BGR image.
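\[ C = V \times S_{HSV} \]
\[ H' = \frac{H}{60^\circ} \]
\[ X = C \times (1 - |H' \bmod 2 - 1|) \]
\[ (R_1, G_1, B_1) = \begin{cases} (0, 0, 0) & \text{if } H \text{ is undefined} \\ (C, X, 0) & \text{if } 0 \leq H' \leq 1 \\ (X, C, 0) & \text{if } 1 \leq H' \leq 2 \\ (0, C, X) & \text{if } 2 \leq H' \leq 3 \\ (0, X, C) & \text{if } 3 \leq H' \leq 4 \\ (X, 0, C) & \text{if } 4 \leq H' \leq 5 \\ (C, 0, X) & \text{if } 5 \leq H' \leq 6 \end{cases} \]
\[ m = V - C \]
\[ (R, G, B) = (R_1 + m,\; G_1 + m,\; B_1 + m) \]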
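The chroma C, the intermediate X, and the offset m in the HSV to RGB conversion can be traced with a floating-point model in plain C++. The names Rgb and hsvToRgb are hypothetical. The sketch assumes H is given in degrees in [0, 360) and S and V in [0, 1]; mapping 8-bit HSV inputs to these ranges is not modelled.

```cpp
#include <cmath>

// Hypothetical result type with components in [0, 1].
struct Rgb {
    double r;
    double g;
    double b;
};

// h in degrees [0, 360); s and v in [0, 1] (assumptions of this sketch).
Rgb hsvToRgb(double h, double s, double v) {
    const double c = v * s;      // chroma
    const double hp = h / 60.0;  // H'
    const double x = c * (1.0 - std::fabs(std::fmod(hp, 2.0) - 1.0));
    double r1 = 0.0, g1 = 0.0, b1 = 0.0;
    // Half-open sectors are used here; at a shared boundary both adjacent
    // cases of the piecewise definition give the same triple.
    if (hp < 1.0)      { r1 = c; g1 = x; }
    else if (hp < 2.0) { r1 = x; g1 = c; }
    else if (hp < 3.0) { g1 = c; b1 = x; }
    else if (hp < 4.0) { g1 = x; b1 = c; }
    else if (hp < 5.0) { r1 = x; b1 = c; }
    else               { r1 = c; b1 = x; }
    const double m = v - c;  // lifts all channels so that the maximum equals V
    return {r1 + m, g1 + m, b1 + m};
}
```

For H = 60, S = 1, V = 1 the sketch gives C = 1, H' = 1, and X = 1, so the result is (1, 1, 0), which is yellow.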
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3)
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3)
ROWS Maximum height of input and output image. Must be multiple of 8.
Resource Utilization
The following table summarizes the resource utilization of HSV2RGB/BGR for different
configurations, as generated in the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-
i-es1 FPGA, to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of HSV2RGB/BGR for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
API Syntax
NV122RGB:
NV122BGR:
NV212RGB:
NV212BGR:
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of NV12/NV21 to RGB/ BGR function in
Normal mode (1 pixel), as generated in the Vivado HLS 2019.1 tool for the Xilinx xczu9eg-
ffvb1156-2-i-es2 FPGA to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2018.3 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
API Syntax
NV122NV21:
NV212NV12:
template<int SRC_Y, int SRC_UV, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void nv212nv12(xf::Mat<SRC_Y, ROWS, COLS, NPC> & _y,
xf::Mat<SRC_UV, ROWS/2, COLS/2, NPC_UV> & _uv, xf::Mat<SRC_Y, ROWS, COLS,
NPC> & out_y, xf::Mat<SRC_UV, ROWS/2, COLS/2, NPC_UV> & out_uv)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_Y Input Y pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1)
SRC_UV Input UV pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2)
ROWS Maximum height of input and output image
COLS Maximum width of input and output image. Must be a multiple of NPC.
NPC Number of Y pixels to be processed per cycle. Possible options are
XF_NPPC1, XF_NPPC2, XF_NPPC4 and XF_NPPC8.
NPC_UV Number of UV Pixels to be processed per cycle. Possible options are XF_NPPC1,XF_NPPC2 and
XF_NPPC4.
_y Y input image
_uv UV input image
out_y Y output image
out_uv UV output image
Resource Utilization
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
NV12/NV21 to UYVY/YUYV
The NV12/NV21 to UYVY/YUYV function converts an NV12/NV21 (YUV 4:2:0) image to a
single-channel YUYV/UYVY (YUV 4:2:2) image format. YUYV/UYVY is a sub-sampled format
represented in 16-bit values, whereas RGB is represented in 24-bit values.
API Syntax
NV122UYVY:
template<int SRC_Y, int SRC_UV, int DST_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void nv122uyvy(xf::Mat<SRC_Y, ROWS, COLS, NPC> & _y,
xf::Mat<SRC_UV, ROWS/2, COLS/2, NPC_UV> & _uv, xf::Mat<DST_T, ROWS, COLS,
NPC> & _dst)
NV122YUYV:
template<int SRC_Y, int SRC_UV, int DST_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void nv122yuyv(xf::Mat<SRC_Y, ROWS, COLS, NPC> & _y,
xf::Mat<SRC_UV, ROWS/2, COLS/2, NPC_UV> & _uv, xf::Mat<DST_T, ROWS, COLS,
NPC> & _dst)
NV212UYVY:
template<int SRC_Y, int SRC_UV, int DST_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void nv212uyvy(xf::Mat<SRC_Y, ROWS, COLS, NPC> & _y,
xf::Mat<SRC_UV, ROWS/2, COLS/2, NPC_UV> & _uv, xf::Mat<DST_T, ROWS, COLS,
NPC> & _dst)
NV212YUYV:
template<int SRC_Y, int SRC_UV, int DST_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void nv212yuyv(xf::Mat<SRC_Y, ROWS, COLS, NPC> & _y,
xf::Mat<SRC_UV, ROWS/2, COLS/2, NPC_UV> & _uv, xf::Mat<DST_T, ROWS, COLS,
NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_Y Input Y image pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
SRC_UV Input UV image pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
DST_T Output pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be multiple of NPC.
NPC Number of pixels to be processed per cycle. Possible options are XF_NPPC1,XF_NPPC2,XF_NPPC4
and XF_NPPC8.
NPC_UV Number of pixels to be processed per cycle. Possible options are XF_NPPC1,XF_NPPC2 and
XF_NPPC4.
_y Y input image
_uv UV input image
_dst UYVY/YUYV output image
Resource Utilization
Utilization Estimate
Operating Mode    Operating Frequency (MHz)    BRAM_18K    DSP_48Es    FF    LUT    CLB
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
UYVY/YUYV to RGB/BGR
API Syntax
YUYV2RGB:
YUYV2BGR:
UYVY2RGB:
UYVY2BGR:
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of UYVY/YUYV to RGB/BGR function in
Normal mode(1-Pixel), as generated in the Vivado HLS 2019.1 tool for the Xilinx xczu9eg-
ffvb1156-2-i-es2 FPGA to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
API Syntax
UYVY2YUYV :
YUYV2UYVY:
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input Y pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1).
ROWS Maximum height of input and output image
COLS Maximum width of input and output image. Must be a multiple of N.
NPC Number of pixels to be processed per cycle. Possible options are XF_NPPC1,XF_NPPC2,XF_NPPC4
and XF_NPPC8.
yuyv Input image
uyvy Output image
Resource Utilization
The following table summarizes the resource utilization of UYVY to YUYV/ YUYV to UYVY
function in Normal mode (1 pixel), as generated in the Vivado HLS 2019.1 tool for the Xilinx
xczu9eg-ffvb1156-2-i-es2 FPGA.
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
UYVY/YUYV to NV21
The UYVY/YUYV2NV21 function converts a single-channel YUYV/UYVY (YUV 4:2:2) image
to the NV21 (YUV 4:2:0) format. YUYV/UYVY is a sub-sampled format: one set of YUYV/UYVY
values gives two Y values and one U and one V value.
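The relationship between one YUYV set and the NV21 planes can be sketched in plain C++. This is a hedged reference model, not the library kernel: the names Nv21 and yuyvToNv21 are hypothetical, the input is assumed to be a byte stream in which each pixel pair is stored as Y0, U, Y1, V, and the 4:2:0 chroma is assumed to be taken from even rows only (the kernel may select or combine rows differently).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: full-resolution Y plane plus interleaved VU plane.
struct Nv21 {
    std::vector<uint8_t> y;   // rows * cols samples
    std::vector<uint8_t> vu;  // (rows/2) * (cols/2) pairs: V0, U0, V1, U1, ...
};

// yuyv holds rows * cols * 2 bytes; each pixel pair is assumed to be Y0 U Y1 V.
Nv21 yuyvToNv21(const std::vector<uint8_t>& yuyv, int rows, int cols) {
    assert(rows % 2 == 0 && cols % 2 == 0);
    assert(yuyv.size() == static_cast<size_t>(rows) * cols * 2);
    Nv21 out;
    out.y.reserve(static_cast<size_t>(rows) * cols);
    out.vu.reserve(static_cast<size_t>(rows / 2) * cols);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; c += 2) {
            const size_t base = (static_cast<size_t>(r) * cols + c) * 2;
            out.y.push_back(yuyv[base]);      // Y0
            out.y.push_back(yuyv[base + 2]);  // Y1
            if (r % 2 == 0) {
                // Assumption: keep chroma from even rows, drop it on odd rows.
                out.vu.push_back(yuyv[base + 3]);  // V first for NV21
                out.vu.push_back(yuyv[base + 1]);  // then U
            }
        }
    }
    return out;
}
```

A 2x2 input therefore yields four Y samples and a single V, U pair taken from the first row.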
API Syntax
UYVY2NV21:
YUYV2NV21:
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
UV_T Output UV image pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be multiple of NPC.
NPC Number of pixels to be processed per cycle; Possible options are XF_NPPC1,XF_NPPC2,XF_NPPC4
and XF_NPPC8.
NPC_UV Number of U, V Pixels to be processed per cycle; Possible options are XF_NPPC1,XF_NPPC2 and
XF_NPPC4.
_src Input image
_y_image Y Output image
_uv_image UV Output image
Resource Utilization
The following table summarizes the resource utilization of UYVY/YUYV to NV21 function in
Normal mode (1 pixel), as generated in the Vivado HLS 2019.1 tool for the Xilinx xczu9eg-
ffvb1156-2-i-es2 FPGA to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
API Syntax
RGB2NV12:
template <int SRC_T, int Y_T, int UV_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void rgb2nv12(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<Y_T,
ROWS, COLS, NPC> & _y, xf::Mat<UV_T, ROWS/2, COLS/2, NPC_UV> & _uv)
BGR2NV12:
template <int SRC_T, int Y_T, int UV_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void bgr2nv12(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<Y_T,
ROWS, COLS, NPC> & _y, xf::Mat<UV_T, ROWS/2, COLS/2, NPC_UV> & _uv)
RGB2NV21:
template <int SRC_T, int Y_T, int UV_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void rgb2nv21(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<Y_T,
ROWS, COLS, NPC> & _y, xf::Mat<UV_T, ROWS/2, COLS/2, NPC_UV> & _uv)
BGR2NV21:
template <int SRC_T, int Y_T, int UV_T, int ROWS, int COLS, int NPC=1, int NPC_UV=1>
void bgr2nv21(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<Y_T,
ROWS, COLS, NPC> & _y, xf::Mat<UV_T, ROWS/2, COLS/2, NPC_UV> & _uv)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
Y_T Output pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
UV_T Output pixel type. Only 8-bit, unsigned, 2-channel is supported (XF_8UC2).
ROWS Maximum height of input and output image
COLS Maximum width of input and output image. Must be a multiple of NPC for N pixel mode.
NPC Number of Pixels to be processed per cycle. Possible options are XF_NPPC1,XF_NPPC2,XF_NPPC4
and XF_NPPC8.
NPC_UV Number of Pixels to be processed per cycle. Possible options are XF_NPPC1,XF_NPPC2 and
XF_NPPC4
_src RGB/BGR input image of size (ROWS, COLS).
_y Output Y image of size (ROWS, COLS).
_uv Output UV image of size (ROWS/2, COLS/2).
Resource Utilization
The following table summarizes the resource utilization of RGB/BGR to NV12/NV21 function in
Normal mode (1-Pixel), as generated in the Vivado HLS 2019.1 tool for the Xilinx xczu9eg-
ffvb1156-2-i-es2 FPGA to process a HD (1080x1920) image.
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
Resource Utilization
The following table summarizes the resource utilization of RGB to BGR/ BGR to RGB function in
Normal mode (1-Pixel), as generated in the Vivado HLS 2019.1 tool for the Xilinx xczu9eg-
ffvb1156-2-i-es2 FPGA.
Utilization Estimate
Operating Mode Operating Frequency (MHz) BRAM_18K DSP_48Es FF LUT CLB
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
RGB/BGR to UYVY/YUYV
The RGB/BGR to UYVY/YUYV function converts a 3-channel RGB/BGR image to a single-channel
YUYV/UYVY (YUV 4:2:2) image. YUYV/UYVY is a sub-sampled format: two RGB/BGR pixels
produce one set of YUYV/UYVY values. YUYV/UYVY pixels are represented as 16-bit values, whereas
RGB/BGR pixels are represented as 24-bit values.
API Syntax
RGB to UYVY:
RGB to YUYV:
BGR to UYVY:
BGR to YUYV:
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3)
DST_T Output pixel type. Only 16-bit, unsigned, 1-channel is supported (XF_16UC1)
ROWS Maximum height of input and output image
COLS Maximum width of input and output image. Must be a multiple of NPC.
NPC Number of pixels to be processed per cycle. Possible options are XF_NPPC1, XF_NPPC2, XF_NPPC4,
and XF_NPPC8.
_src RGB/BGR input image
_dst UYVY/YUYV output image
Resource Utilization
The following table summarizes the resource utilization of the RGB/BGR to UYVY/YUYV function in
normal mode (1-Pixel), as generated in the Vivado HLS 2019.1 tool for the Xilinx
xczu9eg-ffvb1156-2-i-es2 FPGA.
Utilization Estimate
Operating Mode Operating Frequency (MHz) BRAM_18K DSP_48Es FF LUT CLB
Performance Estimate
The following table summarizes the performance of the kernel in single pixel configuration as
generated using Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
XYZ to RGB/BGR
The xyz2rgb function converts an image in the XYZ color space to a 3-channel RGB image.
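\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix}
 3.240479 & -1.53715 & -0.498535 \\
-0.969256 &  1.875991 &  0.041556 \\
 0.055648 & -0.204043 &  1.057311
\end{bmatrix}
\cdot
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\]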
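The matrix product above can be illustrated with a small software reference model for a single 8-bit pixel. This is only a sketch and not the library kernel: the function name xyzToRgbPixel is hypothetical, and the round-to-nearest and saturate-to-[0, 255] behavior are assumptions, since the text does not state how out-of-range results are handled.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Reference model only (hypothetical name): applies the XYZ-to-RGB matrix
// from the text to one 8-bit pixel. Rounding to nearest and saturation to
// the 8-bit range are assumptions made for this sketch.
std::array<std::uint8_t, 3> xyzToRgbPixel(std::uint8_t X, std::uint8_t Y, std::uint8_t Z) {
    static const double M[3][3] = {
        { 3.240479, -1.53715,  -0.498535},
        {-0.969256,  1.875991,  0.041556},
        { 0.055648, -0.204043,  1.057311}};
    std::array<std::uint8_t, 3> rgb{};
    for (int c = 0; c < 3; ++c) {
        // One output channel is the dot product of a matrix row with (X, Y, Z).
        const double v = M[c][0] * X + M[c][1] * Y + M[c][2] * Z;
        const long r = std::lround(v);
        rgb[c] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
    }
    return rgb;
}
```

Because the first row of the matrix sums to more than 1 and contains negative entries, some XYZ inputs map outside the 8-bit range, which is why the sketch saturates the result.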
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 3-channel is supported (XF_8UC3).
ROWS Maximum height of input and output image. Must be multiple of 8.
COLS Maximum width of input and output image. Must be multiple of 8.
NPC Number of pixels to be processed per cycle.
_src XYZ input image.
Resource Utilization
The following table summarizes the resource utilization of XYZ2RGB/BGR for different
configurations, as generated in the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process an HD (1080x1920) image.
Utilization Estimate
Operating Mode Operating Frequency (MHz) BRAM_18K DSP_48Es FF LUT
Performance Estimate
The following table summarizes the performance of XYZ2RGB/BGR for different configurations,
as generated using the Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1, to
process a HD (1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
Color Thresholding
The colorthresholding function compares the color space values of the source image with
low and high threshold values, and returns either 255 or 0 as the output.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 3 channel is supported (XF_8UC3).
DST_T Output pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1).
MAXCOLORS Maximum number of color values
ROWS Maximum height of input and output image
COLS Maximum width of input and output image. Must be a multiple of 8, for 8 pixel mode.
NPC Number of pixels to be processed per cycle. Only XF_NPPC1 supported.
_src_mat Input image
_dst_mat Thresholded image
low_thresh Lowest threshold values for the colors
high_thresh Highest threshold values for the colors
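The thresholding rule described above can be sketched as a software reference model. Several details here are assumptions rather than documented behavior: the function name colorThreshold is hypothetical, the thresholds are assumed to be stored as MAXCOLORS triplets indexed as [color * 3 + channel], the bounds are treated as inclusive, and a pixel is marked 255 when it falls inside the range of at least one color.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference model only (hypothetical name). Assumed layout: low/high hold
// maxColors triplets indexed as [color * 3 + channel]; bounds are inclusive.
// src is an interleaved 3-channel image; the result has one byte per pixel.
std::vector<std::uint8_t> colorThreshold(const std::vector<std::uint8_t>& src,
                                         const std::vector<std::uint8_t>& low,
                                         const std::vector<std::uint8_t>& high,
                                         int maxColors) {
    std::vector<std::uint8_t> dst(src.size() / 3, 0);
    for (std::size_t p = 0; p < dst.size(); ++p) {
        // Stop at the first color range that contains the pixel.
        for (int k = 0; k < maxColors && dst[p] == 0; ++k) {
            bool inRange = true;
            for (int c = 0; c < 3; ++c) {
                const std::uint8_t v = src[p * 3 + c];
                inRange = inRange && v >= low[k * 3 + c] && v <= high[k * 3 + c];
            }
            if (inRange) dst[p] = 255;
        }
    }
    return dst;
}
```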
Compare
The Compare function performs the per element comparison of pixels in two corresponding
images src1, src2 and stores the result in dst.
If the comparison result is true, then the corresponding element of dst is set to 255; else it is set
to 0.
API Syntax
template<int CMP_OP, int SRC_T , int ROWS, int COLS, int NPC=1>
void compare(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, xf::Mat<SRC_T, ROWS,
COLS, NPC> & _src2, xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
CMP_OP Flag that specifies the relation between the elements to be checked
SRC_T Input Pixel Type. 8-bit, unsigned, 1 channel is supported (XF_8UC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. In case of N-pixel parallelism, width should be
multiple of N
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for
1 pixel and 8 pixel operations respectively.
_src1 First input image
_src2 Second input image
_dst Output image
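The per-element behavior can be sketched with a plain C++ reference model. Only XF_CMP_NE is named in this section, so the CmpOp enumeration below is a local, hypothetical stand-in for the CMP_OP flag and not the library's definition; the function name compareModel is likewise an assumption, and both inputs are assumed to have the same size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-in for the CMP_OP flag. These enumerators are illustrative
// only and are not taken from the library headers.
enum class CmpOp { EQ, GT, GE, LT, LE, NE };

// Reference model only: dst[i] is 255 where the relation holds, else 0.
template <CmpOp OP>
std::vector<std::uint8_t> compareModel(const std::vector<std::uint8_t>& src1,
                                       const std::vector<std::uint8_t>& src2) {
    std::vector<std::uint8_t> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) {
        bool r = false;
        switch (OP) {
            case CmpOp::EQ: r = src1[i] == src2[i]; break;
            case CmpOp::GT: r = src1[i] >  src2[i]; break;
            case CmpOp::GE: r = src1[i] >= src2[i]; break;
            case CmpOp::LT: r = src1[i] <  src2[i]; break;
            case CmpOp::LE: r = src1[i] <= src2[i]; break;
            case CmpOp::NE: r = src1[i] != src2[i]; break;
        }
        dst[i] = r ? 255 : 0;
    }
    return dst;
}
```

Making the operator a template argument mirrors the API above, where the comparison is fixed at compile time rather than passed as a run-time value.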
Resource Utilization
The following table summarizes the resource utilization of the Compare XF_CMP_NE
configuration in Resource optimized (8 pixels) mode and normal mode as generated using Vivado
HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 87 60
LUT 38 84
CLB 16 20
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (in ms)
CompareS
The CompareS function performs the comparison of a pixel in the input image (src1) and the
given scalar value scl, and stores the result in dst.
If the comparison result is true, then the corresponding element of dst is set to 255, else it is set
to 0.
API Syntax
template<int CMP_OP, int SRC_T , int ROWS, int COLS, int NPC=1>
void compareS(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char
_scl[XF_CHANNELS(SRC_T,NPC)], xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
CMP_OP Flag that specifies the relation between the elements to be checked
SRC_T Input pixel type. 8-bit, unsigned, 1 channel is supported (XF_8UC1).
ROWS Maximum height of input and output image
COLS Maximum width of input and output image. In case of N-pixel parallelism, the width should
be a multiple of N
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for
1 pixel and 8 pixels operations respectively.
_src1 First input image
_scl Input scalar value, the size should be number of channels
_dst Output image
Resource Utilization
The following table summarizes the resource utilization of the CompareS function with
XF_CMP_NE configuration in Resource optimized (8 pixels) mode and normal mode as generated
using Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 93 93
LUT 39 68
CLB 21 28
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Crop
The Crop function extracts the region of interest (ROI) from the input image.
[Figure: input image containing the ROI, marked by the corners (X,Y) and (X',Y') and by ROI_width and ROI_height]
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be multiple of 8 for 8-pixel operation.
ARCH_TYPE Architecture type. 0 resolves to stream implementation and 1 resolves to memory mapped
implementation.
NPC Number of pixels to be processed per cycle. NPC should be power of 2.
_src_mat Input image
_dst_mat Output ROI image
roi ROI is a xf::Rect object that consists of the top left corner of the rectangle along with the
height and width of the rectangle.
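The ROI extraction can be illustrated with a small reference model. The Roi struct below is a hypothetical stand-in for the rectangle described in the table (top-left corner plus width and height), not the xf::Rect class itself, and the sketch assumes a row-major, single-channel image and an ROI that lies fully inside it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the ROI rectangle: top-left corner (x, y)
// plus width and height.
struct Roi { int x; int y; int width; int height; };

// Reference model only: copies the ROI out of a row-major, single-channel
// image. The ROI is assumed to lie fully inside the source image.
std::vector<std::uint8_t> cropModel(const std::vector<std::uint8_t>& src,
                                    int srcCols, const Roi& roi) {
    std::vector<std::uint8_t> dst;
    dst.reserve(static_cast<std::size_t>(roi.width) * roi.height);
    for (int r = 0; r < roi.height; ++r)
        for (int c = 0; c < roi.width; ++c)
            dst.push_back(src[(roi.y + r) * srcCols + (roi.x + c)]);
    return dst;
}
```

For a 4x4 image holding the values 0 to 15 in row-major order, an ROI with top-left corner (1,1) and size 2x2 selects the values 5, 6, 9, and 10.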
Resource Utilization
The following table summarizes the resource utilization of crop function in normal mode (NPC=1)
for 3 ROIs (480x640, 100x200, 300x300) as generated in the Vivado HLS 2019.1 tool for the
Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA.
Resource Utilization
Name 1-pixel per clock operation 8-pixel per clock operation
300 MHz 300MHz
BRAM_18K 6 8
DSP48E 10 10
FF 17482 16995
LUT 16831 15305
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image for 3 ROIs (480x640, 100x200, 300x300).
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Custom Convolution
The filter2D function performs convolution over an image using a user-defined kernel.
The filter can be a unity gain filter or a non-unity gain filter. The filter must be of type XF_16SP. If
the coefficients are floating point, they must be converted to the Qm.n format before being provided
as input, and the shift parameter must be set to the value of n. Otherwise, if the coefficients are
not floating point, the filter is provided directly and the shift parameter is set to zero.
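The Qm.n conversion and the role of the shift parameter can be sketched as follows. This is a host-side illustration under stated assumptions, not the kernel implementation: the helper names toFixedPoint and applyWindow3x3 are hypothetical, rounding to nearest is an assumption, and the sketch handles a single 3x3 window without border handling or output saturation.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts floating-point coefficients to 16-bit fixed point with `shift`
// fractional bits (the n of Qm.n). Rounding to nearest is an assumption.
std::vector<std::int16_t> toFixedPoint(const std::vector<float>& coeffs, int shift) {
    std::vector<std::int16_t> q(coeffs.size());
    for (std::size_t i = 0; i < coeffs.size(); ++i)
        q[i] = static_cast<std::int16_t>(
            std::lround(coeffs[i] * static_cast<float>(1 << shift)));
    return q;
}

// Applies the fixed-point filter to one 3x3 window: multiply-accumulate in
// integer arithmetic, then shift the sum right by `shift` to drop the
// fractional bits. For negative sums the right shift is assumed to be
// arithmetic, as it is on common compilers.
int applyWindow3x3(const std::uint8_t window[9],
                   const std::vector<std::int16_t>& q, int shift) {
    long acc = 0;
    for (int i = 0; i < 9; ++i) acc += static_cast<long>(window[i]) * q[i];
    return static_cast<int>(acc >> shift);
}
```

With n = 8, a coefficient of 0.5 becomes 128 and 0.0625 becomes 16. For integer coefficients, calling toFixedPoint with a shift of 0 leaves the values unchanged, which matches the rule that the shift parameter is zero in that case.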
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode Filter Size Operating Frequency (MHz) BRAM_18K DSP_48Es FF LUT CLB
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a 4K 3 Channel image.
Utilization Estimate
Operating Mode Filter Size Operating Frequency (MHz) BRAM_18K DSP_48Es FF LUT
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode Filter Size Operating Frequency (MHz) Max (ms)
Delay
In image processing pipelines, it is possible that the inputs to a function with FIFO interfaces are
not synchronized. That is, the first data packet for first input might arrive a finite number of clock
cycles after the first data packet of the second input. If the function has FIFOs at its interface
with insufficient depth, this causes the whole design to stall on hardware. To synchronize the
inputs, we provide this function to delay the input packet that arrives early, by a finite number of
clock cycles.
API Syntax
template<int MAXDELAY, int SRC_T, int ROWS, int COLS, int NPC=1>
void delayMat(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src,
xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
Demosaicing
The Demosaicing function converts the single-plane Bayer pattern output of a digital camera
sensor to a color image. This function implements an improved bi-linear interpolation technique
proposed by Malvar, He, and Cutler.
The above figure shows the Bayer mosaic for color image capture in single-CCD digital cameras.
API Syntax
template<int BFORMAT, int SRC_T, int DST_T, int ROWS, int COLS, int NPC, bool USE_URAM=false>
void demosaicing(xf::Mat<SRC_T, ROWS, COLS, NPC> &src_mat,
xf::Mat<DST_T, ROWS, COLS, NPC> &dst_mat)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
BFORMAT Input Bayer pattern. XF_BAYER_BG, XF_BAYER_GB, XF_BAYER_GR, and XF_BAYER_RG are the
supported values.
SRC_T Input pixel type. 8-bit, unsigned,1 and 3 channel (XF_8UC1 and XF_8UC3) and 16-bit,
unsigned, 1 and 3 channel (XF_16UC1 and XF_16UC3) are supported.
DST_T Output pixel type. 8-bit, unsigned, 4 channel (XF_8UC4) and 16-bit, unsigned, 4 channel
(XF_16UC4) are supported.
ROWS Number of rows in the image being processed.
COLS Number of columns in the image being processed. Must be multiple of 8, in case of 8 pixel
mode.
NPC Number of pixels to be processed per cycle; single pixel parallelism (XF_NPPC1), two-pixel
parallelism (XF_NPPC2) and four-pixel parallelism (XF_NPPC4) are supported. XF_NPPC4 is not
supported with XF_16UC1 pixel type.
USE_URAM Enable to map storage structures to UltraRAM.
_src_mat Input image
_dst_mat Output image
Resource Utilization
The following table shows the resource utilization of the Demosaicing function, generated
using the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
The following table shows the resource utilization of the Demosaicing function, generated using
SDx 2019.1 version tool for the xczu7ev-ffvc1156-2-e FPGA.
Performance Estimate
The following table shows the performance in different configurations, generated using Vivado
HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 to process a 4K (3840x2160) image.
Latency Estimate
Operating Mode
Max Latency (ms)
Dilate
During a dilation operation, the current pixel intensity is replaced by the maximum value of the
intensity in an n×n neighborhood of the current pixel.
\[
\mathrm{dst}(x, y) = \max_{\substack{x-1 \le x' \le x+1 \\ y-1 \le y' \le y+1}} \mathrm{src}(x', y')
\]
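The 3x3 case of the formula above can be written as a short reference model. This is a sketch only: the function name dilate3x3 is hypothetical, the image is assumed to be row-major and single-channel, and neighbors that fall outside the image are simply skipped, which is an assumption about border handling rather than documented behavior.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reference model only: each output pixel is the maximum over the 3x3
// neighborhood of the corresponding input pixel. Neighbors outside the
// image are skipped (an assumption about border handling).
std::vector<std::uint8_t> dilate3x3(const std::vector<std::uint8_t>& src,
                                    int rows, int cols) {
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            std::uint8_t m = 0;
            for (int yy = std::max(0, y - 1); yy <= std::min(rows - 1, y + 1); ++yy)
                for (int xx = std::max(0, x - 1); xx <= std::min(cols - 1, x + 1); ++xx)
                    m = std::max(m, src[yy * cols + xx]);
            dst[y * cols + x] = m;
        }
    }
    return dst;
}
```

A single bright pixel therefore grows into a 3x3 bright block. Replacing the maximum with a minimum (and the initial value 0 with 255) gives the corresponding erosion model.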
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Dilation function with rectangle
shape structuring element in 1 pixel operation and 8 pixel operation, generated using Vivado HLS
2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA for HD (1080X1920) image.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 3 6
DSP48E 0 0
FF 411 657
LUT 392 1249
CLB 96 255
The following table summarizes the resource utilization of the Dilation function with rectangle
shape structuring element in 1 pixel operation, generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA for 4K 3channel image.
Resource Utilization
Name 1 pixel per clock operation
300 MHz
BRAM_18K 18
DSP48E 0
FF 983
LUT 745
CLB 186
Performance Estimate
The following table summarizes a performance estimate of the Dilation function for Normal
Operation (1 pixel) and Resource Optimized (8 pixel) configurations, generated using Vivado HLS
2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Mode
Min (ms) Max (ms)
Duplicate
When the functions in a pipeline are implemented in programmable logic, FIFOs are
instantiated between consecutive functions for dataflow processing. When the output of one function
is consumed by two functions in the pipeline, the FIFOs need to be duplicated. This function
performs that duplication.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
Erode
The erode function finds the minimum pixel intensity in the N×N neighborhood of a pixel and
replaces the pixel intensity with the minimum value.
\[
\mathrm{dst}(x, y) = \min_{\substack{x-1 \le x' \le x+1 \\ y-1 \le y' \le y+1}} \mathrm{src}(x', y')
\]
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Erosion function with a rectangular
structuring element, generated using the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, for a Full HD (1080x1920) image.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 3 6
DSP48E 0 0
FF 411 657
LUT 392 1249
CLB 96 255
The following table summarizes the resource utilization of the Erosion function with a rectangular
structuring element, generated using the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, for a 4K image with 3 channels.
Resource Utilization
Name 1 pixel per clock operation
300 MHz
BRAM_18K 18
DSP48E 0
FF 983
LUT 3745
CLB 186
Performance Estimate
The following table summarizes a performance estimate of the Erosion function for Normal
Operation (1 pixel) and Resource Optimized (8 pixel) configurations, generated using Vivado HLS
2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Mode
Min (ms) Max (ms)
FAST Corner Detection
The fast function picks a pixel in the image and compares its intensity with that of 16 pixels on a
circle in its neighborhood, called the Bresenham circle. If the intensities of 9 contiguous pixels are
all greater than, or all less than, that of the candidate pixel by a given threshold, then
the pixel is declared a corner. Once the corners are detected, non-maximal suppression is
applied to remove the weaker corners.
This function can be used for both still images and videos. The corners are marked in the image.
If the corner is found in a particular location, that location is marked with 255, otherwise it is
zero.
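The segment test described above can be sketched for a single candidate pixel. This is an illustrative reference model, not the kernel: the names kCircle and isFastCorner are hypothetical, the 16 offsets are the usual radius-3 Bresenham circle (an assumption, since the text does not list them), and the non-maximal suppression step is left out. The caller is assumed to keep the candidate at least 3 pixels away from the image border.

```cpp
#include <cstdint>
#include <vector>

// Offsets (dx, dy) of the 16 pixels on a radius-3 Bresenham circle, listed
// in order around the circle starting from the top. Assumed layout.
static const int kCircle[16][2] = {
    {0, -3}, {1, -3}, {2, -2}, {3, -1}, {3, 0},  {3, 1},   {2, 2},   {1, 3},
    {0, 3},  {-1, 3}, {-2, 2}, {-3, 1}, {-3, 0}, {-3, -1}, {-2, -2}, {-1, -3}};

// Segment test for the candidate at (x, y) in a row-major grayscale image.
// Returns true when 9 contiguous circle pixels are all brighter than
// center + threshold or all darker than center - threshold.
bool isFastCorner(const std::vector<std::uint8_t>& img, int cols,
                  int x, int y, int threshold) {
    const int center = img[y * cols + x];
    int brighter = 0;
    int darker = 0;
    // Visit 16 + 8 positions so that runs wrapping past the last circle
    // pixel back to the first are also counted.
    for (int i = 0; i < 16 + 8; ++i) {
        const int* o = kCircle[i % 16];
        const int v = img[(y + o[1]) * cols + (x + o[0])];
        brighter = (v > center + threshold) ? brighter + 1 : 0;
        darker = (v < center - threshold) ? darker + 1 : 0;
        if (brighter >= 9 || darker >= 9) return true;
    }
    return false;
}
```

In a full detector, this test would run for every interior pixel, and the surviving candidates would then go through non-maximal suppression before being marked with 255 in the output.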
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
NMS If NMS == 1, non-maximum suppression is applied to detected corners (keypoints). The value
should be 0 or 1.
SRC_T Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1)
ROWS Maximum height of input image.
COLS Maximum width of input image (must be a multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src_mat Input image
_dst_mat Output image. The corners are marked in the image.
_threshold Threshold on the intensity difference between the center pixel and its neighbors. Usually it is taken
around 20.
Resource Utilization
The following table summarizes the resource utilization of the kernel for different configurations,
generated using Vivado HLS 2019.1 for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image with NMS.
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 10 20
DSP48E 0 0
FF 2695 7310
LUT 3792 20956
CLB 769 3519
Performance Estimate
The following table summarizes the performance of kernel for different configurations, as
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image with non-maximum suppression (NMS).
Latency Estimate
Operating Mode Filter Size Operating Frequency (MHz) Max (ms)
Gaussian Filter
The GaussianBlur function applies Gaussian blur on the input image. Gaussian filtering is done
by convolving each point in the input image with a Gaussian kernel.
\[
G_0(x, y) = e^{\frac{-(x - \mu_x)^2}{2\sigma_x^2} + \frac{-(y - \mu_y)^2}{2\sigma_y^2}}
\]
Where $\mu_x$, $\mu_y$ are the mean values and $\sigma_x$, $\sigma_y$ are the standard deviations in the x and y
directions, respectively. In the GaussianBlur function, the values of $\mu_x$, $\mu_y$ are taken as zero and
the values of $\sigma_x$, $\sigma_y$ are equal.
API Syntax
template<int FILTER_SIZE, int BORDER_TYPE, int SRC_T, int ROWS, int COLS,
int NPC = 1>
void GaussianBlur(xf::Mat<SRC_T, ROWS, COLS, NPC> & src, xf::Mat<SRC_T,
ROWS, COLS, NPC> & dst, float sigma)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
FILTER_SIZE Filter size. Filter size of 3 (XF_FILTER_3X3), 5 (XF_FILTER_5X5) and 7 (XF_FILTER_7X7) are
supported.
BORDER_TYPE Border type supported is XF_BORDER_CONSTANT
SRC_T Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1
and XF_8UC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible values are XF_NPPC1 and XF_NPPC8 for
1 pixel and 8 pixel operations respectively.
src Input image
dst Output image
sigma Standard deviation of Gaussian filter
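How the sigma argument shapes the filter can be illustrated by generating the kernel weights on the host. This is a floating-point sketch under stated assumptions, not the kernel's internal computation: the function name gaussianKernel is hypothetical, the means are zero and the two standard deviations are equal as described above, and normalizing the weights to sum to 1 is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a size x size Gaussian kernel (size is 3, 5, or 7) with zero mean
// and the same sigma in x and y. The weights are normalized to sum to 1,
// which is an assumption made for this sketch.
std::vector<double> gaussianKernel(int size, double sigma) {
    const int half = size / 2;
    std::vector<double> k(static_cast<std::size_t>(size) * size);
    double sum = 0.0;
    for (int y = -half; y <= half; ++y) {
        for (int x = -half; x <= half; ++x) {
            const double w = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma));
            k[(y + half) * size + (x + half)] = w;
            sum += w;
        }
    }
    for (double& w : k) w /= sum;
    return k;
}
```

The center weight is the largest and the weights fall off symmetrically with distance from the center; a larger sigma flattens the kernel and produces a stronger blur.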
Resource Utilization
The following table summarizes the resource utilization of the Gaussian Filter in different
configurations, generated using the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode Filter Size Operating Frequency (MHz) BRAM_18K DSP_48Es FF LUT CLB
The following table summarizes the resource utilization of the Gaussian Filter in different
configurations, generated using the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a 4K 3-channel image.
Utilization Estimate
Operating Mode Filter Size Operating Frequency (MHz) BRAM_18K DSP_48Es FF LUT
Performance Estimate
The following table summarizes a performance estimate of the Gaussian Filter in different
configurations, as generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA,
to process a grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode Filter Size
Max Latency (ms)
Gradient Magnitude
The magnitude function computes the magnitude for the images. The input images are x-
gradient and y-gradient images of type 16S. The output image is of same type as the input image.
For L1NORM normalization, the magnitude computed image is the pixel-wise sum of the
absolute values of the x-gradient and y-gradient, as shown below:
\[ g = |g_x| + |g_y| \]
For L2NORM normalization, the magnitude computed image is as follows:
\[ g = \sqrt{g_x^2 + g_y^2} \]
API Syntax
template< int NORM_TYPE ,int SRC_T,int DST_T, int ROWS, int COLS,int NPC=1>
void magnitude(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_matx,xf::Mat<DST_T,
ROWS, COLS, NPC> & _src_maty,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_mat)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
NORM_TYPE Normalization type can be either L1 or L2 norm. Values are XF_L1NORM or XF_L2NORM
SRC_T Input pixel type. Only 16-bit, signed, 1 channel is supported (XF_16SC1)
DST_T Output pixel type. Only 16-bit, signed,1 channel is supported (XF_16SC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible values are XF_NPPC1 and XF_NPPC8 for 1 pixel
and 8 pixel operations respectively.
_src_matx First input, x-gradient image.
_src_maty Second input, y-gradient image.
_dst_mat Output, magnitude computed image.
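The two normalization options can be compared with a per-pixel reference model. This is a sketch only: the function names magnitudeL1 and magnitudeL2 are hypothetical, and saturating the result to the signed 16-bit maximum and truncating the square root are assumptions, since the text does not state how overflow or rounding is handled.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// L1 norm: |gx| + |gy|, saturated to the 16-bit signed maximum (assumption).
std::int16_t magnitudeL1(std::int16_t gx, std::int16_t gy) {
    const int s = std::abs(static_cast<int>(gx)) + std::abs(static_cast<int>(gy));
    return static_cast<std::int16_t>(std::min(s, 32767));
}

// L2 norm: sqrt(gx^2 + gy^2), truncated and saturated (both assumptions).
std::int16_t magnitudeL2(std::int16_t gx, std::int16_t gy) {
    const double s = std::sqrt(static_cast<double>(gx) * gx +
                               static_cast<double>(gy) * gy);
    return static_cast<std::int16_t>(std::min(s, 32767.0));
}
```

For gradients of 3 and 4, the L1 result is 7 and the L2 result is 5; the L1 norm avoids the square root at the cost of overestimating the magnitude for diagonal gradients.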
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image and for L2 normalization.
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 2 16
FF 707 2002
LUT 774 3666
CLB 172 737
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image and for L2 normalization.
Latency Estimate
Operating Mode Operating Frequency (MHz)
Max (ms)
Gradient Phase
The phase function computes the polar angles of two images. The input images are x-gradient
and y-gradient images of type 16S. The output image is of same type as the input image.
For radians:
For degrees:
API Syntax
template<int RET_TYPE ,int SRC_T,int DST_T, int ROWS, int COLS,int NPC=1 >
void phase(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_matx,xf::Mat<DST_T, ROWS,
COLS, NPC> & _src_maty,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_mat)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
RET_TYPE Output format can be either in radians or degrees. Options are XF_RADIANS or XF_DEGREES.
• If the XF_RADIANS option is selected, phase API will return result in Q4.12 format. The output
range is (0, 2 pi).
• If the XF_DEGREES option is selected, phase API will return result in Q10.6 format, in degrees.
The output range is (0, 360).
SRC_T Input pixel type. Only 16-bit, signed, 1 channel is supported (XF_16SC1).
DST_T Output pixel type. Only 16-bit, signed, 1 channel is supported (XF_16SC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src_matx First input, x-gradient image.
_src_maty Second input, y-gradient image.
_dst_mat Output, phase computed image.
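The fixed-point output formats listed for RET_TYPE can be illustrated with a floating-point reference model. This sketch assumes the phase is the two-argument arctangent of the y-gradient over the x-gradient, mapped into the 0 to 2 pi range; the function names are hypothetical, and the kernel may use a different approximation and rounding. Q4.12 means the value is scaled by 2^12 = 4096, and Q10.6 means it is scaled by 2^6 = 64.

```cpp
#include <cmath>
#include <cstdint>

// Angle of the gradient vector (gx, gy) in radians, mapped to [0, 2*pi).
// Using atan2 here is an assumption about the definition of the phase.
double gradientAngle(std::int16_t gx, std::int16_t gy) {
    const double kTwoPi = 6.283185307179586;
    double a = std::atan2(static_cast<double>(gy), static_cast<double>(gx));
    if (a < 0.0) a += kTwoPi;
    return a;
}

// Radians in Q4.12: 12 fractional bits, so the angle is scaled by 4096.
std::int16_t phaseRadiansQ4_12(std::int16_t gx, std::int16_t gy) {
    return static_cast<std::int16_t>(std::lround(gradientAngle(gx, gy) * 4096.0));
}

// Degrees in Q10.6: 6 fractional bits, so the angle is scaled by 64.
std::int16_t phaseDegreesQ10_6(std::int16_t gx, std::int16_t gy) {
    const double kRadToDeg = 57.29577951308232;
    return static_cast<std::int16_t>(
        std::lround(gradientAngle(gx, gy) * kRadToDeg * 64.0));
}
```

To recover the angle on the host, divide the Q4.12 result by 4096 or the Q10.6 result by 64. For example, equal positive gradients give 45 degrees, which is 2880 in Q10.6.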
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 6 24
DSP48E 6 19
FF 873 2396
LUT 753 3895
CLB 185 832
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
In phase implementation, the output is returned in a fixed point format. If XF_RADIANS option is
selected, phase API will return result in Q4.12 format. The output range is (0, 2 pi). If
XF_DEGREES option is selected, phase API will return result in Q10.6 degrees and output range
is (0, 360).
Where:
Since we are looking for windows with corners, we are looking for windows with a large variation
in intensity. Hence, we have to maximize the equation above, specifically the term:
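\[ \left[ I(x + u, y + v) - I(x, y) \right]^2 \]
\[ E(u, v) = \sum u^2 I_x^2 + 2uv I_x I_y + v^2 I_y^2 \]
\[ E(u, v) = \begin{bmatrix} u & v \end{bmatrix} \left( \sum w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix} \]
\[ E(u, v) = \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} \]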
A score is calculated for each window, to determine if it can possibly contain a corner:
\[ R = \det(M) - k \left( \mathrm{trace}(M) \right)^2 \]
Where,
• $\det(M) = \lambda_1 \lambda_2$
• $\mathrm{trace}(M) = \lambda_1 + \lambda_2$
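The score can be computed directly from the three distinct entries of M, without finding the eigenvalues, because the determinant and trace of a 2x2 matrix follow from its entries. The sketch below is a floating-point illustration with a hypothetical function name; the arguments stand for the windowed sums of I_x^2, I_x I_y, and I_y^2 that make up M.

```cpp
// Harris response for one window. The arguments are the entries of the
// symmetric matrix M = [[sumIxx, sumIxy], [sumIxy, sumIyy]].
// Hypothetical helper for illustration only.
double harrisResponse(double sumIxx, double sumIxy, double sumIyy, double k) {
    const double det = sumIxx * sumIyy - sumIxy * sumIxy;  // lambda1 * lambda2
    const double trace = sumIxx + sumIyy;                  // lambda1 + lambda2
    return det - k * trace * trace;
}
```

A window with strong gradients in both directions gives a large positive score (for example, entries 4, 0, 9 with k = 0.04 give 36 - 0.04 * 169 = 29.24), while a window with a gradient in only one direction, such as an edge, gives a negative score.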
Non-Maximum Suppression:
In this case, consider a 3x3 neighborhood around the center pixel. If the center pixel is greater
than all the surrounding pixels, then it is considered a corner. The comparison is made with the
surrounding pixels that are within the radius.
Radius = 1
Threshold:
Thresholds of 442, 3109, and 566 are used for 3x3, 5x5, and 7x7 filters respectively. These thresholds
were verified over 40 sets of images. The threshold can be varied, based on the application. The
corners are marked in the output image. If the corner is found in a particular location, that
location is marked with 255, otherwise it is zero.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
TYPE Input pixel type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
ROWS Maximum height of input image.
COLS Maximum width of input image (must be multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
USE_URAM Enable to map some storage structures to URAM
src Input image
dst Output image.
threshold Threshold applied to the corner measure.
k Harris detector parameter
Resource Utilization
The following table summarizes the resource utilization of the Harris corner detection in different
configurations, generated using Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-
es1 FPGA, to process a grayscale HD (1080x1920) image.
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =1.
Table 249: Resource Utilization Summary - For Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 33 66
DSP48E 10 80
FF 3254 9330
LUT 3522 13222
CLB 731 2568
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =1.
Table 250: Resource Utilization Summary - Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 45 90
DSP48E 10 80
FF 5455 12459
LUT 5675 24594
CLB 1132 4498
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =1.
Table 251: Resource Utilization Summary - Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 57 114
DSP48E 10 80
FF 8783 16593
LUT 9157 39813
CLB 1757 6809
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =1.
Table 252: Resource Utilization Summary - Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 200 MHz
BRAM_18K 35 70
DSP48E 10 80
FF 4656 11659
LUT 4681 17394
CLB 1005 3277
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =1.
Table 253: Resource Utilization Summary - Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 47 94
DSP48E 10 80
FF 6019 14776
LUT 6337 28795
CLB 1353 5102
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =1.
Table 254: Resource Utilization Summary - Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 59 118
DSP48E 10 80
FF 9388 18913
LUT 9414 43070
CLB 1947 7508
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =1.
Table 255: Resource Utilization Summary - Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 37 74
DSP48E 11 88
FF 6002 13880
LUT 6337 25573
CLB 1327 4868
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =1.
Table 256: Resource Utilization Summary - Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 49 98
DSP48E 11 88
FF 7410 17049
LUT 8076 36509
CLB 1627 6518
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =1.
Table 257: Resource Utilization Summary - Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 61 122
DSP48E 11 88
FF 10714 21137
LUT 11500 51331
CLB 2261 8863
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =2.
Table 258: Resource Utilization Summary - Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 41 82
DSP48E 10 80
FF 5519 10714
LUT 5094 16930
CLB 1076 3127
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =2.
Table 259: Resource Utilization Summary - Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 53 106
DSP48E 10 80
FF 6798 13844
LUT 6866 28286
CLB 1383 4965
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =2.
Table 260: Resource Utilization Summary - Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 65 130
DSP48E 10 80
FF 10137 17977
LUT 10366 43589
CLB 1940 7440
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =2.
Table 261: Resource Utilization Summary - Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 43 86
DSP48E 10 80
FF 5957 12930
LUT 5987 21187
CLB 1244 3922
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =2.
Table 262: Resource Utilization Summary - Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 55 110
DSP48E 10 80
FF 5442 16053
LUT 6561 32377
CLB 1374 5871
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =2.
Table 263: Resource Utilization Summary - Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 67 134
DSP48E 10 80
FF 10673 20190
LUT 10793 46785
CLB 2260 8013
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =2.
Table 264: Resource Utilization Summary - Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 45 90
DSP48E 11 88
FF 7341 15161
LUT 7631 29185
CLB 1557 5425
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =2.
Table 265: Resource Utilization Summary - Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 57 114
DSP48E 11 88
FF 8763 18330
LUT 9368 40116
CLB 1857 7362
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =2.
Table 266: Resource Utilization Summary - Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 69 138
DSP48E 11 88
FF 12078 22414
LUT 12831 54652
CLB 2499 9628
The following table summarizes the resource utilization of the Harris corner detection in different
configurations, generated using SDx 2019.1 version tool for the xczu7ev-ffvc1156-2-e FPGA, to
process a grayscale 4K (3840x2160) image.
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =1.
Table 267: Resource Utilization Summary - For Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 4 21
DSP48E 10 80
FF 5306 11846
LUT 3696 13846
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =1.
Table 268: Resource Utilization Summary - Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 7 30
DSP48E 10 80
FF 7625 13899
LUT 5596 27136
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =1.
Table 269: Resource Utilization Summary - Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 7 42
DSP48E 10 80
FF 12563 19919
LUT 8816 39087
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =1.
Table 270: Resource Utilization Summary - Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 4 23
DSP48E 10 80
FF 6689 15022
LUT 4506 18719
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =1.
Table 271: Resource Utilization Summary - Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 7 32
DSP48E 10 80
FF 9050 17063
LUT 6405 31992
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =1.
Table 272: Resource Utilization Summary - Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 7 44
DSP48E 10 80
FF 13946 23116
LUT 9626 44738
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =1.
Table 273: Resource Utilization Summary - Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 4 25
DSP48E 11 88
FF 8338 17378
LUT 6151 24844
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =1.
Table 274: Resource Utilization Summary - Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 7 34
DSP48E 11 88
FF 10497 19457
LUT 7858 39762
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =1.
Table 275: Resource Utilization Summary - Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =1
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 12 12
URAM 7 46
DSP48E 11 88
FF 15393 25450
LUT 11080 50662
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =2.
Table 276: Resource Utilization Summary - Sobel Filter = 3, Box filter=3 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 4 21
DSP48E 10 80
FF 6286 13441
LUT 4704 18072
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =2.
Table 277: Resource Utilization Summary - Sobel Filter = 3, Box filter=5 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 7 30
DSP48E 10 80
FF 8626 15498
LUT 6606 31371
The following table summarizes the resource utilization for Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =2.
Table 278: Resource Utilization Summary - Sobel Filter = 3, Box filter=7 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 7 42
DSP48E 10 80
FF 13543 21522
LUT 9853 43301
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =2.
Table 279: Resource Utilization Summary - Sobel Filter = 5, Box filter=3 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 4 23
DSP48E 10 80
FF 7670 16750
LUT 5513 22854
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =2.
Table 280: Resource Utilization Summary - Sobel Filter = 5, Box filter=5 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 7 32
DSP48E 10 80
FF 9712 18793
LUT 7338 36136
The following table summarizes the resource utilization for Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =2.
Table 281: Resource Utilization Summary - Sobel Filter = 5, Box filter=7 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 7 44
DSP48E 10 80
FF 14650 24846
LUT 10558 48866
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =2.
Table 282: Resource Utilization Summary - Sobel Filter = 7, Box filter=3 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 4 25
DSP48E 11 88
FF 9562 19101
LUT 7405 29986
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =2.
Table 283: Resource Utilization Summary - Sobel Filter = 7, Box filter=5 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 7 34
DSP48E 11 88
FF 11751 21180
LUT 9254 44024
The following table summarizes the resource utilization for Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =2.
Table 284: Resource Utilization Summary - Sobel Filter = 7, Box filter=7 and
NMS_RADIUS =2
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 20 20
URAM 7 46
DSP48E 11 88
FF 16723 27156
LUT 12474 54858
Performance Estimate
The following table summarizes a performance estimate of the Harris corner detection in
different configurations, as generated using Vivado HLS 2019.1 tool for Xilinx Xczu9eg-
ffvb1156-1-i-es1 FPGA, to process a grayscale HD (1080x1920) image.
In xfOpenCV, thresholding and NMS are included, whereas in OpenCV they are not. In
xfOpenCV, all the blocks are implemented in fixed point, whereas in OpenCV all the blocks are
implemented in floating point.
Histogram Computation
The calcHist function computes the histogram of the given input image.
$$H[\mathrm{src}(x, y)] = H[\mathrm{src}(x, y)] + 1$$
where H is an array of 256 elements.
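As a software reference for the update rule above, the following C++ sketch builds a 256-bin histogram from an 8-bit, single-channel buffer. It is an illustration only, not the xfOpenCV kernel: the function name referenceHistogram and the flat std::vector image layout are assumptions made for this example.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical reference model (not the xfOpenCV API): counts how often each
// 8-bit intensity occurs in a single-channel image stored as a flat buffer.
std::array<uint32_t, 256> referenceHistogram(const std::vector<uint8_t>& src) {
    std::array<uint32_t, 256> H{};  // value-initialized, so every bin starts at 0
    for (uint8_t pixel : src) {
        H[pixel] += 1;              // H[src(x, y)] = H[src(x, y)] + 1
    }
    return H;
}
```

A model like this can be useful on the host side for checking the 256-element output array of the accelerated function against a known-good result.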
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and XF_8UC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle
_src Input image
histogram Output array of 256 elements
Resource Utilization
The following table summarizes the resource utilization of the calcHist function for Normal
Operation (1 pixel) and Resource Optimized (8 pixel) configurations, generated using Vivado HLS
2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz for 1 pixel case and at
150 MHz for 8 pixel mode.
Resource Utilization
Name Normal Operation (1 pixel) Resource Optimized (8 pixel)
BRAM_18K 2 16
DSP48E 0 0
FF 196 274
LUT 240 912
CLB 57 231
The following table summarizes the resource utilization of the calcHist function for Normal
Operation (1 pixel), generated using Vivado HLS 2019.1 version tool for the
Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz, for a 3-channel 4K image.
Resource Utilization
Name Normal Operation (1 pixel)
BRAM_18K 8
DSP48E 0
FF 381
LUT 614
CLB 134
Performance Estimate
The following table summarizes a performance estimate of the calcHist function for Normal
Operation (1 pixel) and Resource Optimized (8 pixel) configurations, generated using Vivado HLS
2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz for 1 pixel and 150
MHz for 8 pixel mode.
Latency Estimate
Operating Mode Max (ms)
1 pixel 6.9
8 pixel 1.7
Histogram Equalization
The equalizeHist function performs histogram equalization on an input image or video. It
improves the contrast of the image by stretching out the intensity range. The function maps one
distribution (histogram) to another distribution (a wider and more uniform distribution of
intensity values), so that the intensities are spread over the whole range.
$$H'[i] = \sum_{0 \le j < i} H[j]$$
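The cumulative sum above can be sketched in plain C++ as follows. The first function follows the formula directly (the sum runs over j < i only). The second function turns the cumulative counts into a lookup table by scaling them to the 0..255 range; that scaling step and both function names are assumptions for illustration and are not taken from the xfOpenCV source.

```cpp
#include <array>
#include <cstdint>

// Exclusive running sum, as in the formula: Hc[i] = sum of H[j] for 0 <= j < i.
std::array<uint32_t, 256> cumulativeHistogram(const std::array<uint32_t, 256>& H) {
    std::array<uint32_t, 256> Hc{};
    uint32_t running = 0;
    for (int i = 0; i < 256; ++i) {
        Hc[i] = running;  // bins below i only
        running += H[i];
    }
    return Hc;
}

// Assumption: map each intensity to its cumulative count scaled to 0..255.
std::array<uint8_t, 256> equalizationLut(const std::array<uint32_t, 256>& H,
                                         uint32_t totalPixels) {
    std::array<uint8_t, 256> lut{};
    if (totalPixels == 0) {
        return lut;  // empty image: leave the table at zero
    }
    const std::array<uint32_t, 256> Hc = cumulativeHistogram(H);
    for (int i = 0; i < 256; ++i) {
        // 64-bit intermediate avoids overflow for large images.
        lut[i] = static_cast<uint8_t>((static_cast<uint64_t>(Hc[i]) * 255u) / totalPixels);
    }
    return lut;
}
```

Applying such a table as dst(x, y) = lut[src(x, y)] spreads the occupied intensities over a wider range, which is the effect described above.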
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and output pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle
_src Input image
_src1 Input image
_dst Output image
Resource Utilization
The following table summarizes the resource utilization of the equalizeHist function for Normal
Operation (1 pixel) and Resource Optimized (8 pixel) configurations, generated using Vivado HLS
2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz for 1 pixel and 150
MHz for 8 pixel mode.
Utilization Estimate
Operating Mode   Operating Frequency (MHz)   BRAM_18K   DSP_48Es   FF   LUT   CLB
Performance Estimate
The following table summarizes a performance estimate of the equalizeHist function for Normal
Operation (1 pixel) and Resource Optimized (8 pixel) configurations, generated using Vivado HLS
2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz for 1 pixel and 150
MHz for 8 pixel mode.
Latency Estimate
Operating Mode
Max (ms)
HOG
The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision for
object detection. The feature descriptors produced by this approach are widely used
in pedestrian detection.
The technique counts the occurrences of gradient orientation in localized portions of an image.
HOG is computed over a dense grid of uniformly spaced cells and normalized over overlapping
blocks, for improved accuracy. The concept behind HOG is that the object appearance and shape
within an image can be described by the distribution of intensity gradients or edge direction.
The function accepts both RGB and gray inputs. In the RGB mode, gradients are
computed for each plane separately, and the one with the highest magnitude is selected. With the
configurations provided, the window dimensions are 64x128 and the block dimensions are 16x16.
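To give a feel for how much data one window produces with these dimensions, the following C++ sketch counts the histogram values in a single window. It assumes that blocks advance by one cell in each direction (the usual HOG layout, but not stated in this section) and uses the 8x8 cell size and 9 bins per cell mentioned in this guide. The helper name is hypothetical, and the result is not claimed to be the DESC_SIZE value expected by the function in every output mode.

```cpp
// Hypothetical helper: number of histogram values describing one window,
// assuming blocks slide by one cell horizontally and vertically.
constexpr int hogWindowDescriptorLength(int winWidth, int winHeight,
                                        int blockWidth, int blockHeight,
                                        int cellWidth, int cellHeight, int bins) {
    return ((winWidth - blockWidth) / cellWidth + 1) *      // block positions per row
           ((winHeight - blockHeight) / cellHeight + 1) *   // block positions per column
           (blockWidth / cellWidth) * (blockHeight / cellHeight) *  // cells per block
           bins;                                            // bins per cell
}
```

For a 64x128 window with 16x16 blocks, 8x8 cells, and 9 bins, this gives 7 x 15 block positions with 36 values each.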
API Syntax
Parameter Descriptions
Parameters Description
WIN_HEIGHT The number of pixel rows in the window. This must be a multiple of 8 and should not exceed the
number of image rows.
WIN_WIDTH The number of pixel cols in the window. This must be a multiple of 8 and should not exceed the
number of image columns.
WIN_STRIDE The pixel stride between two adjacent windows. It is fixed at 8.
BLOCK_HEIGHT Height of the block. It is fixed at 16.
BLOCK_WIDTH Width of the block. It is fixed at 16.
CELL_HEIGHT Number of rows in a cell. It is fixed at 8.
CELL_WIDTH Number of cols in a cell. It is fixed at 8.
NOB Number of histogram bins for a cell. It is fixed at 9
DESC_SIZE The size of the output descriptor.
IMG_COLOR The type of the image, set as either XF_GRAY or XF_RGB
OUTPUT_VARIENT Must be either XF_HOG_RB or XF_HOG_NRB
SRC_T Input pixel type. Must be either XF_8UC1 or XF_8UC4, for gray and color respectively.
DST_T Output descriptor type. Must be XF_32UC1.
ROWS Number of rows in the image being processed.
COLS Number of columns in the image being processed.
NPC Number of pixels to be processed per cycle; this function supports only XF_NPPC1 or 1 pixel per
cycle operations.
USE_URAM Enable to map UltraRAM instead of BRAM for some storage structures.
Where,
Note: In the RB mode, the block data is written to the memory taking the overlap windows into
consideration. In the NRB mode, the block data is written directly to the output stream without
consideration of the window overlap; the overlap must then be handled on the host side.
Resource Utilization
The following table shows the resource utilization of HOGDescriptor function for normal
operation (1 pixel) mode as generated in Vivado HLS 2019.1 version tool for the part Xczu9eg-
ffvb1156-1-i-es1 at 300 MHz to process an image of 1920x1080 resolution.
The following table shows the resource utilization of HOGDescriptor function for normal
operation (1 pixel) mode as generated in SDx 2019.1 version tool for the part xczu7ev-
ffvc1156-2-e at 300 MHz to process an image of 1920x1080 resolution with UltraRAM enabled.
BRAM_18K 10 12 18 20
URAM 15 15 15 17
DSP48E 34 46 36 48
FF 17285 17917 18270 18871
LUT 12409 12861 12793 13961
Performance Estimate
The following table shows the performance estimates of HOGDescriptor() function for different
configurations as generated in Vivado HLS 2019.1 version tool for the part Xczu9eg-ffvb1156-1-
i-es1 to process an image of 1920x1080p resolution.
Latency Estimate
Operating Mode Operating Frequency (MHz)
Min (ms) Max (ms)
1. Border care
OpenCV handles borders in the gradient computation with BORDER_REFLECT_101, in which the
border padding is a reflection of the neighboring pixels. The Xilinx implementation instead uses
BORDER_CONSTANT (zero padding) for border care.
2. Gaussian weighting
In OpenCV, the Gaussian weights are multiplied with the pixels over the block; that is, a block
has 256 pixels, and each position in the block is multiplied by its corresponding Gaussian weight.
In the HLS implementation, Gaussian weighting is not performed.
3. Cell-wise interpolation
The magnitude values of the pixels are distributed across different cells in the blocks but on
the corresponding bins.
Pixels in the region 1 belong only to its corresponding cells, but the pixels in region 2 and 3
are interpolated to the adjacent 2 cells and 4 cells respectively. This operation was not
performed in the HLS implementation.
4. Output handling
The OpenCV output is in column-major form, whereas the HLS implementation output is in
row-major form. Also, the feature vector is in the fixed-point type Q0.16 in the HLS
implementation, while in OpenCV it is in floating point.
Limitations
HoughLines
The HoughLines function here is equivalent to HoughLines Standard in OpenCV. The
HoughLines function is used to detect straight lines in a binary image. To apply the Hough
transform, edge detection preprocessing is required. The input to the Hough transform is an edge
detected binary image. For each point (xi,yi) in a binary image, we define a family of lines that go
through the point as:
$$\rho = x_i \cos\theta + y_i \sin\theta$$
Each pair of (rho,theta) represents a line that passes through the point (xi,yi). These (rho,theta)
pairs of this family of lines passing through the point form a sinusoidal curve in (rho,theta) plane.
If the sinusoids of N different points intersect in the (rho,theta) plane, then that intersection
(rho1, theta1) represents the line that passes through these N points. In the HoughLines
function, an accumulator is used to keep the count (also called voting) of all the intersection
points in the (rho,theta) plane. After voting, the function filters spurious lines by thinning: if
the center vote value is greater than both the neighborhood votes and the threshold, the center
vote is kept as valid; otherwise it is set to zero. Finally, the
function returns the desired maximum number of lines (LINESMAX) in (rho,theta) form as output.
The design assumes the origin at the center of the image, that is, at (floor(COLS/2), floor(ROWS/2)).
The ranges of rho and theta are:
For ease of use, the input angles THETA, MINTHETA and MAXTHETA are taken in degrees, while
the output theta is in radians. The angle resolution THETA is declared as an integer, but treated
as a value in Q6.1 format (that is, THETA=3 signifies that the resolution used in the function is
1.5 degrees). When the output (rho, theta) is used for drawing lines, keep in mind that the
origin is at the center of the image.
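The two conventions described above, the Q6.1 interpretation of THETA and the centered origin, can be sketched in C++ as follows. The helper names are hypothetical, and the rho expression uses the standard Hough parameterization rho = x*cos(theta) + y*sin(theta), which is assumed here rather than taken from the kernel source.

```cpp
#include <cmath>

// THETA is an integer read as Q6.1: one fractional bit, so the resolution in
// degrees is THETA / 2 (for example, THETA = 3 means 1.5 degrees).
double thetaResolutionDegrees(int thetaQ6_1) {
    return thetaQ6_1 / 2.0;
}

// Signed distance rho of the line through pixel (x, y) at angle thetaDeg,
// measured from the image center (floor(COLS/2), floor(ROWS/2)).
double rhoFromCenter(int x, int y, int cols, int rows, double thetaDeg) {
    const double kPi = 3.14159265358979323846;
    const double theta = thetaDeg * kPi / 180.0;  // degrees to radians
    const double xc = x - cols / 2;               // integer division gives the floor
    const double yc = y - rows / 2;
    return xc * std::cos(theta) + yc * std::sin(theta);
}
```

When drawing a returned line, the same center offset has to be added back to convert from the centered coordinates to pixel coordinates.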
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input Pixel Type. Only 8-bit, unsigned, 1-channel is supported (XF_8UC1).
ROWS Maximum height of input image
COLS Maximum width of input image
NPC Number of Pixels to be processed per cycle; Only single pixel supported XF_NPPC1.
_src_mat Input image should be 8-bit, single-channel binary image.
outputrho Output array of rho values. rho is the distance from the coordinate origin (center of the image).
outputtheta Output array of theta values. Theta is the line rotation angle in radians.
threshold Accumulator threshold parameter. Only those lines are returned that get enough votes (>threshold).
linesmax Maximum number of lines.
Resource Utilization
The table below shows the resource utilization of the kernel for different configurations,
generated using Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 to process a
grayscale HD (1080x1920) image for 512 lines.
Resource Utilization
Name
THETA=1, RHO=1
BRAM_18K 542
DSP48E 10
FF 60648
LUT 56131
Performance Estimate
The following table shows the performance of kernel for different configurations, generated
using Vivado HLS 2019.1 version tool for the Xczu9eg-ffvb1156-1-i-es1 to process a grayscale
HD (1080x1920) image for 512 lines.
Latency Estimate
Operating Mode Operating Frequency (MHz)
Max (ms)
Pyramid Up
The pyrUp function is an image up-sampling algorithm. It first inserts zero rows and zero
columns after every input row and column making up to the size of the output image. The output
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
TYPE Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
ROWS Maximum Height or number of output rows to build the hardware for this kernel
COLS Maximum Width or number of output columns to build the hardware for this kernel
NPC Number of pixels to process per cycle. Currently, the kernel supports only 1 pixel per cycle
processing (XF_NPPC1).
_src Input image stream
_dst Output image stream
Resource Utilization
The following table summarizes the resource utilization of pyrUp for 1 pixel per cycle
implementation, for a maximum input image size of 1920x1080 pixels. The results are after
synthesis in Vivado HLS 2019.1 for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz.
Utilization Estimate
Operating Mode   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs
The following table summarizes the resource utilization of pyrUp for 1 pixel per cycle
implementation, for a maximum input image size of 4K with BGR. The results are after synthesis
in Vivado HLS 2019.1 for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz.
Utilization Estimate
Operating Mode   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs
Performance Estimate
The following table summarizes performance estimates of pyrUp function on Vivado HLS 2019.1
for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Mode   Input Image Size   Operating Frequency (MHz)   Max (ms)
Pyramid Down
The pyrDown function is an image down-sampling algorithm that smooths the image before
down-scaling it. The image is smoothed using a Gaussian filter with the following kernel:
$$\frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix}$$
Down-scaling is performed by dropping pixels in the even rows and the even columns. The
resulting image size is $\left(\frac{rows + 1}{2}, \frac{columns + 1}{2}\right)$.
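The output size and the filter weights can be checked with a small C++ sketch. The names Size2D, pyrDownSize, and gaussianKernel5x5 are hypothetical. The kernel is built as the outer product of the row [1 4 6 4 1] with itself, which reproduces the matrix shown above before the 1/256 scaling, and integer division is assumed for the size computation.

```cpp
#include <array>

struct Size2D {
    int rows;
    int cols;
};

// Output dimensions after dropping every other row and column.
Size2D pyrDownSize(int rows, int cols) {
    return {(rows + 1) / 2, (cols + 1) / 2};
}

// Unscaled 5x5 kernel as the outer product of [1 4 6 4 1] with itself.
// The weights sum to 256, which is why the filter is scaled by 1/256.
std::array<std::array<int, 5>, 5> gaussianKernel5x5() {
    const int w[5] = {1, 4, 6, 4, 1};
    std::array<std::array<int, 5>, 5> k{};
    for (int r = 0; r < 5; ++r) {
        for (int c = 0; c < 5; ++c) {
            k[r][c] = w[r] * w[c];
        }
    }
    return k;
}
```

Because the kernel is separable in this way, a row pass followed by a column pass with the 1-D weights gives the same result as the full 5x5 convolution.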
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
TYPE Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
ROWS Maximum Height or number of input rows to build the hardware for this kernel
COLS Maximum Width or number of input columns to build the hardware for this kernel
NPC Number of pixels to process per cycle. Currently, the kernel supports only 1 pixel per cycle
processing (XF_NPPC1).
USE_URAM Enable to map storage structures to UltraRAM
_src Input image stream
_dst Output image stream
Resource Utilization
The following table summarizes the resource utilization of pyrDown for 1 pixel per cycle
implementation, for a maximum input image size of 1920x1080 pixels. The results are after
synthesis in Vivado HLS 2019.1 for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz.
Utilization Estimate
Operating Mode   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs
The following table summarizes the resource utilization of pyrDown for 1 pixel per cycle
implementation, for a maximum input image size of 4K with a BGR image. The results are after
synthesis in Vivado HLS 2019.1 for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA at 300 MHz.
Utilization Estimate
Operating Mode   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs
The following table summarizes the resource utilization of pyrDown for 1 pixel per cycle
implementation, for a maximum input image size of 3840x2160 pixels. The results are after
synthesis in SDx 2019.1 for the Xilinx xczu7eg-ffvb1156-1 FPGA at 300 MHz with UltraRAM
enabled.
Table 308: pyrDown Function Resource Utilization Summary with UltraRAM Enabled
Utilization Estimate
Operating Mode   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs   URAM
Performance Estimate
The following table summarizes performance estimates of pyrDown function in Vivado HLS
2019.1 for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Mode   Input Image Size   Operating Frequency (MHz)   Max (ms)
InitUndistortRectifyMapInverse
The InitUndistortRectifyMapInverse function generates mapx and mapy, based on a set
of camera parameters, where mapx and mapy are inputs for the xf::remap function. That is, for
each pixel in the location (u, v) in the destination (corrected and rectified) image, the function
computes the corresponding coordinates in the source image (the original image from camera).
The InitUndistortRectifyMapInverse module is optimized for hardware, so the inverse of the rotation
matrix is computed outside the synthesizable logic. Note that the inputs are fixed point, so the
floating point camera parameters must be converted to Q12.20 format.
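To illustrate what the Q12.20 format means numerically, the following C++ sketch converts between double and a 32-bit word with 20 fractional bits. It deliberately uses plain integers instead of the ap_fixed type, so it can be compiled without the HLS headers. The function names and the round-to-nearest choice are assumptions for this example, and the rounding behavior of the actual fixed-point type may differ.

```cpp
#include <cmath>
#include <cstdint>

// Q12.20: 32-bit signed word with 12 integer bits (including the sign bit)
// and 20 fractional bits, so one step equals 2^-20.
// The caller must keep values inside the representable range [-2048, 2048).
int32_t toQ12_20(double value) {
    return static_cast<int32_t>(std::lround(value * (1 << 20)));
}

// Inverse conversion, useful for checking the quantization error on the host.
double fromQ12_20(int32_t raw) {
    return static_cast<double>(raw) / (1 << 20);
}
```

With 12 integer bits, parameters such as focal lengths of a few hundred to roughly two thousand pixels fit in the format, while larger values would overflow.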
API Syntax
template<int CM_SIZE, int DC_SIZE, int MAP_T, int ROWS, int COLS, int NPC>
void InitUndistortRectifyMapInverse(
    ap_fixed<32,12> *cameraMatrix,
    ap_fixed<32,12> *distCoeffs,
    ap_fixed<32,12> *ir,
    xf::Mat<MAP_T, ROWS, COLS, NPC> &_mapx_mat,
    xf::Mat<MAP_T, ROWS, COLS, NPC> &_mapy_mat,
    int _cm_size,
    int _dc_size)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
InRange
The InRange function checks whether pixels in the image src lie between the given boundaries. dst(x,y)
is set to 255 if src(x,y) is within the specified thresholds, and to 0 otherwise.
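A software model of this rule for a single-channel image could look like the C++ sketch below. The function name is hypothetical, and treating both thresholds as inclusive is an assumption (it matches the OpenCV inRange convention, but this section does not state it explicitly).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: 255 where lower <= src <= upper, else 0.
// Inclusive bounds are an assumption made for this example.
std::vector<uint8_t> referenceInRange(const std::vector<uint8_t>& src,
                                      uint8_t lower, uint8_t upper) {
    std::vector<uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = (src[i] >= lower && src[i] <= upper) ? 255 : 0;
    }
    return dst;
}
```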
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input Pixel Type. 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and XF_8UC3).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. In case of N-pixel parallelism, width should be
multiple of N.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for
1 pixel and 8 pixel operations respectively.
src Input image
dst Output image
lower_thresh Lower threshold value
upper_thresh Upper threshold value
Resource Utilization
The following table summarizes the resource utilization of the InRange function in Resource
Optimized (8 pixel) mode and normal mode, as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 86 154
LUT 60 148
CLB 15 37
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Integral Image
The integral function computes an integral image of the input. Each output pixel is the sum of
all pixels above and to the left of itself.
$$dst(x, y) = sum(x, y) = src(x, y) + sum(x - 1, y) + sum(x, y - 1) - sum(x - 1, y - 1)$$
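The recurrence can be sketched in C++ as a single pass over the image, where each output reuses the sums already computed to its left, above, and diagonally above-left. This is a host-side illustration with a hypothetical function name. It assumes the output has the same size as the input, that the sum includes the current pixel, and that terms falling outside the image are zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model of the integral image recurrence.
// src holds rows * cols 8-bit pixels in row-major order.
std::vector<uint32_t> referenceIntegral(const std::vector<uint8_t>& src,
                                        int rows, int cols) {
    std::vector<uint32_t> sum(src.size(), 0);
    auto at = [cols](int x, int y) {
        return static_cast<std::size_t>(y) * cols + x;
    };
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const uint32_t left = (x > 0) ? sum[at(x - 1, y)] : 0;
            const uint32_t up = (y > 0) ? sum[at(x, y - 1)] : 0;
            const uint32_t diag = (x > 0 && y > 0) ? sum[at(x - 1, y - 1)] : 0;
            // The diagonal term is counted in both left and up, so subtract it once.
            sum[at(x, y)] = src[at(x, y)] + left + up - diag;
        }
    }
    return sum;
}
```

For a 1920x1080 image of 8-bit pixels, the largest possible sum is 1920 * 1080 * 255, which fits comfortably in the 32-bit output type.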
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_TYPE Input pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1)
DST_TYPE Output pixel type. Only 32-bit, unsigned, 1 channel is supported (XF_32UC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image
NPC Number of pixels to be processed per cycle; this function supports only XF_NPPC1 or 1 pixel per
cycle operations.
_src_mat Input image
_dst_mat Output image
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a grayscale HD (1080x1920) image.
Resource Utilization
Name 1 pixel
300 MHz
BRAM_18K 4
DSP48E 0
FF 613
LUT 378
CLB 102
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Latency Estimate
• Pixel intensities of an object do not have too many variations in consecutive frames
• Neighboring pixels have similar motion
Consider a pixel I(x, y, t) in first frame. (Note that a new dimension, time, is added here. When
working with images only, there is no need of time). The pixel moves by distance (dx, dy) in the
next frame taken after time dt. Thus, since those pixels are the same and the intensity does not
change, the following is true:
$$I(x, y, t) = I(x + dx, y + dy, t + dt)$$
Taking the Taylor series approximation on the right-hand side, removing common terms, and
dividing by dt gives the following equation:
$$f_x u + f_y v + f_t = 0$$
where $f_x = \frac{\partial f}{\partial x}$, $f_y = \frac{\partial f}{\partial y}$, $u = \frac{dx}{dt}$, and $v = \frac{dy}{dt}$.
The above equation is called the Optical Flow equation, where fx and fy are the image
gradients and ft is the gradient along time. However, (u, v) is unknown, and a single equation
with two unknown variables cannot be solved. Several methods exist to resolve this
problem; one of them is Lucas-Kanade. Previously it was assumed that all neighboring pixels
have similar motion. The Lucas-Kanade method takes a patch around the point, whose size can
be defined through the ‘WINDOW_SIZE’ template parameter, and treats all the points in that patch
as having the same motion. It is possible to find (fx, fy, ft) for these points. The problem then
becomes solving ‘WINDOW_SIZE * WINDOW_SIZE’ equations with two unknown
variables, which is over-determined. A better solution is obtained with the “least square fit”
method. Below is the final solution, which is a problem with two equations and two unknowns:
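$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \sum_i f_{x_i}^2 & \sum_i f_{x_i} f_{y_i} \\ \sum_i f_{x_i} f_{y_i} & \sum_i f_{y_i}^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum_i f_{x_i} f_{t_i} \\ -\sum_i f_{y_i} f_{t_i} \end{bmatrix}$$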
This solution fails when a large motion is involved, so pyramids are used. Going up in the
pyramid, small motions are removed and large motions become small motions, so by applying
Lucas-Kanade, the optical flow along with the scale is obtained.
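The least-squares solution for one patch can be sketched in floating-point C++ as follows. This illustrates the 2x2 solve only and is not the fixed-point hardware implementation: the types Gradient and Flow, the function name, and the determinant threshold used to reject ill-conditioned patches are assumptions for this example.

```cpp
#include <cmath>
#include <vector>

struct Gradient {
    double fx, fy, ft;  // spatial and temporal gradients at one pixel of the patch
};

struct Flow {
    double u, v;
    bool valid;  // false when the 2x2 system cannot be inverted reliably
};

// Solves [sxx sxy; sxy syy] * [u; v] = [-sxt; -syt] for one patch.
Flow solveLucasKanade(const std::vector<Gradient>& patch) {
    double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
    for (const Gradient& g : patch) {
        sxx += g.fx * g.fx;
        sxy += g.fx * g.fy;
        syy += g.fy * g.fy;
        sxt += g.fx * g.ft;
        syt += g.fy * g.ft;
    }
    const double det = sxx * syy - sxy * sxy;
    if (std::fabs(det) < 1e-12) {
        return {0.0, 0.0, false};  // gradients all point the same way, or are zero
    }
    // Closed-form inverse of the symmetric 2x2 matrix applied to the right-hand side.
    const double u = (syy * (-sxt) - sxy * (-syt)) / det;
    const double v = (sxx * (-syt) - sxy * (-sxt)) / det;
    return {u, v, true};
}
```

The validity flag reflects a practical limitation of the method: a patch whose gradients all point in one direction does not constrain both flow components.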
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
NUM_PYR_LEVELS Number of Image Pyramid levels used for the optical flow computation
NUM_LINES Number of lines to buffer for the remap algorithm – used to find the temporal gradient
WINSIZE Window Size over which Optical Flow is computed
FLOW_WIDTH, FLOW_INT Data width and number of integer bits that define the signed flow vector
data type. The integer bits include the sign bit.
The default type is a 16-bit signed word with 10 integer bits and 6 fractional bits.
TYPE Pixel type of the input image. XF_8UC1 is the only supported value.
ROWS Maximum height or number of rows to build the hardware for this kernel
COLS Maximum width or number of columns to build the hardware for this kernel
NPC Number of pixels the hardware kernel must process per clock cycle. Only XF_NPPC1, 1 pixel per
cycle, is supported.
USE_URAM Enable to map some storage structures to UltraRAM
_curr_img First input image stream
_next_img Second input image to which the optical flow is computed with respect to the first image
_streamFlowin 32-bit Packed U and V flow vectors input for optical flow. The bits from 31-16 represent the flow
vector U while the bits from 15-0 represent the flow vector V.
_streamFlowout 32-bit Packed U and V flow vectors output after optical flow computation. The bits from 31-16
represent the flow vector U while the bits from 15-0 represent the flow vector V.
level Image pyramid level at which the algorithm is currently computing the optical flow.
scale_up_flag Flag to enable the scaling-up of the flow vectors. This flag is set at the host when switching from
one image pyramid level to the other.
scale_in Floating-point scale-up factor for the flow vectors.
The value is (previous_rows-1)/(current_rows-1). The value is not 1 when switching from one
image pyramid level to the other.
init_flag Flag to initialize flow vectors to 0 in the first iteration of the highest pyramid level. This flag must
be set in the first iteration of the highest pyramid level (smallest image in the pyramid). The flag
must be unset for all the other iterations.
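The packed flow format and the scale_in formula from the table above can be illustrated with a short host-side sketch. The helper names (FlowVec, signExtend16, unpackFlow, scaleIn) are hypothetical and not part of the library, and the sketch assumes the default flow type of a 16-bit signed word with 6 fractional bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: decoded flow vector in pixels.
struct FlowVec { float u, v; };

// Interprets the low 16 bits of 'bits' as a two's complement value.
static int signExtend16(std::uint32_t bits) {
    const int v = static_cast<int>(bits & 0xFFFFu);
    return (v >= 0x8000) ? v - 0x10000 : v;
}

// Splits a 32-bit packed word into U (bits 31-16) and V (bits 15-0) and
// converts both from fixed point to float. Assumes 6 fractional bits,
// which matches the default FLOW_WIDTH/FLOW_INT described above.
FlowVec unpackFlow(std::uint32_t packed, int fracBits = 6) {
    const float scale = static_cast<float>(1 << fracBits);
    const int rawU = signExtend16(packed >> 16);
    const int rawV = signExtend16(packed);
    return {rawU / scale, rawV / scale};
}

// scale_in as defined in the table: (previous_rows-1)/(current_rows-1).
float scaleIn(int previousRows, int currentRows) {
    assert(currentRows > 1);
    return static_cast<float>(previousRows - 1) /
           static_cast<float>(currentRows - 1);
}
```

With these assumptions, the packed word 0x0060FFF0 decodes to U = 1.5 and V = -0.25, and moving from a 68-row level to a 135-row level gives a scale factor of 2.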
Resource Utilization
The following table summarizes the resource utilization of densePyrOpticalFlow for a 1 pixel
per cycle implementation, with the optical flow computed for a window size of 11 over an image
size of 1920x1080 pixels. The results are after implementation in Vivado HLS 2019.1 for the
Xilinx xczu9eg-ffvb1156-2L-e FPGA at 300 MHz.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  LUTs  FFs  DSPs  BRAMs
The following table summarizes the resource utilization of densePyrOpticalFlow for a 1 pixel
per cycle implementation, with the optical flow computed for a window size of 11 over an image
size of 3840x2160 pixels. The results are after implementation in SDx 2019.1 for the Xilinx
xczu7ev-ffvc1156-2 FPGA at 300 MHz with UltraRAM enabled.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  LUTs  FFs  DSPs  BRAMs  URAM
Performance Estimate
The following table summarizes performance figures on hardware for the densePyrOpticalFlow
function for 5 iterations over 5 pyramid levels scaled down by a factor of two at each level. This
has been tested on the zcu102 evaluation board.
Latency Estimate
Operating Mode  Operating Frequency (MHz)  Image Size  Max (ms)
• Pixel intensities of an object do not have too many variations in consecutive frames
• Neighboring pixels have similar motion
Consider a pixel I(x, y, t) in the first frame. (Note that a new dimension, time, is added
here. When working with single images, there is no need for time.) The pixel moves by a
distance (dx, dy) in the next frame, taken after time dt. Since it is the same pixel and its
intensity does not change, the following is true:
I(x, y, t) = I(x + dx, y + dy, t + dt)
Taking the Taylor series approximation on the right-hand side, removing common terms, and
dividing by dt gives the following equation:
f_x u + f_y v + f_t = 0
Where f_x = \partial f / \partial x, f_y = \partial f / \partial y, u = dx/dt, and v = dy/dt.
The above equation is called the optical flow equation, where f_x and f_y are the image
gradients and f_t is the gradient along time. However, (u, v) is unknown, and a single equation
cannot be solved for two unknowns. Several methods have been proposed to resolve this problem;
one of them is Lucas-Kanade. Recall the earlier assumption that all neighboring pixels have
similar motion. The Lucas-Kanade method takes a patch around the point, whose size can be
defined through the ‘WINDOW_SIZE’ template parameter, and treats all the points in that patch
as having the same motion. (f_x, f_y, f_t) can be computed for each of these points, so the
problem becomes solving ‘WINDOW_SIZE * WINDOW_SIZE’ equations in two unknowns, which is
over-determined. A better solution is obtained with the least-squares fit method. The final
solution, shown below, is a system of two equations in two unknowns:
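\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \sum_i f_{x_i}^2 & \sum_i f_{x_i} f_{y_i} \\ \sum_i f_{x_i} f_{y_i} & \sum_i f_{y_i}^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum_i f_{x_i} f_{t_i} \\ -\sum_i f_{y_i} f_{t_i} \end{bmatrix}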
API Syntax
template<int TYPE, int ROWS, int COLS, int NPC, int WINDOW_SIZE, bool USE_URAM=false>
void DenseNonPyrLKOpticalFlow (xf::Mat<TYPE, ROWS, COLS, NPC> & frame0,
xf::Mat<TYPE, ROWS, COLS, NPC> & frame1,
xf::Mat<XF_32FC1, ROWS, COLS, NPC> & flowx,
xf::Mat<XF_32FC1, ROWS, COLS, NPC> & flowy)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
TYPE Pixel type. The only pixel type currently supported is XF_8UC1 (unsigned 8-bit).
ROWS Maximum number of rows of the input image that the hardware kernel must be built for.
COLS Maximum number of columns of the input image that the hardware kernel must be built for.
NPC Number of pixels to process per cycle. Supported values are XF_NPPC1 (=1) and XF_NPPC2(=2).
WINDOW_SIZE Window size over which optical flow will be computed. This can be any odd positive integer.
USE_URAM Enable to map storage structures to UltraRAM.
Resource Utilization
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUTs
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  URAM  DSP_48Es  FF  LUTs
Performance Estimate
Latency Estimate
Operating Mode  Operating Frequency (MHz)  Max (ms)
Kalman Filter
The classic Kalman Filter is formulated for linear systems. The state-space description of a
linear system is assumed to be:
x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \xi_k
y_k = H_k x_k + \eta_k
where x_k is the state vector at the kth time instant, A_k is a constant (known) n×n state
transition matrix, B_k is a constant (known) n×m control input matrix, \Gamma_k is a constant
(known) n×p system noise input matrix, H_k is a constant (known) q×n measurement matrix, with
1 ≤ m, p, q ≤ n, {u_k} is a (known) sequence of m-vectors (called a deterministic input
sequence), and {\xi_k} and {\eta_k} are, respectively, the (unknown) system and observation
noise sequences, with known statistical information such as mean, variance, and covariance.
1. {\xi_k} and {\eta_k} are assumed to be sequences of zero-mean Gaussian (or normal) white
noise. That is, E(\xi_k) = 0, E(\eta_k) = 0, E(\xi_k \xi_l^T) = Q_k \delta_{kl}, and
E(\eta_k \eta_l^T) = R_k \delta_{kl}, where \delta_{kl} is the Kronecker delta function, Q_k
and R_k are positive definite matrices, and E(u) is the expectation of a random variable u.
2. E(\xi_k \eta_l^T) = 0 ∀ k, l
The notation \hat{x}_{k|j} = \hat{x}_{k,j} denotes the estimate of x at time instant k using
all the data measured up to time instant j.
The Kalman filter algorithm can be summarized as shown in the equations below:
Initialization
P_{0,0} = Var(x_0)
\hat{x}_{0|0} = E(x_0)
P_{k,k} = (I - G_k H_k) P_{k,k-1}
Where P is the n×n estimate error covariance matrix, G_k is the n×q Kalman gain matrix, and
k = 1, 2, ...
Computation Strategy
The numerical accuracy of the Kalman filter covariance measurement update is a concern for
implementation, because it takes the difference of two positive definite matrices. This is a
potential problem if finite precision is used for computation. This design uses UDU
factorization of P to address the numerical accuracy and stability problems.
P_{k,k} = (I - G_k H_k) P_{k,k-1} = P_{k,k-1} - P_{k,k-1} H_k^T (R_k + H_k P_{k,k-1} H_k^T)^{-1} H_k P_{k,k-1}
During initialization (before the first iteration), the user must supply the U0_mat and D0_mat
matrices of the error covariance matrix P, and the Uq_mat and Dq_mat matrices of the system
noise covariance matrix Q. These U and D matrices can be obtained using Backward Cholesky
decomposition. The equations below illustrate the Backward Cholesky decomposition of P into a
unit upper triangular matrix U and a diagonal matrix D such that P = U D U^T.
D_{jj} = P_{jj} - \sum_{k=j+1}^{n} D_{kk} U_{jk}^2
U_{ij} = \begin{cases} 0, & i > j \\ 1, & i = j \\ \left[ P_{ij} - \sum_{k=j+1}^{n} D_{kk} U_{ik} U_{jk} \right] / D_{jj}, & i = j-1, j-2, \ldots, 1 \end{cases}
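The Backward Cholesky recurrences above can be sketched in plain C++ as shown here, for example to prepare U0_mat/D0_mat or Uq_mat/Dq_mat values on the host. The Matrix alias and the udutDecompose function are hypothetical names, not library APIs, and the sketch assumes that P is symmetric positive definite and uses 0-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical reference routine: factors a symmetric positive definite P
// into a unit upper triangular U and a diagonal D (stored as a vector)
// such that P = U * D * U^T.
void udutDecompose(const Matrix& P, Matrix& U, std::vector<double>& D) {
    const std::size_t n = P.size();
    U.assign(n, std::vector<double>(n, 0.0));
    D.assign(n, 0.0);
    // Columns are processed from last to first ("backward").
    for (std::size_t j = n; j-- > 0;) {
        double d = P[j][j];
        for (std::size_t k = j + 1; k < n; ++k) d -= D[k] * U[j][k] * U[j][k];
        assert(d > 0.0);  // holds when P is positive definite
        D[j] = d;
        U[j][j] = 1.0;
        for (std::size_t i = 0; i < j; ++i) {
            double s = P[i][j];
            for (std::size_t k = j + 1; k < n; ++k) s -= D[k] * U[i][k] * U[j][k];
            U[i][j] = s / d;
        }
    }
}
```

As a check, P = [[36, 40, 9], [40, 50, 12], [9, 12, 3]] factors into D = diag(1, 2, 3) and a unit upper triangular U with U_{01} = 2, U_{02} = 3, and U_{12} = 4 (0-based indices).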
//Control Flag
INIT_EN = 1; TIMEUPDATE_EN = 2; MEASUPDATE_EN = 4;
XOUT_EN_TU = 8; UDOUT_EN_TU = 16; XOUT_EN_MU = 32;
UDOUT_EN_MU = 64; EKF_MEM_OPT = 128;
//Load A_mat,B_mat,Uq_mat,Dq_mat,H_mat,X0_mat,U0_mat,D0_mat,R_mat
//Initialization
KalmanFilter(A_mat, B_mat, Uq_mat, Dq_mat, H_mat, X0_mat, U0_mat, D0_mat,
R_mat, u_mat, y_mat, Xout_mat, Uout_mat, Dout_mat, INIT_EN);
//Time Update
KalmanFilter(A_mat, B_mat, Uq_mat, Dq_mat, H_mat, X0_mat, U0_mat, D0_mat,
R_mat, u_mat, y_mat, Xout_mat, Uout_mat, Dout_mat, TIMEUPDATE_EN +
XOUT_EN_TU + UDOUT_EN_TU);
//Measurement Update
KalmanFilter(A_mat, B_mat, Uq_mat, Dq_mat, H_mat, X0_mat, U0_mat, D0_mat,
R_mat, u_mat, y_mat, Xout_mat, Uout_mat, Dout_mat, MEASUPDATE_EN +
XOUT_EN_MU + UDOUT_EN_MU);
}
API Syntax
template<int N_STATE, int M_MEAS, int C_CTRL, int MTU, int MMU, bool
USE_URAM=0, bool EKF_EN=0, int TYPE, int NPC >
void KalmanFilter ( xf::Mat<TYPE, N_STATE, N_STATE, NPC> &A_mat,
#if KF_C!=0
xf::Mat<TYPE, N_STATE, C_CTRL, NPC> &B_mat,
#endif
xf::Mat<TYPE, N_STATE, N_STATE, NPC> &Uq_mat,
xf::Mat<TYPE, N_STATE, 1, NPC> &Dq_mat,
xf::Mat<TYPE, M_MEAS, N_STATE, NPC> &H_mat,
xf::Mat<TYPE, N_STATE, 1, NPC> &X0_mat,
xf::Mat<TYPE, N_STATE, N_STATE, NPC> &U0_mat,
xf::Mat<TYPE, N_STATE, 1, NPC> &D0_mat,
xf::Mat<TYPE, M_MEAS, 1, NPC> &R_mat,
#if KF_C!=0
Parameter Descriptions
0 Initialization enable
1 Time update enable
2 Measurement update enable
3 Xout enable for time update
4 Uout/Dout enable for time update
5 Xout enable for measurement update
6 Uout/Dout enable for measurement update
7 Read optimization (Uq_mat, Dq_mat, U0_mat, D0_mat and
R_mat) for Extended Kalman Filter
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using SDx 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1 FPGA.
Resource Utilization
Name  N_STATE=128; C_CTRL=128; M_MEAS=128; MTU=24; MMU=24  N_STATE=64; C_CTRL=64; M_MEAS=12; MTU=16; MMU=16  N_STATE=5; C_CTRL=1; M_MEAS=3; MTU=2; MMU=2
300 MHz  300 MHz  300 MHz
The following table shows the resource utilization of the kernel for a configuration with
USE_URAM enabled, generated using SDx 2019.1 for the Xilinx xczu7ev-ffvc1156-2-e FPGA.
BRAM_18K 30
DSP48E 284
FF 99210
LUT 53939
URAM 11
Performance Estimate
The following table shows the performance of the kernel for different configurations, as
generated using the SDx 2019.1 tool for the Xilinx® Xczu9eg-ffvb1156-1, for one iteration. The
latency estimate is calculated by taking the average latency of 100 iterations.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
The following table shows the performance of the kernel for a configuration with UltraRAM
enabled, as generated using the SDx 2019.1 tool for the Xilinx xczu7ev-ffvc1156-2-e, for one
iteration. The latency estimate is calculated by taking the average latency of 100 iterations.
x_{k+1} = f_k(x_k) + T_k(x_k) \xi_k
z_k = h_k(x_k) + \eta_k
Where f_k and h_k are vector-valued functions with ranges in R^n and R^q, respectively,
1 ≤ q ≤ n, and T_k is a matrix-valued function with range in R^n × R^q, such that for each k
the first-order partial derivatives of f_k(x_k) and h_k(x_k) with respect to all the components
of x_k are continuous. We consider zero-
The real-time linearization process is carried out as shown in the following equations. Along
the lines of the linear model, the initial estimate and predicted position are chosen to be:
\hat{x}_0 = E(x_0), \hat{x}_{1|0} = f_0(\hat{x}_0)
Then, with \hat{x}_k = \hat{x}_{k|k}, consecutively, for k = 1, 2, …, use the predicted
positions:
\hat{x}_{k|k-1} = f_{k-1}(\hat{x}_{k-1})
Note:
1. f_k(x_k) = [f_k^1(x_k), \ldots, f_k^n(x_k)]^T, where x_k = [x_k^1, \ldots, x_k^n]^T, k is a
time index, the superscript is the row index, and
\frac{\partial f_k(x_k)}{\partial x_k} = \begin{bmatrix} \frac{\partial f_k^1(x_k)}{\partial x_k^1} & \cdots & \frac{\partial f_k^1(x_k)}{\partial x_k^n} \\ \vdots & \cdots & \vdots \\ \frac{\partial f_k^n(x_k)}{\partial x_k^1} & \cdots & \frac{\partial f_k^n(x_k)}{\partial x_k^n} \end{bmatrix}
2. R^m is the space of column vectors x = [x_1 \ldots x_m]^T
P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^T + T_{k-1}(\hat{x}_{k-1}) Q_{k-1} T_{k-1}^T(\hat{x}_{k-1})
P_{k|k} = \left( I - G_k \left[ \frac{\partial h_k}{\partial x_k}(\hat{x}_{k|k-1}) \right] \right) P_{k|k-1} = (I - G_k H_k) P_{k|k-1}
//Load F/B_mat/Uq_mat/Dq_mat/X0_mat/U0_mat/D0_mat
//Initialization
KalmanFilter (F, B_mat, Uq_mat, Dq_mat, H, fx, U0_mat, D0_mat, R_mat, hx,
y_mat, Xout_mat, Uout_mat, Dout_mat, initFlag);
//Time Update
KalmanFilter (F, B_mat, Uq_mat, Dq_mat, H, fx, U0_mat, D0_mat, R_mat, hx,
y_mat, Xout_mat, Uout_mat, Dout_mat, TIMEUPDATE_EN + XOUT_EN_TU +
UDOUT_EN_TU);
for(int index=0; index< M_MEAS; index++)
{
if(iteration ==0)
// update hx/H using X0_mat for one measurement at a time
model_hxH(X0_mat, hx, H, index);
else
//Load R_mat
R_mat.write_float(0,R_matrix[index][index]);
//Load y_mat
y_mat.write_float(0,measurement_vector[index]);
//Measurement Update
KalmanFilter (F, B_mat, Uq_mat, Dq_mat, H, fx, U0_mat, D0_mat, R_mat, hx,
y_mat, Xout_mat, Uout_mat, Dout_mat, MEASUPDATE_EN + XOUT_EN_MU +
UDOUT_EN_MU);
}
}
\mu = \frac{\sum_{y=0}^{height} \sum_{x=0}^{width} src(x, y)}{width * height}
\sigma = \sqrt{\frac{\sum_{y=0}^{height} \sum_{x=0}^{width} (\mu - src(x, y))^2}{width * height}}
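The two formulas above can be checked against a small software reference such as the following sketch. It is not the hardware kernel: the MeanStdDev type and the meanStdDevRef function are hypothetical names, the sketch handles a single-channel 8-bit image stored as a flat vector, and it returns double values rather than the 16-bit outputs of the kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical result type for the reference computation.
struct MeanStdDev { double mean, stddev; };

// Computes the mean and the (population) standard deviation of all pixels.
MeanStdDev meanStdDevRef(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty());
    const double n = static_cast<double>(pixels.size());
    double sum = 0.0;
    for (std::uint8_t p : pixels) sum += p;
    const double mean = sum / n;
    double sqDiff = 0.0;
    for (std::uint8_t p : pixels) sqDiff += (mean - p) * (mean - p);
    return {mean, std::sqrt(sqDiff / n)};
}
```

For the pixel values {2, 4, 4, 4, 5, 5, 7, 9}, this reference gives a mean of 5 and a standard deviation of 2.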
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
ROWS Number of rows in the image being processed.
COLS Number of columns in the image being processed. Must be a multiple of 8, for 8-pixel operation.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input image
_mean 16-bit data pointer through which the computed mean of the image is returned.
_stddev 16-bit data pointer through which the computed standard deviation of the image is returned.
Resource Utilization
The following table summarizes the resource utilization of the meanStdDev function, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
The following table summarizes the resource utilization of the meanStdDev function, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K
3Channel image.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode
Max Latency
Max
The Max function calculates the per-element maximum of two corresponding images src1, src2
and stores the result in dst.
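A software reference for the per-element maximum is sketched below. The maxRef function is a hypothetical name used only for illustration, and it operates on flat 8-bit buffers rather than xf::Mat objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference model: dst[i] = max(src1[i], src2[i]).
std::vector<std::uint8_t> maxRef(const std::vector<std::uint8_t>& src1,
                                 const std::vector<std::uint8_t>& src2) {
    assert(src1.size() == src2.size());
    std::vector<std::uint8_t> dst(src1.size());
    std::transform(src1.begin(), src1.end(), src2.begin(), dst.begin(),
                   [](std::uint8_t a, std::uint8_t b) { return std::max(a, b); });
    return dst;
}
```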
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Max function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 103 153
LUT 44 102
CLB 21 38
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
MaxS
The MaxS function calculates the maximum elements between src and given scalar value scl and
stores the result in dst.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the MaxS function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 162 43
LUT 103 104
CLB 32 20
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
API Syntax
template<int FILTER_SIZE, int BORDER_TYPE, int TYPE, int ROWS, int COLS,
int NPC>
void medianBlur (xf::Mat<TYPE, ROWS, COLS, NPC> & _src, xf::Mat<TYPE, ROWS,
COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
FILTER_SIZE Window size of the hardware filter for which the hardware kernel will be built. This can be any
odd positive integer greater than 1.
BORDER_TYPE The way in which borders will be processed in the hardware kernel. Currently, only
XF_BORDER_REPLICATE is supported.
TYPE Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
ROWS Number of rows in the image being processed.
COLS Number of columns in the image being processed. Must be a multiple of 8, for 8-pixel
operation.
NPC Number of pixels to be processed in parallel. Options are XF_NPPC1 (for 1 pixel processing
per clock) and XF_NPPC8 (for 8 pixel processing per clock).
_src Input image.
_dst Output image.
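The behavior described by FILTER_SIZE and XF_BORDER_REPLICATE can be modeled in software as in the following sketch. The medianBlurRef function is a hypothetical reference for a single-channel 8-bit image stored row by row, not the hardware implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: median over a filterSize x filterSize
// window, with out-of-range coordinates clamped to the nearest edge pixel
// (replicated border).
std::vector<std::uint8_t> medianBlurRef(const std::vector<std::uint8_t>& src,
                                        int width, int height, int filterSize) {
    assert(filterSize > 1 && filterSize % 2 == 1);
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int r = filterSize / 2;
    std::vector<std::uint8_t> dst(src.size());
    std::vector<std::uint8_t> window;
    window.reserve(static_cast<std::size_t>(filterSize) * filterSize);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            window.clear();
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), height - 1);
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    window.push_back(src[static_cast<std::size_t>(yy) * width + xx]);
                }
            }
            // Partial sort that places the median at the middle position.
            auto mid = window.begin() + window.size() / 2;
            std::nth_element(window.begin(), mid, window.end());
            dst[static_cast<std::size_t>(y) * width + x] = *mid;
        }
    }
    return dst;
}
```

For a one-row image {1, 9, 2} and a 3x3 window, the replicated border yields {1, 2, 2}.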
Resource Utilization
The following table summarizes the resource utilization of the medianBlur function for
XF_NPPC1 and XF_NPPC8 configurations, generated using Vivado HLS 2019.1 version tool for
the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Utilization Estimate
Operating Mode  FILTER_SIZE  Operating Frequency (MHz)  LUTs  FFs  DSPs  BRAMs
The following table summarizes the resource utilization of the medianBlur function for
XF_NPPC1 with 3channel image as input, generated using Vivado HLS 2019.1 version tool for
the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Utilization Estimate
Operating Mode  FILTER_SIZE  Operating Frequency (MHz)  LUTs  FFs  DSPs  BRAMs
Performance Estimate
The following table summarizes performance estimates of medianBlur function on Vivado HLS
2019.1 version tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Mode  FILTER_SIZE  Operating Frequency (MHz)  Input Image Size  Max (ms)
Min
The Min function calculates the per element minimum of two corresponding images src1, src2
and stores the result in dst.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Min function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 103 153
LUT 44 102
CLB 23 34
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
MinS
The MinS function calculates the minimum elements between src and given scalar value scl and
stores the result in dst.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the MinS function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 104 159
LUT 43 103
CLB 23 36
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
MinMax Location
The minMaxLoc function finds the minimum and maximum values in an image and the locations of
those values.
minVal = \min_{0 \le x' \le width,\ 0 \le y' \le height} src(x', y')
maxVal = \max_{0 \le x' \le width,\ 0 \le y' \le height} src(x', y')
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. 8-bit, unsigned, 1 channel (XF_8UC1), 16-bit, unsigned, 1 channel (XF_16UC1), 16-
bit, signed, 1 channel (XF_16SC1), 32-bit, signed, 1 channel (XF_32SC1) are supported.
ROWS Number of rows in the image being processed.
COLS Number of columns in the image being processed.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input image
max_val Maximum value in the image, of type int.
min_val Minimum value in the image, of type int.
_minlocx x-coordinate location of the first minimum value.
_minlocy y-coordinate location of the first minimum value.
_maxlocx x-coordinate location of the first maximum value.
_maxlocy y-coordinate location of the first maximum value.
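The outputs listed above can be modeled with the following software sketch. The MinMaxLoc type and the minMaxLocRef function are hypothetical names, and the sketch assumes that "first" means first in raster-scan order (row by row, left to right).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type mirroring the outputs in the parameter table.
struct MinMaxLoc { int minVal, maxVal, minX, minY, maxX, maxY; };

// Scans the image in raster order. The strict comparisons keep the first
// occurrence of the minimum and of the maximum.
MinMaxLoc minMaxLocRef(const std::vector<int>& src, int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    MinMaxLoc r{src[0], src[0], 0, 0, 0, 0};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int v = src[static_cast<std::size_t>(y) * width + x];
            if (v < r.minVal) { r.minVal = v; r.minX = x; r.minY = y; }
            if (v > r.maxVal) { r.maxVal = v; r.maxX = x; r.maxY = y; }
        }
    }
    return r;
}
```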
Resource Utilization
The following table summarizes the resource utilization of the minMaxLoc function, generated
using Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode
Max Latency
The mean-shift algorithm is an iterative technique for locating the maxima of a density
function. For object tracking, the density function used is the weight image formed using color
histograms of the object to be tracked and the frame to be tested. Using the weighted histogram
takes spatial position into consideration, unlike the normal histogram calculation. This
function takes the input image pointer, the top-left and bottom-right coordinates of the
rectangular object, the frame number, and the tracking status as inputs, and returns the
centroid using a recursive mean-shift approach.
API Syntax
template <int MAXOBJ, int MAXITERS, int OBJ_ROWS, int OBJ_COLS, int SRC_T,
int ROWS, int COLS, int NPC>
void MeanShift(xf::Mat<SRC_T, ROWS, COLS, NPC> &_in_mat, uint16_t* x1,
uint16_t* y1, uint16_t* obj_height, uint16_t* obj_width, uint16_t* dx,
uint16_t* dy, uint16_t* status, uint8_t frame_status, uint8_t no_objects,
uint8_t no_iters );
Parameter Description
The following table summarizes the resource utilization of the MeanShift function for the
normal (1 pixel) configuration, as generated in the Vivado HLS 2019.1 tool for the part
xczu9eg-ffvb1156-i-es1 at 300 MHz, to process an RGB image of resolution 1920x1080 for 10
objects of size 250x250 and 4 iterations.
Limitations
Otsu Threshold
Otsu threshold is used to automatically perform clustering-based image thresholding, that is,
the reduction of a gray-level image to a binary image. The algorithm assumes that the image
contains two classes of pixels following a bi-modal histogram (foreground pixels and background
pixels), and then calculates the optimum threshold separating the two classes.
The Otsu method finds the threshold that minimizes the intra-class variance, defined as the
weighted sum of the variances of the two classes.
w_1 = \sum_{i=1}^{t} p(i), w_2 = \sum_{i=t+1}^{I} p(i)
Otsu shows that minimizing the intra-class variance is the same as maximizing the inter-class
variance:
\sigma_b^2 = \sigma^2 - \sigma_w^2
\sigma_b^2 = w_1 w_2 (\mu_b - \mu_f)^2
Where \mu_b = \left[ \sum_{i=1}^{t} p(i) x(i) \right] / w_1 and \mu_f = \left[ \sum_{i=t+1}^{I} p(i) x(i) \right] / w_2 are the class means.
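The search for the threshold that maximizes the inter-class variance can be sketched in software as follows. The otsuThresholdRef function is a hypothetical reference, not the kernel: it uses 0-based intensity levels (0 to 255), assigns intensities up to and including t to the first class, and returns the first t that reaches the maximum. The convention used by the hardware kernel may differ.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference model: returns the threshold t that maximizes
// w1 * w2 * (mu_b - mu_f)^2 over an 8-bit grayscale image.
int otsuThresholdRef(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty());
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels) hist[p] += 1.0;
    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double wB = 0.0, sumB = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];        // pixel count of the first class
        sumB += t * hist[t];  // intensity sum of the first class
        if (wB == 0.0) continue;
        const double wF = total - wB;  // pixel count of the second class
        if (wF == 0.0) break;
        const double muB = sumB / wB;
        const double muF = (sumAll - sumB) / wF;
        const double between =
            (wB / total) * (wF / total) * (muB - muF) * (muB - muF);
        if (between > bestVar) { bestVar = between; bestT = t; }
    }
    return bestT;
}
```

For an image with four pixels of value 10 and four pixels of value 200, every t from 10 to 199 separates the two classes equally well, and this sketch returns the first of them, 10.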
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image (must be a multiple of 8, for 8-pixel operation)
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1 pixel
and 8 pixel operations respectively.
_src_mat Input image
_thresh Output threshold value after the computation
Resource Utilization
The following table summarizes the resource utilization of the OtsuThreshold function, generated
using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
Paintmask
The Paintmask function replaces the pixel intensity value with the given color value where the
mask is not zero; otherwise, it keeps the corresponding pixel from the input image.
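The selection rule can be written as a short software reference. The paintMaskRef function is a hypothetical name, and the sketch assumes a single-channel 8-bit image with a single color value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: dst[i] = color where mask[i] != 0,
// otherwise dst[i] = src[i].
std::vector<std::uint8_t> paintMaskRef(const std::vector<std::uint8_t>& src,
                                       const std::vector<std::uint8_t>& mask,
                                       std::uint8_t color) {
    assert(src.size() == mask.size());
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = (mask[i] != 0) ? color : src[i];
    }
    return dst;
}
```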
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Paintmask function in resource
optimized (8 pixel) mode and normal mode, as generated using the Vivado HLS 2019.1 tool for the
Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 95 163
LUT 57 121
CLB 14 33
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Pixel-Wise Addition
The add function performs the pixel-wise addition between two input images and returns the
output image.
Where:
XF_CONVERT_POLICY_TRUNCATE: Results are the least significant bits of the output operand,
as if stored in two’s complement binary format in the size of its bit-depth.
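The truncation policy can be illustrated for 8-bit pixels with the sketch below. Both functions are hypothetical helpers, not library APIs. The saturating variant is not described in this section and is included only for contrast, to show how clamping differs from keeping the least significant bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Truncation: keep the least significant 8 bits of the full-precision sum.
std::uint8_t addTruncate(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a + b) & 0xFF);
}

// For contrast only: clamp the full-precision sum to the 8-bit maximum.
std::uint8_t addSaturate(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::min(a + b, 255));
}
```

For example, 200 + 100 = 300 truncates to 44 (300 modulo 256), while the saturating variant gives 255.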
API Syntax
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC=1>
void add (
xf::Mat<SRC_T, ROWS, COLS, NPC> src1,
xf::Mat<SRC_T, ROWS, COLS, NPC> src2,
xf::Mat<SRC_T, ROWS, COLS, NPC> dst )
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
1 pixel 300 0 0 62 55 11
8 pixel 150 0 0 65 138 24
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process 4K
image with 3 channels.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
Pixel-Wise Multiplication
The multiply function performs the pixel-wise multiplication between two input images and
returns the output image.
Where:
XF_CONVERT_POLICY_TRUNCATE: Results are the least significant bits of the output operand,
as if stored in two’s complement binary format in the size of its bit-depth.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K
image with 3 channels.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
Pixel-Wise Subtraction
The subtract function performs the pixel-wise subtraction between two input images and
returns the output image.
Where:
XF_CONVERT_POLICY_TRUNCATE: Results are the least significant bits of the output operand,
as if stored in two’s complement binary format in the size of its bit-depth.
API Syntax
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC=1>
void subtract (
xf::Mat<SRC_T, ROWS, COLS, NPC> src1,
xf::Mat<SRC_T, ROWS, COLS, NPC> src2,
xf::Mat<SRC_T, ROWS, COLS, NPC> dst )
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
1 pixel 300 0 0 62 53 11
8 pixel 150 0 0 59 13 21
The following table summarizes the resource utilization in different configurations, generated
using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K
image with 3 channels.
Utilization Estimate
Operating Mode  Operating Frequency (MHz)  BRAM_18K  DSP_48Es  FF  LUT  CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode
Max Latency (ms)
Reduce
The Reduce function reduces the matrix to a vector by treating rows/cols as set of 1-D vectors
and performing specified operation on vectors until a single row/col is obtained.
API Syntax
template<int REDUCE_OP, int SRC_T, int DST_T, int ROWS, int COLS,
int ONE_D_HEIGHT, int ONE_D_WIDTH, int NPC=1>
void reduce(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,
xf::Mat<DST_T, ONE_D_HEIGHT, ONE_D_WIDTH, 1> & _dst_mat, unsigned char dim)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
COLS Maximum width of input and output image. In case of N-pixel parallelism, width should be
multiple of N.
ONE_D_HEIGHT Height of output 1-D vector or reduced matrix
ONE_D_WIDTH Width of output 1-D vector or reduced matrix
NPC Number of pixels to be processed per cycle; possible option is XF_NPPC1 (1 pixel per cycle).
_src_mat Input image
_dst_mat 1-D vector
dim Dimension index along which the matrix is reduced. 0 means that the matrix is reduced to a
single row. 1 means that the matrix is reduced to a single column.
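The meaning of dim can be sketched with a small software reference model. The enum below is a hypothetical stand-in for REDUCE_OP (the actual operation identifiers are not reproduced here), and the function operates on a row-major std::vector rather than xf::Mat.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class ReduceOp { Sum, Max };  // hypothetical stand-ins for REDUCE_OP

// Reduce a row-major rows x cols matrix.
// dim == 0: result is a single row (cols values).
// dim == 1: result is a single column (rows values).
inline std::vector<int> reduceRef(const std::vector<int>& src, std::size_t rows,
                                  std::size_t cols, ReduceOp op,
                                  unsigned char dim) {
    std::vector<int> dst(dim == 0 ? cols : rows);
    const std::size_t count = (dim == 0) ? rows : cols;
    for (std::size_t i = 0; i < dst.size(); ++i) {
        int acc = 0;
        for (std::size_t k = 0; k < count; ++k) {
            int v = (dim == 0) ? src[k * cols + i] : src[i * cols + k];
            if (k == 0) {
                acc = v;
            } else {
                acc = (op == ReduceOp::Sum) ? acc + v : std::max(acc, v);
            }
        }
        dst[i] = acc;
    }
    return dst;
}
```

For a 2x3 input, dim = 0 produces three values (one per column) and dim = 1 produces two values (one per row), which corresponds to the ONE_D_HEIGHT and ONE_D_WIDTH template arguments.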
Resource Utilization
The following table summarizes the resource utilization of the Reduce function in Normal mode
(1 pixel), as generated using the Vivado HLS 2019.1 tool for the Xczu9eg-ffvb1156-1-i-es1
FPGA.
Resource Utilization
Name 1 pixel per clock operation
300 MHz
BRAM_18K 2
DSP48E 0
FF 288
LUT 172
CLB 54
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Remap
The remap function takes pixels from one place in the image and relocates them to another
position in another image. Two types of interpolation methods are used here for mapping the
image from source to destination image.
$$\mathrm{dst}(x, y) = \mathrm{src}\big(\mathrm{map}_x(x, y),\ \mathrm{map}_y(x, y)\big)$$
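A minimal software model of the mapping with nearest-neighbor interpolation is sketched below. It is not the hardware kernel: it works on row-major std::vector buffers, the function name is hypothetical, and writing 0 for out-of-range source coordinates is an assumption made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-neighbor remap: dst(x, y) = src(map_x(x, y), map_y(x, y)).
inline std::vector<std::uint8_t> remapNearest(
    const std::vector<std::uint8_t>& src, const std::vector<float>& mapX,
    const std::vector<float>& mapY, int rows, int cols) {
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(rows) * cols, 0);
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            // Round the floating-point map entries to the nearest pixel.
            int sx = static_cast<int>(std::lround(mapX[y * cols + x]));
            int sy = static_cast<int>(std::lround(mapY[y * cols + x]));
            if (sx >= 0 && sx < cols && sy >= 0 && sy < rows) {
                dst[y * cols + x] = src[sy * cols + sx];
            }
        }
    }
    return dst;
}
```

A left-right flip, for example, uses map_x(x, y) = cols - 1 - x and map_y(x, y) = y, so each output row depends only on the matching input row.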
API Syntax
Parameter Descriptions
Parameter Description
WIN_ROWS Number of input image rows to be buffered inside. Must be set based on the map data. For
instance, for left right flip, 2 rows are sufficient.
INTERPOLATION_TYPE Type of interpolation, either XF_INTERPOLATION_NN (nearest neighbor) or
XF_INTERPOLATION_BILINEAR (linear interpolation)
SRC_T Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
MAP_T Map type. Single channel float type. XF_32FC1.
DST_T Output image type. Grayscale image of type 8-bits and single channel. XF_8UC1.
ROWS Height of input and output images
COLS Width of input and output images
NPC Number of pixels to be processed per cycle; this function supports only XF_NPPC1 or 1 pixel per
cycle operations.
USE_URAM Enable to map some structures to UltraRAM instead of BRAM.
PARAMETERS DESCRIPTION
Resource Utilization
The following table summarizes the resource utilization of remap, for HD (1080x1920) images
generated in the Vivado HLS 2019.1 version tool for the Xilinx xczu9eg-ffvb1156-i-es1 FPGA at
300 MHz, with WIN_ROWS as 64 for the XF_INTERPOLATION_BILINEAR mode.
BRAM_18K 64
DSP48E 17
FF 1738
LUT 1593
CLB 360
The following table summarizes the resource utilization of remap, for 4K (3840x2160) images
generated in the SDx 2019.1 version tool for the Xilinx xczu7ev-ffvc1156 FPGA at 300 MHz,
with WIN_ROWS as 100 for the XF_INTERPOLATION_BILINEAR mode using UltraRAM.
Table 381: remap Function Resource Utilization Summary with UltraRAM Enabled
BRAM_18K 3
DSP48E 10
URAM 24
FF 3196
LUT 3705
Performance Estimate
The following table summarizes the performance of remap(), for HD (1080x1920) images
generated in the Vivado HLS 2019.1 version tool for the Xilinx xczu9eg-ffvb1156-i-es1 FPGA at
300 MHz, with WIN_ROWS as 64 for XF_INTERPOLATION_BILINEAR mode.
Note: Scaling factors greater than or equal to 0.25 are supported in down-scaling and values less than or
equal to 8 are supported for up-scaling.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
TYPE Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
SRC_ROWS Maximum Height of input image for which the hardware kernel would be built.
SRC_COLS Maximum Width of input image for which the hardware kernel would be built (must be a
multiple of 8).
DST_ROWS Maximum Height of output image for which the hardware kernel would be built.
DST_COLS Maximum Width of output image for which the hardware kernel would be built (must be a
multiple of 8).
NPC Number of pixels to be processed per cycle. Possible options are XF_NPPC1 (1 pixel per cycle)
and XF_NPPC8 (8 pixels per cycle).
MAX_DOWN_SCALE Set to 2 for all 1-pixel modes, and for upscaling in the x direction. When downscaling in the x
direction in 8-pixel mode, set this parameter to the downscale factor rounded up to the nearest
integer. For example, when downscaling from 1920 columns to 1280 columns, set it to 2; for 1920 to 640, set it to 3.
_src Input Image
_dst Output Image
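The MAX_DOWN_SCALE rule can be written as a small compile-time helper. The function name is hypothetical and is not part of the library; it only encodes the rule stated in the table above.

```cpp
// Hypothetical helper that picks MAX_DOWN_SCALE.
// npc is the number of pixels per cycle (1 or 8).
constexpr int maxDownScale(int srcCols, int dstCols, int npc) {
    return (npc == 1 || dstCols >= srcCols)
               ? 2                                   // 1-pixel mode or upscaling in x
               : (srcCols + dstCols - 1) / dstCols;  // ceil(srcCols / dstCols)
}
```

Because the helper is constexpr, its result could be passed directly as a template argument, for example maxDownScale(1920, 640, 8) for a 1920 to 640 downscale in 8-pixel mode.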
Resource Utilization
The following table summarizes the resource utilization of Resize function in Resource Optimized
(8 pixel) mode and Normal mode, as generated in the Vivado HLS 2019.1 tool for the Xilinx
xczu9eg-ffvb1156-2-i-es2 FPGA.
Utilization Estimate
Operating Mode 1 Pixel (at 300 MHz) 8 Pixel (at 150MHz)
IMAGESIZE LUTs FFs DSPs BRAMs IMAGESIZE LUTs FFs DSPs BRAMs
The following table summarizes the resource utilization of Resize function in Normal mode, as
generated in the Vivado HLS 2019.1 tool for the Xilinx xczu9eg-ffvb1156-2-i-es2 FPGA for
a 3-channel image as input.
Utilization Estimate
Operating Mode 1 Pixel (at 300 MHz)
IMAGESIZE LUTs FFs DSPs BRAMs
Performance Estimate
The following table summarizes the performance estimation of Resize for various configurations,
as generated in the Vivado HLS 2019.1 tool for the xczu9eg-ffvb1156-2-i-es2 FPGA at 300 MHz
to resize a grayscale image from 1080x1920 to 480x640 (downscale); and to resize a grayscale
image from 1080x1920 to 2160x3840 (upscale). This table also shows the latencies obtained for
different interpolation types.
BGR2HSV
The BGR2HSV function converts the input image color space to HSV color space and returns the
HSV image as the output.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
convertScaleAbs
The convertScaleAbs function applies an optional linear transformation to the input image
src1 and saves the result as image dst.
dst(x,y)= src1(x,y)*scale+shift
API Syntax
template< int SRC_T,int DST_T, int ROWS, int COLS, int NPC = 1>
void convertScaleAbs(xf::Mat<SRC_T, ROWS, COLS, NPC> & src1, xf::Mat<DST_T,
ROWS, COLS, NPC> & dst,float scale, float shift)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1).
DST_T Output pixel type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1).
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. In case of N-pixel parallelism, width should be
multiple of N.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for
1 pixel and 8 pixel operations respectively.
src1 Input image
scale Scale factor
shift Delta/shift added to scaled value.
dst Output image
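A per-pixel software model of the operation is sketched below. The linear transformation follows the formula above. Taking the absolute value and saturating to 8 bits mirrors the behavior of OpenCV's cv::convertScaleAbs and is an assumption here, as is the rounding mode; the function name is hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Reference model: |src * scale + shift|, rounded and saturated to 8 bits.
inline std::uint8_t convertScaleAbsPixel(std::uint8_t src, float scale,
                                         float shift) {
    float v = std::fabs(static_cast<float>(src) * scale + shift);
    long r = std::lround(v);  // rounding mode is an assumption
    return static_cast<std::uint8_t>(r > 255 ? 255 : r);
}
```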
Resource Utilization
The following table summarizes the resource utilization of the convertScaleAbs function in
Resource optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1
version tool for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 10 38
FF 949 1971
LUT 1052 1522
CLB 218 382
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Scharr Filter
The Scharr function computes the gradients of input image in both x and y direction by
convolving the kernel with input image being processed.
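• GradientX:
$$G_x = \begin{bmatrix} -3 & 0 & 3 \\ -10 & 0 & 10 \\ -3 & 0 & 3 \end{bmatrix} * I$$
• GradientY:
$$G_y = \begin{bmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ 3 & 10 & 3 \end{bmatrix} * I$$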
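The gradient computation at a single pixel can be sketched in plain C++ as follows. The kernels are the standard Scharr kernels. The helper name is hypothetical, the kernel is applied without flipping (correlation), and pixels outside the image are treated as 0, which is an assumption about border handling.

```cpp
#include <cstdint>
#include <vector>

// Standard 3x3 Scharr kernels.
const int kScharrX[3][3] = {{-3, 0, 3}, {-10, 0, 10}, {-3, 0, 3}};
const int kScharrY[3][3] = {{-3, -10, -3}, {0, 0, 0}, {3, 10, 3}};

// Apply a 3x3 kernel to the neighborhood centered on (x, y) of a
// row-major image. Pixels outside the image contribute 0.
inline int apply3x3(const std::vector<std::uint8_t>& img, int rows, int cols,
                    int x, int y, const int (&k)[3][3]) {
    int acc = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int yy = y + dy;
            int xx = x + dx;
            if (yy < 0 || yy >= rows || xx < 0 || xx >= cols) continue;
            acc += k[dy + 1][dx + 1] * img[yy * cols + xx];
        }
    }
    return acc;
}
```

On an image with a vertical edge, the x gradient responds strongly while the y gradient is zero, because the rows above and below the center are identical.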
API Syntax
template<int BORDER_TYPE, int SRC_T,int DST_T, int ROWS, int COLS,int NPC=1>
void Scharr(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,xf::Mat<DST_T, ROWS,
COLS, NPC> & _dst_matx,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_maty)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a grayscale HD (1080x1920) image.
Resource Utilization
Name 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 3 6
DSP48E 0 0
FF 728 1434
LUT 812 2481
CLB 171 461
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a 4K 3 channel image.
Resource Utilization
Name 1 pixel
300 MHz
BRAM_18K 18
DSP48E 0
FF 1911
LUT 1392
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
Set
The Set function sets each pixel in the input image to a given scalar value and stores the result
in dst.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Set function in Resource optimized
(8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool for the
Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 87 87
LUT 43 42
CLB 17 18
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Sobel Filter
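The Sobel function computes the gradients of the input image in both the x and y directions by
convolving the kernel with the input image being processed.
• For Kernel size 3x3
○ GradientX:
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * I$$
○ GradientY:
$$G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * I$$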
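• For Kernel size 5x5
○ GradientX:
$$G_x = \begin{bmatrix} -1 & -2 & 0 & 2 & 1 \\ -4 & -8 & 0 & 8 & 4 \\ -6 & -12 & 0 & 12 & 6 \\ -4 & -8 & 0 & 8 & 4 \\ -1 & -2 & 0 & 2 & 1 \end{bmatrix} * I$$
○ GradientY:
$$G_y = \begin{bmatrix} -1 & -4 & -6 & -4 & -1 \\ -2 & -8 & -12 & -8 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ 2 & 8 & 12 & 8 & 2 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} * I$$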
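• For Kernel size 7x7
○ GradientX:
$$G_x = \begin{bmatrix} -1 & -4 & -5 & 0 & 5 & 4 & 1 \\ -6 & -24 & -30 & 0 & 30 & 24 & 6 \\ -15 & -60 & -75 & 0 & 75 & 60 & 15 \\ -20 & -80 & -100 & 0 & 100 & 80 & 20 \\ -15 & -60 & -75 & 0 & 75 & 60 & 15 \\ -6 & -24 & -30 & 0 & 30 & 24 & 6 \\ -1 & -4 & -5 & 0 & 5 & 4 & 1 \end{bmatrix} * I$$
○ GradientY:
$$G_y = \begin{bmatrix} -1 & -6 & -15 & -20 & -15 & -6 & -1 \\ -4 & -24 & -60 & -80 & -60 & -24 & -4 \\ -5 & -30 & -75 & -100 & -75 & -30 & -5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 5 & 30 & 75 & 100 & 75 & 30 & 5 \\ 4 & 24 & 60 & 80 & 60 & 24 & 4 \\ 1 & 6 & 15 & 20 & 15 & 6 & 1 \end{bmatrix} * I$$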
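Sobel kernels are separable: each GradientX kernel is the outer product of a smoothing column vector and a derivative row vector, and GradientY is its transpose. The sketch below rebuilds the kernels from the standard 1-D Sobel factors, which is a convenient way to cross-check individual coefficients. It is an observation about the coefficient values, not a statement about how the hardware kernel is implemented; all names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Build an N x N GradientX kernel as smooth (column) times deriv (row).
template <std::size_t N>
std::array<std::array<int, N>, N> sobelX(const std::array<int, N>& smooth,
                                         const std::array<int, N>& deriv) {
    std::array<std::array<int, N>, N> k{};
    for (std::size_t r = 0; r < N; ++r) {
        for (std::size_t c = 0; c < N; ++c) {
            k[r][c] = smooth[r] * deriv[c];
        }
    }
    return k;
}

// Standard 1-D factors for the 5x5 and 7x7 Sobel kernels.
const std::array<int, 5> kSmooth5 = {{1, 4, 6, 4, 1}};
const std::array<int, 5> kDeriv5 = {{-1, -2, 0, 2, 1}};
const std::array<int, 7> kSmooth7 = {{1, 6, 15, 20, 15, 6, 1}};
const std::array<int, 7> kDeriv7 = {{-1, -4, -5, 0, 5, 4, 1}};
```

For example, the center row of the 7x7 GradientX kernel is 20 times the derivative vector, so its entries are -20, -80, -100, 0, 100, 80, 20.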
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
FILTER_TYPE Filter size. Filter size of 3 (XF_FILTER_3X3), 5 (XF_FILTER_5X5) and 7 (XF_FILTER_7X7) are
supported.
BORDER_TYPE Border Type supported is XF_BORDER_CONSTANT
SRC_T Input pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and XF_8UC3)
DST_T Output pixel type. Only 8-bit unsigned, 16-bit signed,1 and 3 channels are supported (XF_8UC1,
XF_16SC1,XF_8UC3 and XF_16SC3)
ROWS Maximum height of input and output image.
COLS Maximum width of input and output image. Must be multiple of 8, for 8-pixel operation.
NPC Number of pixels to be processed per cycle; possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
USE_URAM Enable to map storage structures to UltraRAM
_src_mat Input image
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a grayscale HD (1080x1920) image.
Utilization Estimate
Operating
Operating Frequency
Filter Size
Mode BRAM_18K DSP_48Es FF LUT CLB
(MHz)
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to
process a 4K 3-channel image.
Utilization Estimate
Operating
Operating Frequency
Filter Size
Mode BRAM_18K DSP_48Es FF LUT
(MHz)
The following table summarizes the resource utilization of the kernel in different configurations,
generated using SDx 2019.1 tool for the Xilinx xczu7ev-ffvc1156-2-e FPGA, to process a
grayscale 4K (3840x2160) image with UltraRAM enabled.
Table 400: Sobel Function Resource Utilization Summary with UltraRAM enable
Utilization Estimate
Operating
Operating Frequency
Filter Size
Mode BRAM_18K URAM DSP_48Es FF LUT
(MHz)
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a
grayscale HD (1080x1920) image.
For the semi-global method in xfOpenCV, census transform in conjunction with Hamming
distance is used for cost computation. The semiglobal optimization block is based on the
implementation by Hirschmuller, but approximates the cost aggregation by considering only four
directions.
Parallelism is achieved by computing and aggregating cost for multiple disparities in parallel, and
this parameter is included as a compile-time input.
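The cost computation can be illustrated with a software sketch of a 5x5 census transform followed by a Hamming distance. This is a minimal model under stated assumptions: the function names are hypothetical, the bit ordering and the "neighbor darker than center" comparison are choices made for the example, and out-of-image neighbors are read as 0 to mimic a constant border.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// 5x5 census transform at (x, y): one bit per neighbor, set when the
// neighbor is darker than the center pixel. The center itself is skipped,
// so the descriptor has 24 bits.
inline std::uint32_t census5x5(const std::vector<std::uint8_t>& img, int rows,
                               int cols, int x, int y) {
    const std::uint8_t center = img[y * cols + x];
    std::uint32_t bits = 0;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            if (dx == 0 && dy == 0) continue;
            int yy = y + dy;
            int xx = x + dx;
            bool inside = yy >= 0 && yy < rows && xx >= 0 && xx < cols;
            std::uint8_t v = inside ? img[yy * cols + xx] : 0;
            bits = (bits << 1) | (v < center ? 1u : 0u);
        }
    }
    return bits;
}

// Matching cost between two census descriptors: number of differing bits.
inline int hammingCost(std::uint32_t a, std::uint32_t b) {
    return static_cast<int>(std::bitset<32>(a ^ b).count());
}
```

For each candidate disparity d, the cost of a pixel is the Hamming distance between the left descriptor at x and the right descriptor at x - d; computing several such costs at once corresponds to the parallel units described above.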
API Syntax
template<int BORDER_TYPE, int WINDOW_SIZE, int NDISP, int PU, int R, int
SRC_T, int DST_T, int ROWS, int COLS, int NPC>
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
BORDER_TYPE The border pixels are processed in Census transform function based on this parameter. Only
XF_BORDER_CONSTANT is supported.
WINDOW_SIZE Size of the window used for Census transform computation. Only ‘5’ (5x5) is supported.
NDISP Number of disparities
PU Number of disparity units to be computed in parallel
R Number of directions for cost aggregation. It must be 2, 3, or 4.
SRC_T Type of input image Mat object. It must be XF_8UC1.
DST_T Type of output disparity image Mat object. It must be XF_8UC1.
ROWS Maximum height of the input image.
COLS Maximum width of the input image.
NPC Number of pixels to be computed in parallel. It must be XF_NPPC1.
_src_mat_l Left input image Mat
_src_mat_r Right input image Mat
_dst_mat Output disparity image Mat
p1 Small penalty for cost aggregation
p2 Large penalty for cost aggregation. The maximum value is 100.
Resource Utilization
The following table summarizes the resource utilization for a 1920 x 1080 image, with 64
disparities and 32 parallel units.
Performance Estimate
The local block matching algorithm consists of pre-processing and disparity estimation stages.
Pre-processing consists of Sobel gradient computation followed by image clipping. Disparity
estimation consists of SAD (Sum of Absolute Differences) computation, followed by selection of
the disparity using the winner-takes-all method (the candidate with the least SAD becomes the
disparity). A pixel is marked invalid when its best match is not sufficiently unique compared
with the other possible disparities. Invalid pixels are indicated with a disparity value of zero.
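The SAD and winner-takes-all steps can be sketched in software as follows. This is a simplified model with hypothetical function names: it omits the Sobel pre-filter, clipping, and the uniqueness and texture checks, and it expects the caller to keep every window inside the image.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// SAD between the window centered on (x, y) in the left image and the
// window centered on (x - d, y) in the right image. half is the window
// radius, so the window is (2 * half + 1) pixels wide and tall.
inline int sadCost(const std::vector<std::uint8_t>& left,
                   const std::vector<std::uint8_t>& right, int cols, int x,
                   int y, int d, int half) {
    int sum = 0;
    for (int dy = -half; dy <= half; ++dy) {
        for (int dx = -half; dx <= half; ++dx) {
            int l = left[(y + dy) * cols + (x + dx)];
            int r = right[(y + dy) * cols + (x + dx - d)];
            sum += std::abs(l - r);
        }
    }
    return sum;
}

// Winner takes all: the candidate disparity with the smallest SAD wins.
inline int bestDisparity(const std::vector<std::uint8_t>& left,
                         const std::vector<std::uint8_t>& right, int cols,
                         int x, int y, int numDisp, int half) {
    int best = 0;
    int bestCost = std::numeric_limits<int>::max();
    for (int d = 0; d < numDisp; ++d) {
        int cost = sadCost(left, right, cols, x, y, d, half);
        if (cost < bestCost) {
            bestCost = cost;
            best = d;
        }
    }
    return best;
}
```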
API Syntax
template <int WSIZE, int NDISP, int NDISP_UNIT, int SRC_T, int DST_T, int
ROWS, int COLS, int NPC = XF_NPPC1,bool USE_URAM=false>
void StereoBM(xf::Mat<SRC_T, ROWS, COLS, NPC> &_left_mat, xf::Mat<SRC_T,
ROWS, COLS, NPC> &_right_mat, xf::Mat<DST_T, ROWS, COLS, NPC> &_disp_mat,
xf::xFSBMState<WSIZE,NDISP,NDISP_UNIT> &sbmstate);
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
DST_T Output type. This is XF_16UC1, where the disparities are arranged in Q12.4 format.
ROWS Maximum height of input and output image
COLS Maximum width of input and output image
NPC Number of pixels to be processed per cycle; only XF_NPPC1 is supported.
USE_URAM Enable to map some storage structures to UltraRAM
left_image Image from the left camera
right_image Image from the right camera
disparity_image Disparities output in the form of an image.
sbmstate Class object consisting of various parameters regarding the stereo block matching algorithm.
1. preFilterCap: Default value is 31, can be altered by the user, value ranges from 1 to 63
2. minDisparity: Default value is 0, can be altered by the user, value ranges from 0 to (imgWidth-
NDISP)
3. uniquenessRatio: Default set to 15, but can be altered to any non-negative integer.
4. textureThreshold: Default set to 10, but can be modified to any non-negative integer.
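Because the output disparities are stored in Q12.4 format, the low 4 bits of each 16-bit value are fractional. A host-side conversion to pixel units can be sketched as follows (the function name is hypothetical):

```cpp
#include <cstdint>

// Q12.4: 12 integer bits and 4 fractional bits, so one LSB is 1/16 pixel.
inline float disparityToPixels(std::uint16_t q12_4) {
    return static_cast<float>(q12_4) / 16.0f;
}
```

For example, the raw value 0x0128 (296) corresponds to a disparity of 18.5 pixels.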
Resource Utilization
The following table summarizes the resource utilization of the kernel in different configurations,
generated using Vivado HLS 2019.1 version tool for the Xilinx® Xczu9eg-ffvb1156-1-i-es1
FPGA, to process a grayscale HD (1080x1920) image.
Resource Utilization
Frequency
Configurations
(MHz) BRAM_18k DSP48E FF LUT
The following table summarizes the resource utilization of the kernel in different configurations,
generated using SDx 2019.1 version tool for the Xilinx xczu7ev-ffvc1156-2-e FPGA, to process
a grayscale HD (1080x1920) image with UltraRAM enabled.
Table 407: StereoBM Function Resource Utilization Summary with UltraRAM Enable
Resource Utilization
Frequency
Configurations
(MHz) BRAM_18k URAM DSP48E FF LUT
Performance Estimate
The following table summarizes a performance estimate of the Stereo local block matching in
different configurations, as generated using Vivado HLS 2019.1 tool for Xilinx Xczu9eg-
ffvb1156-1-i-es1 FPGA, to process a grayscale HD (1080x1920) image.
Latency (ms)
Frequency
Configurations
(MHz) Min Max
SubRS
The SubRS function subtracts the intensity of the source image from a scalar image and stores it
in the destination image.
API Syntax
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC =1>
void subRS(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char
_scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the SubRS function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 103 104
LUT 44 133
CLB 23 43
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
SubS
The SubS function subtracts a scalar value from the intensity of source image and stores it in the
destination image.
API Syntax
template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC =1>
void subS(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char
_scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
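SubS and SubRS differ only in operand order, which matters as soon as the result would go negative. The per-pixel sketch below uses hypothetical helper names and assumes a saturating conversion policy.

```cpp
#include <cstdint>

// Clamp an integer to the 8-bit unsigned range (saturating policy assumed).
inline std::uint8_t clampU8(int v) {
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// SubS: dst = src - scalar.
inline std::uint8_t subSPixel(std::uint8_t src, std::uint8_t scl) {
    return clampU8(static_cast<int>(src) - static_cast<int>(scl));
}

// SubRS: dst = scalar - src.
inline std::uint8_t subRSPixel(std::uint8_t src, std::uint8_t scl) {
    return clampU8(static_cast<int>(scl) - static_cast<int>(src));
}
```

For a multi-channel image, the same operation would be applied per channel with the matching entry of the _scl array.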
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the SubS function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 103 104
LUT 44 133
CLB 23 43
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
Sum
The sum function calculates the sum of all pixels in input image.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Sum function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 341 408
LUT 304 338
CLB 71 87
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Latency Estimate
Operating Mode
Operating Frequency (MHz) Latency (ms)
1 pixel 300
8 pixel 150
SVM
The SVM function is the SVM core operation, which performs dot product between the input
arrays. The function returns the resultant dot product value with its fixed point type.
API Syntax
template<int SRC1_T, int SRC2_T, int DST_T, int ROWS1, int COLS1, int
ROWS2, int COLS2, int NPC=1, int N>
void SVM(xf::Mat<SRC1_T, ROWS1, COLS1, NPC> &in_1, xf::Mat<SRC2_T, ROWS2,
COLS2, NPC> &in_2, uint16_t idx1, uint16_t idx2, uchar_t frac1, uchar_t
frac2, uint16_t n, uchar_t *out_frac, ap_int<XF_PIXELDEPTH(DST_T)> *result)
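The fixed-point bookkeeping behind a dot product can be sketched in plain C++. This is not the library kernel: the struct and function names are hypothetical, and reporting the result with frac1 + frac2 fractional bits is the natural convention for a product of two fixed-point values, assumed here rather than taken from the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct FixedResult {
    std::int64_t value;  // raw accumulated integer
    unsigned fracBits;   // number of fractional bits in value
};

// Dot product of two fixed-point arrays. Each product carries
// fracA + fracB fractional bits, and so does the accumulated sum.
inline FixedResult dotFixed(const std::vector<std::int16_t>& a, unsigned fracA,
                            const std::vector<std::int16_t>& b, unsigned fracB) {
    std::int64_t acc = 0;
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int64_t>(a[i]) * b[i];
    }
    return {acc, fracA + fracB};
}

// Convert the fixed-point result to a floating-point value for checking.
inline double toDouble(const FixedResult& r) {
    return static_cast<double>(r.value) /
           static_cast<double>(std::int64_t(1) << r.fracBits);
}
```

With a = {1.5, 2.0} stored with 4 fractional bits and b = {0.5, 0.25} stored with 8 fractional bits, the raw sum is 5120 with 12 fractional bits, which is 1.25.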
Parameter Descriptions
The following table describes the template and the function parameters.
Parameters Description
Parameters Description
Resource Utilization
The following table summarizes the resource utilization of the SVM function, generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Operating Frequency (MHz)   BRAM_18K   DSP_48Es   FF   LUT   CLB
300                         0          1          27   34    12
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Frequency (MHz)
Min (cycles) Max (cycles)
Thresholding
The Threshold function performs thresholding operation on the input image. There are several
types of thresholding supported by the function.
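$$\mathrm{dst}(x, y) = \begin{cases} \mathrm{maxval}, & \text{if } \mathrm{src}(x, y) > \mathrm{threshold} \\ 0, & \text{otherwise} \end{cases}$$
$$\mathrm{dst}(x, y) = \begin{cases} 0, & \text{if } \mathrm{src}(x, y) > \mathrm{threshold} \\ \mathrm{maxval}, & \text{otherwise} \end{cases}$$
$$\mathrm{dst}(x, y) = \begin{cases} \mathrm{threshold}, & \text{if } \mathrm{src}(x, y) > \mathrm{threshold} \\ \mathrm{src}(x, y), & \text{otherwise} \end{cases}$$
$$\mathrm{dst}(x, y) = \begin{cases} \mathrm{src}(x, y), & \text{if } \mathrm{src}(x, y) > \mathrm{threshold} \\ 0, & \text{otherwise} \end{cases}$$
$$\mathrm{dst}(x, y) = \begin{cases} 0, & \text{if } \mathrm{src}(x, y) > \mathrm{threshold} \\ \mathrm{src}(x, y), & \text{otherwise} \end{cases}$$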
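The five thresholding rules above, in the order listed, can be captured in one per-pixel function. The enum names below are hypothetical labels chosen for this sketch and are not the library's identifiers.

```cpp
#include <cstdint>

// Hypothetical labels for the five rules, in the order given above.
enum class ThreshType { Binary, BinaryInv, Trunc, ToZero, ToZeroInv };

inline std::uint8_t thresholdPixel(std::uint8_t src, std::uint8_t thresh,
                                   std::uint8_t maxval, ThreshType type) {
    const bool above = src > thresh;
    const std::uint8_t zero = 0;
    switch (type) {
        case ThreshType::Binary:    return above ? maxval : zero;
        case ThreshType::BinaryInv: return above ? zero : maxval;
        case ThreshType::Trunc:     return above ? thresh : src;
        case ThreshType::ToZero:    return above ? src : zero;
        case ThreshType::ToZeroInv: return above ? zero : src;
    }
    return src;  // not reached for valid enum values
}
```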
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the kernel with binary thresholding in
different configurations, generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-
ffvb1156-1 FPGA, to process a grayscale HD (1080x1920) image.
Resource Utilization
Configurations 1 pixel 8 pixel
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 110 154
LUT 61 139
CLB 16 37
Performance Estimate
The following table summarizes the performance of the kernel in different configurations, as
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1, to process a
grayscale HD (1080x1920) image.
Atan2
The Atan2LookupFP function finds the arctangent of y/x. It returns the angle made by the
vector $\begin{bmatrix} x \\ y \end{bmatrix}$ with respect to the origin. The angle returned by
atan2 also contains the quadrant information.
Atan2LookupFP is a fixed point version of the standard atan2 function. This function
implements the atan2 using a lookup table approach. The values in the look up table are
represented in Q4.12 format and so the values returned by this function are in Q4.12. A
maximum error of 0.2 degrees is present in the range of 89 to 90 degrees when compared to the
standard atan2 function available in glibc. For the other angles (0 to 89 degrees) the maximum error is
on the order of $10^{-3}$. This function returns 0 when both xs and ys are zero.
API Syntax
short Atan2LookupFP(short xs, short ys, int M1,int N1,int M2, int N2)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Atan2LookupFP function,
generated using Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Utilization Estimate
Operating Frequency (MHz)   BRAM_18K   DSP_48Es   FF   LUT   CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Frequency
300 1 15
Inverse (Reciprocal)
The Inverse function computes the reciprocal of a number x. The values of 1/x are stored in a
look up table of 2048 size. The index for picking the 1/x value is computed using the fixed point
format of x. Once this index is computed, the corresponding 1/x value is fetched from the look
up table and returned along with the number of fractional bits needed to represent this value in
fixed point format.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Inverse function, generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Operating Frequency (MHz)   BRAM_18K   DSP_48Es   FF   LUT   CLB
300                         4          0          68   128   22
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Frequency
300 1 8
Look Up Table
The LUT function performs a table lookup operation: it transforms the source image into the
destination image using the given look-up table. The input image must be of depth XF_8UP, and
the output image has the same type as the input image.
Where:
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
SRC_T Input and Output pixel type. Only 8-bit, unsigned, 1 and 3 channels are supported (XF_8UC1 and
XF_8UC3)
ROWS Number of rows in the image being processed.
COLS Number of columns in the image being processed. Must be a multiple of 8, for 8-pixel operation.
NPC Number of pixels to be processed in parallel. Possible options are XF_NPPC1 and XF_NPPC8 for 1
pixel and 8 pixel operations respectively.
_src Input image of size (ROWS, COLS) and type 8U.
_dst Output image of size (ROWS, COLS) and same type as input.
_lut Input lookup Table of size 256 and type unsigned char.
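The lookup itself replaces each 8-bit pixel with the table entry it indexes. A software sketch with hypothetical function names follows; the inverting table is only an example of table contents.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Replace every pixel with the table entry it indexes.
inline std::vector<std::uint8_t> applyLut(
    const std::vector<std::uint8_t>& src,
    const std::array<std::uint8_t, 256>& lut) {
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = lut[src[i]];
    }
    return dst;
}

// Example table: photographic negative, lut[i] = 255 - i.
inline std::array<std::uint8_t, 256> invertLut() {
    std::array<std::uint8_t, 256> lut{};
    for (int i = 0; i < 256; ++i) {
        lut[i] = static_cast<std::uint8_t>(255 - i);
    }
    return lut;
}
```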
Resource Utilization
The following table summarizes the resource utilization of the LUT function, generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Utilization Estimate
Operating Operating Frequency
Mode (MHz) BRAM_18K DSP_48Es FF LUT CLB
The following table summarizes the resource utilization of the LUT function, generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a 4K 3-channel
image.
Utilization Estimate
Operating Operating Frequency
Mode (MHz) BRAM_18K DSP_48Es FF LUT CLB
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1, to process a grayscale HD
(1080x1920) image.
Latency Estimate
Operating Mode
Max Latency
Square Root
The Sqrt function computes the square root of a 16-bit fixed point number using the non-
restoring square root algorithm. The non-restoring square root algorithm uses the two's
complement representation for the square root result. At each iteration the algorithm can
generate exact result value even in the last bit.
Input argument D must be a 16-bit number, though it is declared as 32-bit. The output sqrt(D) is
of 16-bit type. If the format of D is QM.N (where M+N = 16), then the format of the output is Q(M/2).N.
To get a precision of 'n' bits in the fractional part, left shift the radicand (D) by '2n'
before the function call and shift the solution right by 'n' to get the correct answer. For example,
to find the square root of 35 (00100011 in binary) with one bit after the binary point, that is, n=1:
shift 35 left by 2 to get 140, take the integer square root (11, or 1011 in binary), and shift the
result right by 1 to get 101.1 in binary, which is 5.5.
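The scaling rule can be checked in software. The sketch below uses a simple bit-by-bit integer square root rather than the non-restoring hardware algorithm, and the function names are hypothetical; only the shift-before and shift-after rule is being illustrated.

```cpp
#include <cstdint>

// Bit-by-bit integer square root: returns floor(sqrt(d)).
inline std::uint32_t isqrt(std::uint32_t d) {
    std::uint32_t root = 0;
    std::uint32_t bit = 1u << 30;  // highest power of four in 32 bits
    while (bit > d) bit >>= 2;
    while (bit != 0) {
        if (d >= root + bit) {
            d -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

// Square root with n fractional bits: shift the radicand left by 2n,
// take the integer root, then interpret the result with n fractional bits.
inline double sqrtFixed(std::uint32_t value, unsigned n) {
    return static_cast<double>(isqrt(value << (2 * n))) /
           static_cast<double>(1u << n);
}
```

For 35 with n = 1, the shifted radicand is 140, the integer root is 11, and the interpreted result is 5.5 (the exact square root is about 5.916).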
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Sqrt function, generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Utilization Estimate
Operating Frequency (MHz)   BRAM_18K   DSP_48Es   FF   LUT   CLB
300                         0          0          8    6     1
Performance Estimate
The following table summarizes the performance in different configurations, as generated using
Vivado HLS 2019.1 tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA.
Latency Estimate
Operating Frequency
300 18 18
WarpTransform
The warpTransform function is designed to perform the perspective and affine geometric
transformations on an image. The type of transform is a compile time parameter to the function.
The function uses a streaming interface to perform the transformation. Due to this and due to
the fact that geometric transformations need access to many different rows of input data to
compute one output row, the function stores some rows of the input data in block RAMs/
UltraRAMs. The number of rows the function stores can be configured by the user by modifying
a template parameter. Based on the transformation matrix, you can decide on the number of
rows to be stored. You can also choose when to start transforming the input image in terms of
the number of rows of stored image.
Affine Transformation
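$$M = \begin{bmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \end{bmatrix}$$
Affine transformation is applied in the warpTransform function following the equation:
$$\mathrm{dst}\begin{pmatrix} x \\ y \end{pmatrix} = M * \mathrm{src}\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$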
Perspective Transformation
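$$M = \begin{bmatrix} M_{11} & M_{12} & M_{13} \\ M_{21} & M_{22} & M_{23} \\ M_{31} & M_{32} & M_{33} \end{bmatrix}$$
$$\mathrm{dst1}\begin{pmatrix} x \\ y \\ n \end{pmatrix} = M * \mathrm{src}\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$
The destination pixel is then computed by dividing the first two dimensions of dst1 by the
third dimension:
$$\mathrm{dst}\begin{pmatrix} x \\ y \end{pmatrix} = \mathrm{dst1}\begin{pmatrix} x / n \\ y / n \end{pmatrix}$$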
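The two coordinate mappings can be sketched for a single point as follows. The types and function names are hypothetical, the code follows the equations above literally, and it does not model interpolation or the row buffering performed by the hardware function.

```cpp
#include <array>

struct Point2 {
    double x;
    double y;
};

// Affine: M is 2x3 and the result is M * [x, y, 1]^T.
inline Point2 applyAffine(const std::array<std::array<double, 3>, 2>& m,
                          Point2 p) {
    return {m[0][0] * p.x + m[0][1] * p.y + m[0][2],
            m[1][0] * p.x + m[1][1] * p.y + m[1][2]};
}

// Perspective: M is 3x3. The first two homogeneous coordinates are divided
// by the third (n). The caller must ensure n is non-zero.
inline Point2 applyPerspective(const std::array<std::array<double, 3>, 3>& m,
                               Point2 p) {
    double xh = m[0][0] * p.x + m[0][1] * p.y + m[0][2];
    double yh = m[1][0] * p.x + m[1][1] * p.y + m[1][2];
    double n = m[2][0] * p.x + m[2][1] * p.y + m[2][2];
    return {xh / n, yh / n};
}
```

A perspective matrix whose last row is (0, 0, 1) always gives n = 1 and therefore reduces to the affine case.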
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Warp transform, generated using
Vivado HLS 2019.1 version tool for the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a
grayscale HD (1080x1920) image.
Utilization Estimate
Transformation   INTERPOLATION_TYPE   STORE_LINES   START_ROW   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs
The following table summarizes the
resource utilization of the Warp transform, generated using Vivado HLS 2019.1 version tool for
the Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a BGR 4K image.
Utilization Estimate
Transformation   INTERPOLATION_TYPE   STORE_LINES   START_ROW   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs
The following table summarizes the resource utilization of the Warp transform, generated using
SDx 2019.1 version tool for the Xilinx xczu7ev-ffvc1156-2-e FPGA, to process a grayscale 4K
image with UltraRAM enabled.
Utilization Estimate
Transformation   INTERPOLATION_TYPE   STORE_LINES   START_ROW   Operating Frequency (MHz)   LUTs   FFs   DSPs   BRAMs   URAM
Performance Estimate
The following table summarizes a performance estimate of the Warp transform, as generated
using Vivado HLS 2019.1 tool for Xilinx Xczu9eg-ffvb1156-1-i-es1 FPGA, to process a grayscale
HD (1080x1920) image.
Latency Estimate
Transformation   INTERPOLATION_TYPE   STORE_LINES   START_ROW   Operating Frequency (MHz)   Max (ms)
Zero
The Zero function sets each pixel of the input image to zero and stores the result in dst.
API Syntax
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
Resource Utilization
The following table summarizes the resource utilization of the Zero function in Resource
optimized (8 pixel) mode and normal mode as generated using Vivado HLS 2019.1 version tool
for the Xczu9eg-ffvb1156-1-i-es1 FPGA.
Resource Utilization
Name 1 pixel per clock operation 8 pixel per clock operation
300 MHz 150 MHz
BRAM_18K 0 0
DSP48E 0 0
FF 78 78
LUT 42 41
CLB 15 14
Performance Estimate
The following table summarizes a performance estimate of the kernel in different configurations,
generated using Vivado HLS 2019.1 tool for Xczu9eg-ffvb1156-1-i-es1 FPGA to process a
grayscale HD (1080x1920) image.
Operating Mode  Operating Frequency (MHz)  Latency Estimate (ms)
Chapter 6
Specific details of the implementation of the example on the host follow to help understand the
process in which the claimed throughput is achieved.
pyrof_hw()
pyrof_hw() is the host function that computes the dense optical flow.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
mat_imagepyr1 An array, of size equal to the number of image pyramid levels, of xf::Mat to store the image
pyramid of the first image
mat_imagepyr2 An array, of size equal to the number of image pyramid levels, of xf::Mat to store the image
pyramid of the second image
pyr_h An array of integers, of size equal to the number of image pyramid levels, to store the
height of the image at each pyramid level
pyr_w An array of integers, of size equal to the number of image pyramid levels, to store the
width of the image at each pyramid level
Dataflow
1. Set the sizes of the images in the various levels of the image pyramid.
2. Copy the input images from cv::Mat format to the xf::Mat object allocated to contain the
largest image pyramid level.
3. Create the image pyramid by calling the pyr_dense_optical_flow_pyr_down_accel()
function.
4. Use the pyr_dense_optical_flow_accel() function to compute the optical flow
output by iterating over the pyramid levels as input by the user.
5. Unpack the flow vectors, convert them to floating point, and return.
Steps 3 and 4 are explained in detail in the following sections.
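Step 1 of the dataflow can be sketched as follows. This is a minimal illustration that assumes each pyramid level halves both dimensions of the previous level, rounding up; the rounding used by the example design is not stated in this section and may differ. The function name fillPyramidSizes is hypothetical.

```cpp
#include <vector>

// Fills pyr_h and pyr_w with the image size at each pyramid level.
// Level 0 holds the full-resolution image.
// Assumption: every further level halves both dimensions, rounding up.
inline void fillPyramidSizes(int height, int width, int numLevels,
                             std::vector<int>& pyr_h, std::vector<int>& pyr_w) {
    pyr_h.clear();
    pyr_w.clear();
    if (numLevels <= 0) {
        return;
    }
    pyr_h.assign(numLevels, 0);
    pyr_w.assign(numLevels, 0);
    pyr_h[0] = height;
    pyr_w[0] = width;
    for (int l = 1; l < numLevels; ++l) {
        pyr_h[l] = (pyr_h[l - 1] + 1) / 2;
        pyr_w[l] = (pyr_w[l - 1] + 1) / 2;
    }
}
```

For a 1080x1920 input and three levels, this yields heights 1080, 540, 270 and widths 1920, 960, 480.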
pyr_dense_optical_flow_pyr_down_accel()
API Syntax
void
pyr_dense_optical_flow_pyr_down_accel(xf::Mat<XF_8UC1,HEIGHT,WIDTH,XF_NPPC1>
mat_imagepyr1[NUM_LEVELS], xf::Mat<XF_8UC1,HEIGHT,WIDTH,XF_NPPC1>
mat_imagepyr2[NUM_LEVELS])
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
mat_imagepyr1 An array, of size equal to the number of image pyramid levels, of xf::Mat to store the image
pyramid of the first image. The memory location corresponding to the highest pyramid level [0]
in this allocated memory must contain the first input image.
mat_imagepyr2 An array, of size equal to the number of image pyramid levels, of xf::Mat to store the image
pyramid of the second image. The memory location corresponding to the highest pyramid level
[0] in this allocated memory must contain the second input image.
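// Loop bounds inferred: level pyr_comp+1 must stay below NUM_LEVELS
for(int pyr_comp=0; pyr_comp<NUM_LEVELS-1; pyr_comp++)
{
// Hardware instance 1: downscale the current level of the first image
#pragma SDS async(1)
#pragma SDS resource(1)
xf::pyrDown<XF_8UC1,HEIGHT,WIDTH,XF_NPPC1,XF_USE_URAM>(mat_imagepyr1[pyr_comp], mat_imagepyr1[pyr_comp+1]);
// Hardware instance 2: downscale the current level of the second image
#pragma SDS async(2)
#pragma SDS resource(2)
xf::pyrDown<XF_8UC1,HEIGHT,WIDTH,XF_NPPC1,XF_USE_URAM>(mat_imagepyr2[pyr_comp], mat_imagepyr2[pyr_comp+1]);
// Block until both hardware calls have finished
#pragma SDS wait(1)
#pragma SDS wait(2)
}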
Without the pragmas, the code is straightforward: the xf::pyrDown function is called twice in
every iteration, first with the first image and then with the second image. Note that the
input to the next iteration is the output of the current iteration. The pragma #pragma SDS
async(ID) makes the Arm® processor call the hardware function without waiting for the hardware
function to return. The Arm processor still takes some cycles to call the function, which includes
programming the DMA. The pragma #pragma SDS wait(ID) makes the Arm processor wait for the
hardware function called with the async(ID) pragma to finish processing. The pragma #pragma
SDS resource(ID) creates a separate hardware instance each time the hardware function is called
with a different ID. Consequently, the loop in the above host function calls the two hardware
instances of xf::pyrDown in parallel, waits until both functions return, and then proceeds to
the next iteration.
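The launch-then-wait pattern described above has a rough analogy in standard C++. The sketch below is not SDSoC code and does not use the SDS pragmas: std::async stands in for async(ID), and future::get() stands in for wait(ID). The functions downscale and buildPyramids are hypothetical stand-ins, and the data is a 1-D row of integers rather than an xf::Mat.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Stand-in for a hardware function call: keeps every second sample of a row.
inline std::vector<int> downscale(const std::vector<int>& in) {
    std::vector<int> out;
    for (std::size_t i = 0; i < in.size(); i += 2) {
        out.push_back(in[i]);
    }
    return out;
}

// Fills levels 1..N-1 of two pyramids; level 0 of each must hold the input.
// Assumes pyr1 and pyr2 have the same number of levels.
inline void buildPyramids(std::vector<std::vector<int>>& pyr1,
                          std::vector<std::vector<int>>& pyr2) {
    for (std::size_t l = 0; l + 1 < pyr1.size(); ++l) {
        // Comparable to async(1) and async(2): start both calls without blocking.
        std::future<std::vector<int>> f1 =
            std::async(std::launch::async, [&pyr1, l] { return downscale(pyr1[l]); });
        std::future<std::vector<int>> f2 =
            std::async(std::launch::async, [&pyr2, l] { return downscale(pyr2[l]); });
        // Comparable to wait(1) and wait(2): block until both results are ready.
        std::vector<int> next1 = f1.get();
        std::vector<int> next2 = f2.get();
        // The output of this iteration is the input of the next one.
        pyr1[l + 1] = std::move(next1);
        pyr2[l + 1] = std::move(next2);
    }
}
```

As in the host function, the two calls within an iteration are independent of each other, while consecutive iterations are serialized because each level depends on the previous one.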
    //compute the flow vectors for the current pyramid level iteratively
    for(int iterations=0; iterations<NUM_ITERATIONS; iterations++)
    {
        bool scale_up_flag = (iterations==0)&&(l != NUM_LEVELS-1);
        int next_height = (scale_up_flag==1)?pyr_h[l+1]:pyr_h[l];
        int next_width = (scale_up_flag==1)?pyr_w[l+1]:pyr_w[l];
        float scale_in = (next_height - 1)*1.0/(curr_height - 1);
        ap_uint<1> init_flag = ((iterations==0) && (l==NUM_LEVELS-1))? 1 : 0;
        if(flag_flowin)
        {
            flow.rows = pyr_h[l];
            flow.cols = pyr_w[l];
            flow.size = pyr_h[l]*pyr_w[l];
            pyr_dense_optical_flow_accel(mat_imagepyr1[l], mat_imagepyr2[l], flow_iter, flow, l, scale_up_flag, scale_in, init_flag);
            flag_flowin = 0;
        }
        else
        {
            flow_iter.rows = pyr_h[l];
            flow_iter.cols = pyr_w[l];
            flow_iter.size = pyr_h[l]*pyr_w[l];
            pyr_dense_optical_flow_accel(mat_imagepyr1[l], mat_imagepyr2[l], flow, flow_iter, l, scale_up_flag, scale_in, init_flag);
            flag_flowin = 1;
        }
    }//end iterative optical flow computation
} // end pyramidal iterative optical flow HLS computation
The Iterative Pyramidal Dense Optical Flow is computed in a nested for loop that runs for a
total of (number of iterations) x (number of pyramid levels) iterations. The outer loop starts
from the smallest image size and iterates up to the largest image size. Before iterating within a
pyramid level, it stores the current pyramid level's height and width in the curr_height and
curr_width variables. In the nested loop, the next_height variable is set to the previous image
height if scaling up is necessary, that is, in the first iteration at a new pyramid level. Because
divisions are costly in hardware and this one-time division can be avoided there, the scale factor
is computed on the host and passed as an argument to the hardware kernel. After each pyramid
level, in the first iteration, the scale-up flag is set to let the hardware function know that the
input flow vectors need to be scaled up to the next larger image size. Scaling up is done using
bilinear interpolation in the hardware kernel.
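The flag and scale-factor logic described above can be isolated in a small host-side sketch. It mirrors the expressions in the listing but leaves out the hardware call. The struct IterationSetup and the function planIterations are hypothetical names introduced for illustration, and the sketch assumes every pyramid level is more than one pixel high.

```cpp
#include <vector>

// One entry per (pyramid level, iteration) pair. Hypothetical helper type.
struct IterationSetup {
    int level;
    int iteration;
    bool scale_up_flag;  // input flow must be scaled up from the previous, smaller level
    bool init_flag;      // very first call: no previous flow is available
    float scale_in;      // (next_height - 1) / (curr_height - 1), computed on the host
};

// Walks the pyramid from the smallest level (highest index) to the largest (index 0).
// Assumes every entry of pyr_h is greater than 1.
inline std::vector<IterationSetup> planIterations(const std::vector<int>& pyr_h,
                                                  int numIterations) {
    std::vector<IterationSetup> plan;
    const int numLevels = static_cast<int>(pyr_h.size());
    for (int l = numLevels - 1; l >= 0; --l) {
        const int curr_height = pyr_h[l];
        for (int it = 0; it < numIterations; ++it) {
            const bool scale_up = (it == 0) && (l != numLevels - 1);
            const int next_height = scale_up ? pyr_h[l + 1] : pyr_h[l];
            const float scale_in = (next_height - 1) * 1.0f / (curr_height - 1);
            const bool init = (it == 0) && (l == numLevels - 1);
            plan.push_back({l, it, scale_up, init, scale_in});
        }
    }
    return plan;
}
```

The scale factor is 1 whenever no scaling is needed, and close to 0.5 on the first iteration of each level after the smallest one.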
After all the input data is prepared and the flags are set, the host processor calls the hardware
function. Note that the host function swaps the flow vector inputs and outputs of the
hardware function on every call in order to solve the optimization problem iteratively. Also note
that the pyr_dense_optical_flow_accel() function is just a wrapper around the hardware function
xf::densePyrOpticalFlow. Template parameters to the hardware function are passed inside
this wrapper function.
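The swapping of flow inputs and outputs is a ping-pong buffer scheme, which can be sketched with two plain vectors. The function refineOnce is a toy stand-in for one hardware pass (it only adds 1 to each value), and runPingPong is a hypothetical name. The sketch assumes the flag starts out true, so that the first pass reads flow_iter and writes flow; the initial value used by the example design is not shown in this section.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for one hardware pass: reads `in`, writes `out`.
inline void refineOnce(const std::vector<int>& in, std::vector<int>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + 1;
    }
}

// Runs `passes` refinement passes, alternating the roles of the two buffers.
// Returns the final flag: false means the latest result is in `flow`,
// true means it is in `flow_iter`.
inline bool runPingPong(std::vector<int>& flow, std::vector<int>& flow_iter, int passes) {
    bool flag_flowin = true;  // assumption: first pass reads flow_iter and writes flow
    for (int p = 0; p < passes; ++p) {
        if (flag_flowin) {
            refineOnce(flow_iter, flow);
            flag_flowin = false;
        } else {
            refineOnce(flow, flow_iter);
            flag_flowin = true;
        }
    }
    return flag_flowin;
}
```

Because neither buffer is ever read and written in the same pass, no extra copy is needed between iterations; the caller only has to check the flag to find the latest result.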
The corner tracking example uses five hardware functions from the xfOpenCV library:
xf::cornerHarris, xf::cornersImgToList, xf::cornerUpdate, xf::pyrDown, and
xf::densePyrOpticalFlow.
[Figure: corner tracking pipeline diagram; surviving labels: "Update location of corners", "Tracked corners"]
A new hardware function, xf::cornerUpdate, has been added to ensure that the dense flow
vectors from the output of the xf::densePyrOpticalFlow function are sparsely picked and
stored in a new memory location as a sparse array. This ensures that the next
function in the pipeline does not have to search through memory with random accesses. The
function takes corners from the Harris corner detector and dense optical flow vectors from the
dense pyramidal optical flow function, and outputs the updated corner locations, tracking the
input corners using the dense flow vectors and thereby imitating sparse optical flow behavior.
This hardware function runs at 300 MHz for 10,000 corners on a 720p image, adding
minimal latency to the pipeline.
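The idea of picking dense flow vectors sparsely at corner locations can be sketched in plain C++. This is only an illustration of the concept: the types Corner and FlowVec and the function updateCorners are hypothetical, floating point is used instead of the packed fixed-point formats of the hardware function, and rounding the corner position to the nearest pixel is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types; the hardware function uses packed fixed-point words instead.
struct Corner {
    float x;  // column position
    float y;  // row position
};
struct FlowVec {
    float u;  // horizontal displacement
    float v;  // vertical displacement
};

// Moves each corner by the flow vector stored at its nearest pixel location.
// `flow` is a dense, row-major field of rows x cols vectors.
inline void updateCorners(std::vector<Corner>& corners, const std::vector<FlowVec>& flow,
                          int rows, int cols) {
    for (Corner& c : corners) {
        const int col = static_cast<int>(std::lround(c.x));
        const int row = static_cast<int>(std::lround(c.y));
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            continue;  // corner lies outside the image: leave it unchanged
        }
        const FlowVec& f = flow[static_cast<std::size_t>(row) * cols + col];
        c.x += f.u;
        c.y += f.v;
    }
}
```

Only one flow vector is read per corner, so the cost scales with the number of corners rather than with the image size.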
cornerUpdate()
API Syntax
template <unsigned int MAXCORNERSNO, unsigned int TYPE, unsigned int ROWS,
unsigned int COLS, unsigned int NPC>
void cornerUpdate(ap_uint<64> *list_fix, unsigned int *list, uint32_t
nCorners, xf::Mat<TYPE,ROWS,COLS,NPC> &flow_vectors, ap_uint<1> harris_flag)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
The example code works on an input video, which is read and processed using the xfOpenCV
library. The core processing and tracking is done by the xf_corner_tracker_accel()
function at the host.
cornersImgToList()
API Syntax
template <unsigned int MAXCORNERSNO, unsigned int TYPE, unsigned int ROWS,
unsigned int COLS, unsigned int NPC>
void cornersImgToList(xf::Mat<TYPE,ROWS,COLS,NPC> &_src, unsigned int
list[MAXCORNERSNO], unsigned int *ncorners)
Parameter Descriptions
The following table describes the template and the function parameters.
Parameter Description
_src The output image of the Harris corner detector. The size of this xf::Mat object is the size of the
input image to the Harris corner detector. The value of each pixel is 255 if a corner is present at
that location, and 0 otherwise.
list A 32-bit memory allocation, of size MAXCORNERS, to store the corners detected by the Harris detector
ncorners Total number of corners detected by Harris, that is, the number of corners in the list
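The conversion from a corner image to a corner list can be sketched as follows. The parameter table states only that each list entry is 32 bits wide; the packing used here (x in the lower 16 bits, y in the upper 16 bits) is an assumption for illustration, as is the behavior of ignoring corners beyond the list capacity. The names packCorner and cornersImgToListSketch are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Assumed packing for illustration only: x in bits 0..15, y in bits 16..31.
inline std::uint32_t packCorner(std::uint16_t x, std::uint16_t y) {
    return (static_cast<std::uint32_t>(y) << 16) | x;
}

// Scans a row-major 8-bit image in which 255 marks a corner and appends each corner
// to `list`. Stops adding entries once maxCorners is reached (assumed behavior).
// Returns the number of corners stored.
inline unsigned cornersImgToListSketch(const std::vector<std::uint8_t>& img, int rows,
                                       int cols, std::vector<std::uint32_t>& list,
                                       unsigned maxCorners) {
    list.clear();
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (img[static_cast<std::size_t>(r) * cols + c] == 255 &&
                list.size() < maxCorners) {
                list.push_back(packCorner(static_cast<std::uint16_t>(c),
                                          static_cast<std::uint16_t>(r)));
            }
        }
    }
    return static_cast<unsigned>(list.size());
}
```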
cornerTracker()
The xf_corner_tracker_accel() function does the core processing and tracking at the
host.
API Syntax
Parameter Descriptions
The table below describes the template and the function parameters.
Parameter Description
flow Allocated xf::Mat to temporarily store the packed flow vectors during the iterative computation
using the hardware function
flow_iter Allocated xf::Mat to temporarily store the packed flow vectors during the iterative computation
using the hardware function
mat_imagepyr1 An array, of size equal to the number of image pyramid levels, of xf::Mat to store the image
pyramid of the first image
mat_imagepyr2 An array, of size equal to the number of image pyramid levels, of xf::Mat to store the image
pyramid of the second image
inHarris Input image to Harris Corner Detector in xf::Mat
outHarris Output image from Harris detector. Image has 255 if a corner is present in the location and 0
otherwise
list A 32 bit memory allocated, the size of MAXCORNERS, to store the corners detected by Harris
Detector
listfixed A 64 bit memory allocated, the size of MAXCORNERS, to store the corners tracked by
xf::cornerUpdate
pyr_h An array of integers, of size equal to the number of image pyramid levels, to store the
height of the image at each pyramid level
pyr_w An array of integers, of size equal to the number of image pyramid levels, to store the
width of the image at each pyramid level
num_corners Number of corners detected by the Harris Corner Detector
harrisThresh Threshold input to the Harris Corner Detector, xf::cornerHarris
harris_flag Flag used by the caller of this function to use the corners detected by xf::cornerHarris for
the set of input images
Image Processing
The following steps demonstrate the image processing procedure in the hardware pipeline:
if(*harris_flag == true)
{
    // Detect corners and convert the corner image to a list, without blocking the host
    #pragma SDS async(1)
    xf::cornerHarris<FILTER_WIDTH,BLOCK_WIDTH,NMS_RADIUS,XF_8UC1,HEIGHT,WIDTH,XF_NPPC1,XF_USE_URAM>(inHarris, outHarris, Thresh, k);
    #pragma SDS async(2)
    xf::cornersImgToList<MAXCORNERS,XF_8UC1,HEIGHT,WIDTH,XF_NPPC1>(outHarris, list, &nCorners);
}
//Code to compute Iterative Pyramidal Dense Optical Flow
if(*harris_flag == true)
{
    // Wait for corner detection to finish before using its results
    #pragma SDS wait(1)
    #pragma SDS wait(2)
    *num_corners = nCorners;
}
// Use whichever flow buffer holds the most recent result
if(flag_flowin)
{
    xf::cornerUpdate<MAXCORNERS,XF_32UC1,HEIGHT,WIDTH,XF_NPPC1>(listfixed, list, *num_corners, flow_iter, (ap_uint<1>)(*harris_flag));
}
else
{
    xf::cornerUpdate<MAXCORNERS,XF_32UC1,HEIGHT,WIDTH,XF_NPPC1>(listfixed, list, *num_corners, flow, (ap_uint<1>)(*harris_flag));
}
if(*harris_flag == true)
{
    *harris_flag = false;
}
The xf_corner_tracker_accel() function takes a flag called harris_flag, which is set during
the first frame or when the corners need to be redetected. The xf::cornerUpdate function
outputs the updated corners to the same memory location as the output corners list of
xf::cornersImgToList. This means that when harris_flag is unset, the corners input to
xf::cornerUpdate are the corners tracked in the previous cycle, that is, the corners in the
first frame of the current input frames.
After the Dense Optical Flow is computed, if harris_flag is set, the number of corners that
xf::cornerHarris has detected and xf::cornersImgToList has updated is copied to the
num_corners variable, which is one of the outputs of the xf_corner_tracker_accel()
function. The other output is the tracked corners list, listfixed. If harris_flag is set,
xf::cornerUpdate tracks the corners in the ‘list’ memory location; otherwise, it tracks the
corners in the ‘listfixed’ memory location.
Color Detection
The Color Detection algorithm is used for color object tracking and object detection
based on the color of the object. Color-based methods are very useful for object detection
and segmentation when the object and the background differ significantly in color.
The Color Detection example uses four hardware functions from the xfOpenCV library. They are:
• xf::RGB2HSV
• xf::colorthresholding
• xf::erode
• xf::dilate
In the Color Detection example, the original BGR image is converted into the HSV color space,
because HSV is well suited to color-based image segmentation. Then, based on the H (hue),
S (saturation), and V (value) values, a thresholding operation is applied to the HSV image,
returning either 255 or 0 for each pixel. After thresholding, the erode and dilate functions are
applied to reduce unnecessary white patches (noise) in the image. The example uses two hardware
instances each of the erode and dilate functions: erode followed by dilate (morphological
opening), and then dilate followed by erode (morphological closing).
In the given example, the source image is passed to the xf::RGB2HSV function, the output of
that function is passed to the xf::colorthresholding module, the thresholded image is
passed to the xf::erode and xf::dilate functions, and the final output image
is returned.
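The thresholding step can be illustrated with a minimal sketch that tests each HSV pixel against a single range and returns 255 or 0. The type alias Hsv and the function thresholdHsv are hypothetical names and do not reflect the interface of xf::colorthresholding. Using one inclusive range per channel is an assumption made to keep the example short.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// One pixel as {H, S, V}. Hypothetical helper type for illustration.
using Hsv = std::array<std::uint8_t, 3>;

// Returns a mask with 255 where all three channels lie within [low, high]
// (inclusive) and 0 elsewhere.
inline std::vector<std::uint8_t> thresholdHsv(const std::vector<Hsv>& image,
                                              const Hsv& low, const Hsv& high) {
    std::vector<std::uint8_t> mask;
    mask.reserve(image.size());
    for (const Hsv& px : image) {
        bool inside = true;
        for (int ch = 0; ch < 3; ++ch) {
            if (px[ch] < low[ch] || px[ch] > high[ch]) {
                inside = false;
            }
        }
        mask.push_back(inside ? 255 : 0);
    }
    return mask;
}
```

The resulting binary mask is the kind of image on which erosion and dilation are then applied to remove small noise patches.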
• xf::GaussianBlur
• xf::duplicateMat
• xf::delayMat
• xf::subtract
The Difference of Gaussian Filter function can be implemented by applying a Gaussian filter to
the original source image and duplicating the blurred image into two images. The
Gaussian blur function is applied again to one of the duplicates, whereas the other one is
kept as is. The subtraction function is then performed on the twice-blurred image and
the other duplicate. Because the duplicate must wait until the second Gaussian blur has
produced at least one output pixel, the xf::delayMat function is used to add the required
delay.
In the given example, the Gaussian blur function is applied to the source image imginput, and
the resultant image imgin1 is passed to xf::duplicateMat. The images imgin2 and imgin3 are
the duplicates of the Gaussian-blurred image. Gaussian blur is applied again to imgin2, and the
result is stored in imgin4. The subtraction is to be performed between imgin4 and imgin3, but
imgin3 must wait until at least one pixel of imgin4 has been generated. Therefore, a delay is
applied to imgin3 and the result is stored in imgin5. Finally, the subtraction is performed on
imgin4 and imgin5.
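The data path described above can be sketched in one dimension. The sketch uses a 3-tap [1 2 1]/4 kernel with edge replication as a simple stand-in for the Gaussian blur; the kernel, the border handling, and the function names blur3 and differenceOfGaussian are assumptions for illustration. No delay stage is needed in this software version, because the whole blurred row is available before the subtraction. The result is kept as signed float, whereas a hardware subtraction on 8-bit data may saturate.

```cpp
#include <vector>

// 3-tap [1 2 1]/4 blur with edge replication; a stand-in for a Gaussian blur.
inline std::vector<float> blur3(const std::vector<float>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int i = 0; i < n; ++i) {
        const float left = in[i > 0 ? i - 1 : 0];
        const float right = in[i + 1 < n ? i + 1 : n - 1];
        out[i] = 0.25f * left + 0.5f * in[i] + 0.25f * right;
    }
    return out;
}

// Blurs once (imgin1), blurs the result again (imgin4), and subtracts the
// once-blurred signal from the twice-blurred one.
inline std::vector<float> differenceOfGaussian(const std::vector<float>& imginput) {
    const std::vector<float> imgin1 = blur3(imginput);
    const std::vector<float> imgin4 = blur3(imgin1);
    std::vector<float> out(imginput.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = imgin4[i] - imgin1[i];
    }
    return out;
}
```

A constant input produces an all-zero output, since both blurs leave it unchanged; the response is nonzero only around edges and peaks.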
The two main components involved in the pipeline are stereo rectification and disparity
estimation using the local block matching method. While disparity estimation using local block
matching is a discrete component in xfOpenCV, the rectification block can be constructed using
xf::InitUndistortRectifyMapInverse() and xf::remap(). The dataflow pipeline is
shown below. The camera parameters are an additional input to the pipeline.
*distC_r_fix,
ap_fixed<32,12> *irA_l_fix, ap_fixed<32,12> *irA_r_fix, int _cm_size,
int _dc_size)
{
    xf::InitUndistortRectifyMapInverse<XF_CAMERA_MATRIX_SIZE,XF_DIST_COEFF_SIZE,XF_32FC1,XF_HEIGHT,XF_WIDTH,XF_NPPC1>(cameraMA_l_fix,distC_l_fix,irA_l_fix,mapxLMat,mapyLMat,_cm_size,_dc_size);
    xf::remap<XF_REMAP_BUFSIZE,XF_INTERPOLATION_BILINEAR,XF_8UC1,XF_32FC1,XF_8UC1,XF_HEIGHT,XF_WIDTH,XF_NPPC1,XF_USE_URAM>(leftMat,leftRemappedMat,mapxLMat,mapyLMat);
    xf::InitUndistortRectifyMapInverse<XF_CAMERA_MATRIX_SIZE,XF_DIST_COEFF_SIZE,XF_32FC1,XF_HEIGHT,XF_WIDTH,XF_NPPC1>(cameraMA_r_fix,distC_r_fix,irA_r_fix,mapxRMat,mapyRMat,_cm_size,_dc_size);
    xf::remap<XF_REMAP_BUFSIZE,XF_INTERPOLATION_BILINEAR,XF_8UC1,XF_32FC1,XF_8UC1,XF_HEIGHT,XF_WIDTH,XF_NPPC1,XF_USE_URAM>(rightMat,rightRemappedMat,mapxRMat,mapyRMat);
    xf::StereoBM<SAD_WINDOW_SIZE,NO_OF_DISPARITIES,PARALLEL_UNITS,XF_8UC1,XF_16UC1,XF_HEIGHT,XF_WIDTH,XF_NPPC1,XF_USE_URAM>(leftRemappedMat, rightRemappedMat, dispMat, bm_state);
}
Appendix A
Xilinx Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Xilinx
Support.
Xilinx Design Hubs provide links to documentation organized by design tasks and other topics,
which you can use to learn key concepts and address frequently asked questions. To access the
Design Hubs:
Note: For more information on DocNav, see the Documentation Navigator page on the Xilinx website.
References
1. SDSoC Environment Getting Started Tutorial (UG1028)
Copyright
© Copyright 2017-2019 Xilinx, Inc. Xilinx, the Xilinx logo, Alveo, Artix, Kintex, Spartan, Versal,
Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the
United States and other countries. OpenCL and the OpenCL logo are trademarks of Apple Inc.
used by permission by Khronos. HDMI, HDMI logo, and High-Definition Multimedia Interface are
trademarks of HDMI Licensing LLC. AMBA, AMBA Designer, Arm, ARM1176JZ-S, CoreSight,
Cortex, PrimeCell, Mali, and MPCore are trademarks of Arm Limited in the EU and other
countries. All other trademarks are the property of their respective owners.