Vivado HLS Update
Copyright 2013 Xilinx.
Vivado High-Level Synthesis: Accelerated IP Generation and Integration
C based IP Creation
C, C++ or SystemC
C Libraries: floating point (math.h), fixed point, OpenCV
VHDL or Verilog
User Preferred System Integration Environment
System Generator for DSP
Vivado IP Integrator
Vivado IP Catalog
Vivado RTL Integration
Vivado HLS Video Libraries
C Video Libraries
Available within Vivado HLS header files
hls_video.h library
hls_opencv.h library
Enable Migration of OpenCV Designs into Xilinx FPGA
Libraries target real-time Full HD video processing
Libraries support standard AXI4 Interfaces for easy system integration
Video Library: 12 New Functions
Video Data Modeling
Linebuffer class
Window class
AXI4-Stream IO Functions
AXIvideo2Mat
Mat2AXIvideo
OpenCV Interface Functions
cvMat2AXIvideo
IplImage2AXIvideo
AXIvideo2cvMat
AXIvideo2IplImage
cvMat2hlsMat
IplImage2hlsMat
hlsMat2cvMat
hlsMat2IplImage
CvMat2AXIvideo
AXIvideo2CvMat
CvMat2hlsMat
hlsMat2CvMat
Video Functions
AbsDiff, AddS, AddWeighted, And, Avg, AvgSdv, Cmp, CmpS, CornerHarris, CvtColor, Dilate,
Duplicate, EqualizeHist, Erode, FASTX, Filter2D, GaussianBlur, Harris, HoughLines2, Integral,
InitUndistortRectifyMap, Max, MaxS, Mean, Merge, Min, MinMaxLoc, MinS, Mul, Not, PaintMask,
Range, Reduce, Remap, Resize, Scale, Set, Sobel, Split, SubRS, SubS, Sum, Threshold, Zero
C Test Bench: Interface Library
Interface Libraries convert to/from OpenCV image to HLS type
HLS MAT format: synthesizable and AXI4 Stream support
#include "hls_opencv.h"

// Top Level C Function
int main (int argc, char** argv) {
  // Standard OpenCV files, formats & types
  IplImage* src = cvLoadImage(INPUT_IMAGE);
  IplImage* dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);

  // HLS Video Libraries: convert to Xilinx AXI4 Video Stream
  AXI_STREAM src_axi, dst_axi;
  IplImage2AXIvideo(src, src_axi);

  // Function to Synthesize
  image_filter(src_axi, dst_axi, src->height, src->width);

  // Convert Xilinx AXI4 Video Stream back to OpenCV types
  AXIvideo2IplImage(dst_axi, dst);
  cvSaveImage(OUTPUT_IMAGE, dst);
}
C Function to Synthesize
HLS Video Library Functions
Drop-in Replacement for OpenCV and provide High QoR
// HLS Video & AXI Struct Libraries
#include "hls_video.h"
#include "ap_axi_sdata.h"

// Top Level C Function for Synthesis
void image_filter(AXI_STREAM& inter_pix, AXI_STREAM& out_pix, int rows, int cols) {
  // Create AXI streaming interfaces for the core
  RGB_IMAGE img_0(rows, cols);
  // ..etc..
  RGB_IMAGE img_5(rows, cols);
  RGB_PIXEL pix(50, 50, 50);
#pragma HLS dataflow
  // Convert Xilinx AXI4 Video Stream to HLS Mat data type
  hls::AXIvideo2Mat(inter_pix, img_0);
  // HLS Video functions are drop-in replacements for OpenCV functions & provide high QoR
  hls::Sobel(img_0, img_1, 1, 0);
  hls::SubS(img_1, pix, img_2);
  hls::Scale(img_2, img_3, 2, 0);
  hls::Erode(img_3, img_4);
  hls::Dilate(img_4, img_5);
  // Convert HLS Mat type to Xilinx AXI4 Video Stream
  hls::Mat2AXIvideo(img_5, out_pix);
}
Application Note XAPP1167
Accelerating OpenCV Applications with Zynq using Vivado HLS Video Libraries
Video Processing data types
Compares Video Architectures
Advantages of Video Streaming
Review Video Interfaces
Reference Design with source files and project directories
Download XAPP1167 from Xilinx.com
QuickTake: Leveraging OpenCV and High-Level Synthesis with Vivado
Accelerator AXI Interconnect
IP Control from ARM
AXI4-Lite & GP Port
High Throughput Access to Memory
AXI4-Stream using AXI-DMA
AXI4-Master: the Accelerator is the master
External Memory Access: HP Port
L2 Cache Access: ACP Port
Data transfer between HLS IP blocks
AXI4-Stream
IP Integrator Supported
IP Integrator Requires an Early Access License in 2013.1
Vivado HLS IP can be exported to IP Integrator
Export to the Vivado IP Catalog (was previously called IP-XACT format)
Data types supported: IPI can propagate
Vivado HLS IP: Export to Vivado IP Catalog
Vivado IP Integrator (IPI): Add to IP Catalog, add IP block & connect up
Supported with Two New Tutorials
HLS IP Integration
IP Integrator (IPI) Public Release 2013.2
HLS Output Fully Supported in IPI
Three Tutorials on using HLS IP inside IPI
Two connect HLS IP to the Zynq PS; One connects HLS IP with Xilinx IP
HLS IP Blocks are identified in IPI
HLS and System Generator IP shown inside IPI
Improved Software Driver Support
Software Drivers are Created for AXI4-Lite interfaces
Now includes support for Linux Systems
Drivers are also now created for Vivado IP Catalog format
Add all files to the software project: ifdef statements ensure automatic configuration
Files are in the Drivers sub-directory
Enhanced Report File
Easier to find hot-spots
The term throughput has been changed to Interval or Initiation Interval in all reports and documentation
Top-Level function Latency and Interval
Latency and Interval for all instances at this level of hierarchy
All loops and sub-loops at this level of hierarchy
Analysis Perspective
A New Perspective for Design Analysis
Allows Interactive Analysis
Module Hierarchy
Hierarchical Summary and Navigation
Performance View
Scheduled operations
Loops: shown in Yellow, are expandable and collapsible
Modules: shown in Green, open the view on sub-blocks
Performance Profile
Latency and Interval summary for this block
Performance View
Hierarchical Navigation
Loop Hierarchy
Operations, loops and functions
Scheduled States
Select operations and right-click to cross reference with the C source and HDL
Resource Analysis
Resource View
Scheduled operations associated with resource: anything on the same row shares the same resource
Resource Profile
Resource summary for this block
Analysis Perspective Tutorials
Fully Supported by Two New Tutorials
Design Analysis
Design Optimization
Assertion Support
Assertions are supported for Synthesis
Can be used to define bit-widths for synthesis
Replaces the need for a Tripcount directive
Without Assertions
SUM_X:for (i=0;i<=xlimit; i++) {
  X_accum += A[i];
  X[i] = X_accum;
}
SUM_Y:for (i=0;i<=ylimit; i++) {
  Y_accum += B[i];
  Y[i] = Y_accum;
}
* Loop Latency:
+----------+-----------+----------+
|Loop Name |Trip Count |Pipelined |
+----------+-----------+----------+
|- SUM_X   |1 ~ 256    |no        |
|- SUM_Y   |1 ~ 256    |no        |
+----------+-----------+----------+
With Assertions
assert(xlimit<32);
SUM_X:for (i=0;i<=xlimit; i++) {
  X_accum += A[i];
  X[i] = X_accum;
}
assert(ylimit<16);
SUM_Y:for (i=0;i<=ylimit; i++) {
  Y_accum += B[i];
  Y[i] = Y_accum;
}
* Loop Latency:
+----------+-----------+----------+
|Loop Name |Trip Count |Pipelined |
+----------+-----------+----------+
|- SUM_X   |1 ~ 32     |no        |
|- SUM_Y   |1 ~ 16     |no        |
+----------+-----------+----------+
Index counter hardware is accurately sized
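A minimal, self-contained sketch of the assertion pattern described on this slide. The function name accumulate, the array size of 32 and the int element type are assumptions made for illustration; only the assert-before-loop idea comes from the slide. The assertion is placed directly before the loop whose bound it constrains.

```cpp
#include <cassert>

// Hypothetical example function (names and sizes are assumptions).
// Running sum of A[0..xlimit] written to X[0..xlimit].
void accumulate(const int A[32], int X[32], int xlimit) {
    // States that xlimit never reaches 32, so the loop below runs at
    // most 32 times and the index counter can be sized accordingly.
    assert(xlimit < 32);

    int X_accum = 0;
    SUM_X: for (int i = 0; i <= xlimit; i++) {
        X_accum += A[i];
        X[i] = X_accum;
    }
}
```

In C simulation the same assertion also acts as a run-time check, provided NDEBUG is not defined.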
Improved Tutorials
Vivado HLS is now provided with 10 Tutorials
22 Labs which cover all aspects of Vivado HLS
Tutorial: Summary
Introduction: Basic walkthrough of GUI operations (Csim, Synth, RTL Sim, IP package)
C Validation: C simulation and using the debugger
Interface Synthesis: Explain design, port and AXI interface synthesis (simple HLS design to allow analysis of IO)
Arbitrary Precision: Review of a floating point and fixed windowing algorithm
Design Analysis: Using the Analysis Perspective to optimize performance of multi-hierarchy, multi-loop design
Design Optimization with Pipelining: Improving performance using pipelining at loop and function level and impact of IO
RTL Verification: Verify and view trace files using Vivado Xsim and Modelsim (incl. Floating Point simulation)
Creating IP for an IP Integrator Design: Connecting to an IP core using IPI
Creating IP for a Zynq Design: Connecting to Zynq with IPI and integrating driver files into SDK design (interrupt handling etc.)
Creating IP for a System Generator Design: Packaging a design for Sys Gen and verifying IO in Sys Gen (connecting interfaces etc.)
Designs: FIR, DCT, Filter Window, Sorter Design, Hamming Window, Matrix Multiplier, DUC, Windower, FFT IP Core, Sorter Accelerator, YUV
Improved AXI4 & SystemC Support
SystemC
AXI4 Master, Streams and Lite protocols now supported
Lite: Use the RESOURCE directive to assign ports (as C/C++)
Stream: Use the RESOURCE directive on sc_fifo_in and sc_fifo_out ports
Master: Use the AXI4M_bus_port class
AXI4M_bus_port<sc_fixed<32, 8> > bus_if;
Difference between SystemC and Vivado AP types fully documented
SystemC designs no longer need to be explicitly specified
The add_files -type option is retired (as is the C/C++ or SystemC check-box in the GUI)
AXI4 Master Interface
Now supported on Array ports
Array ports can be synthesized with ap_bus IO protocol
RTL cosimulation of Floating Point Designs
Floating Point Designs
The IEEE operators are now in the RTL simulation model
This requires that the Xilinx IEEE library is used when RTL cosimulation is performed
Auto Support provided: No Action Required
SystemC RTL
Verilog and VHDL using the Xilinx Vivado (Xsim) simulator
Verilog and VHDL using the Mentor Graphics ModelSim simulator
Verilog and VHDL using the Xilinx Isim simulator.
All other 3rd party HDL simulators
The libraries must be pre-compiled before simulating floating point designs
Open Vivado and refer to : compile_simlib help
Note: this is Vivado, not Vivado HLS
DSP48 Adder Resource
Adders supported for implementation in DSP48
Adders in the C code can be targeted to an AddSub_DSP RESOURCE
Ensures the adder or subtractor is implemented in a DSP48
Resource Specification
Targets the adder or subtractor to a DSP48 Resource
(* USE_DSP48 = "YES" *)
module adders_add_32ns_32ns_32_1_AddSub_DSP_0 (a, b, s);
endmodule
module adders_add_32ns_32ns_32_1( )
adders_add_32ns_32ns_32_1_AddSub_DSP_0 U1 (
  .a( din0 ),
  .b( din1 ),
  .s( dout ));
endmodule
DSP48 Adder Implementation
Adders/Subtractors Targeted to a DSP48
Solution 1
Solution 2
FFT and FIR IP in HLS
The Xilinx FFT and FIR IP are available in Vivado HLS
C simulates with a bit-accurate model
Fully configurable within the C++ source code
Pre-defined C++ structs allow the IP to be configured & accessed
Supported only for C++
Implemented with templates
High-Quality Implementation
Same hardware as implemented by RTL versions of this IP
Functionality fully described in Xilinx Documentation
LogiCORE IP Fast Fourier Transform v9.0 (document PG109)
LogiCORE IP FIR Compiler v7.1 (document PG149)
IP Examples
Examples Included in Vivado HLS Release
Access from the Welcome Screen
Or from C:\Xilinx\Vivado_HLS\2013.3\examples\design
Assuming the standard PC install path
Example IP Designs
1024-point FFT and Inverse FFT (fixed point)
Single FFT 1024-point (fixed point)
FIR with 2 interleaved channels
3 FIRs connected in series (HB, HB, SRRC)
Updating coefficients using FIR CONFIG channel
SRRC (Square Root Raised Cosine) FIR filter
FFT Function
Using the FFT
#include "hls_fft.h"

hls::fft<STATIC_PARAM> (            // Static Parameterization Struct
    INPUT_DATA_ARRAY,               // Input data fixed or float
    OUTPUT_DATA_ARRAY,              // Output data fixed or float
    OUTPUT_STATUS,                  // Output Status
    INPUT_RUN_TIME_CONFIGURATION);  // Input Run Time Configuration
Include the hls_fft.h library in the code
This defines the FFT and supporting structs and types
Allows hls::fft to be instantiated in your code
Use the STATIC_PARAM template parameter to parameterize the FFT
The STATIC_PARAM template parameter defines all static configuration values
The Library provides a pre-defined struct hls::ip_fft::params_t to perform this
Optionally modify the default parameters by creating a new user defined
STATIC_PARAM struct based on the default
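The idea of a user struct "based on the default" can be sketched in plain C++ without the HLS headers. Everything in the sketch namespace below is hypothetical (default_params, my_params, log2_len, input_width and transform_length are illustration names, not the fields of hls::ip_fft::params_t); it only shows how a derived struct overrides selected static values that a template then reads at compile time.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for a library-provided default parameter struct.
struct default_params {
    static const unsigned log2_len    = 10;  // 2^10 = 1024 points
    static const unsigned input_width = 16;  // bits per input sample
};

// User-defined struct based on the default: only the overridden
// member changes, every other value is inherited.
struct my_params : default_params {
    static const unsigned log2_len = 8;      // 2^8 = 256 points
};

// A template reads the static configuration at compile time,
// in the same spirit as the STATIC_PARAM template parameter.
template <typename P>
std::size_t transform_length() {
    return std::size_t(1) << P::log2_len;
}

}  // namespace sketch
```

With these definitions, sketch::transform_length<sketch::my_params>() evaluates to 256 while the inherited input_width stays at 16.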
FIR Function
Using the FIR
#include "hls_fir.h"

// Create an instance of the FIR
static hls::FIR<STATIC_PARAM> fir1;   // Static parameterization

// Execute the FIR instance fir1
fir1.run(INPUT_DATA_ARRAY,            // Input Data
         OUTPUT_DATA_ARRAY);          // Output Data
Include the hls_fir.h library in the code
This defines the FIR and supporting structs and types
Allows hls::FIR to be instantiated in your code
Unlike the FFT, the FIR is instantiated as a class and executed with the run method
Create the STATIC_PARAM template parameter to configure the FIR
The STATIC_PARAM template parameter defines all static configuration values
The library provides a pre-defined struct hls::ip_fir::params_t to perform this
There are no default values for the Coefficients
You must always create a user-defined struct based on hls::ip_fir::params_t
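To illustrate the class-plus-run usage model and why coefficients must always be supplied, here is a small software-only sketch. It is not the hls::FIR implementation: the sketch namespace, my_fir_params, its members and the direct-form filter loop are all assumptions chosen for illustration.

```cpp
namespace sketch {

// Hypothetical user parameter struct: the coefficients have no
// sensible default, so the user always has to provide them.
struct my_fir_params {
    static const unsigned num_coeffs   = 4;
    static const unsigned input_length = 8;
    static const int coeff_vec[num_coeffs];
};
const int my_fir_params::coeff_vec[my_fir_params::num_coeffs] = {1, 2, 2, 1};

// Direct-form FIR configured entirely by the template parameter P.
// The filter is an object that keeps its delay line between calls.
template <typename P>
class Fir {
    int taps[P::num_coeffs];  // delay line
public:
    Fir() : taps() {}         // zero-initialise the delay line

    void run(const int in[P::input_length], int out[P::input_length]) {
        for (unsigned n = 0; n < P::input_length; ++n) {
            // Shift the delay line and insert the new sample.
            for (unsigned k = P::num_coeffs - 1; k > 0; --k)
                taps[k] = taps[k - 1];
            taps[0] = in[n];
            // Multiply-accumulate over all taps.
            int acc = 0;
            for (unsigned k = 0; k < P::num_coeffs; ++k)
                acc += P::coeff_vec[k] * taps[k];
            out[n] = acc;
        }
    }
};

}  // namespace sketch
```

Feeding a unit impulse through a static sketch::Fir<sketch::my_fir_params> instance returns the coefficient vector, which is a quick sanity check for any parameter struct.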
Using the FFT and FIR IP
FFT and FIR support pipelined implementations
The functions themselves cannot be pipelined
They should be parameterized for pipelined operation
The data arguments are always arrays
These will be implemented as AXI4 Streams in the RTL
By default, arrays are implemented as BRAM interfaces
Recommendation
Use these IP in regions where dataflow optimization is used
This will auto-convert the input and output arrays into streaming arrays
Alternatively, a Requirement:
The input and output arrays must be marked as streaming using the command set_directive_stream (pragma STREAM)
Fixed Point Math Functions
Further support for math functions
The hls_math.h library
Now includes fixed-point functions for sin, cos and sqrt
Function | Type | Accuracy (ULP) | Implementation Style
cos | ap_fixed<32,I> | 16 | Synthesized
sin | ap_fixed<32,I> | 16 | Synthesized
sqrt | ap_fixed<W,I>, ap_ufixed<W,I> | | Synthesized
The sin and cos functions are all 32-bit ap_fixed<32,Int_Bit>
Where Int_Bit specifies the number of integer bits
The sqrt function is any width but must have a decimal point
Cannot be all integer bits or all fractional bits
The accuracy above is quoted with respect to the equivalent floating point version
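To put the ULP figures in absolute terms, the sketch below assumes that one ULP is the weight of the least significant bit of the fixed-point format, that is 2^-(W-I) for W total bits and I integer bits. The helper names ulp and max_abs_error are hypothetical and not part of hls_math.h.

```cpp
#include <cmath>

// Weight of the least significant bit of a fixed-point format with
// W total bits and I integer bits (assumed meaning of one ULP).
double ulp(int W, int I) {
    return std::ldexp(1.0, -(W - I));  // 2^-(W-I)
}

// Absolute error bound implied by an accuracy quoted in ULPs.
double max_abs_error(int W, int I, int ulps) {
    return ulps * ulp(W, I);
}
```

For example, with ap_fixed<32,2> one ULP is 2^-30, so an accuracy of 16 ULP corresponds to an absolute error of at most 2^-26 relative to the floating point result.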
AXI4 Stream Interface: Ease of Use
Native Support for AXI4 Stream Interfaces
Native = An AXI4 Stream can be specified with set_directive_interface
No longer required to set the interface then add a resource
This AXI4 Stream interface is part of the HDL after synthesis
This AXI4 Stream interface is simulated by RTL co-simulation
Interface Type axis is AXI4 Stream
set_directive_interface -mode axis foo portA
Or
#pragma HLS interface axis port=portA
Pre-2013.3 Approach to AXI Streams
#if 1
// Use New Method
#pragma HLS interface axis port=portA
#else
// Or use old Method
#pragma HLS interface ap_fifo port=portA
#pragma HLS resource core=AXI4Stream variable=portA \
metadata="-bus_bundle Agroup"
#endif
Existing Functionality Deprecated BUT NOT REMOVED!!
We don't want to break existing designs
Warning:
If you use the pre-2013.3 method for adding AXI4 Streams
This is where you set the interface as a FIFO then add an AXI Resource
You will get a FIFO interface in the RTL
And the AXI4 Stream adapter is added during export_design
Recommendation
Change existing AXI4 Stream directives to use the INTERFACE directive
AXI4 Master Interface: Pipeline Support
Transactions involving an AXI4 Master Interface are now Pipelined
Prior to 2013.3 this interface would not pipeline
Each transfer was an atomic process
The for-loop/memcpy waits until a transfer completes before starting next transfer
This was the limiting factor in the pipeline interval
Improved performance in 2013.3
Accesses to an AXI master interface can now be pipelined
The performance will be much better than before
Further improvements in 2014.1
Existing limitations: Cannot configure the base address or infer bursts, and reads and writes cannot be performed simultaneously (sequential only)
We expect to get more performance in 2014.1
At that time we'll publish statistics and make more noise about this feature
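The access pattern this slide refers to can be sketched in plain C++ as below. The function name scale_block, the block size N and the doubling operation are assumptions for illustration, and the interface directive that maps the pointers to an AXI4 master port is deliberately omitted because its form is tool-version specific. The reads, the computation and the writes happen one after another, matching the sequential-only limitation noted above.

```cpp
#include <cstring>

const int N = 64;  // hypothetical block size

// Hypothetical top-level function: bus_in and bus_out stand for
// pointers that would be mapped to an AXI4 master interface.
void scale_block(const int *bus_in, int *bus_out) {
    int buf[N];

    // Read phase: copy a block from the bus into a local buffer.
    std::memcpy(buf, bus_in, N * sizeof(int));

    // Compute phase: operate on the local copy only.
    for (int i = 0; i < N; ++i)
        buf[i] *= 2;

    // Write phase: copy the results back to the bus.
    std::memcpy(bus_out, buf, N * sizeof(int));
}
```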
Enhanced Support for Exporting IP
Sys Gen and AXI Stream Interfaces
Designs with AXI Stream interfaces can now be exported to System Generator
The AXI Interfaces will be present and can be connected
Previously, AXI interfaces were not supported in Sys Gen
AXI Lite Drivers
Software drivers are now included in the IP package
When creating a local repository in SDK simply point to the IP package
No need to manually copy files
Further EoU enhancements coming
New Clang Front-end
Vivado HLS has upgraded its front-end parser
Now using clang instead of gcc
Provides 64-bit support on Windows
In addition this enables continued growth of features and functionality
More optimizations possible, messages can reference line and column etc.
Clang Side-effect: Different command options
The new front-end does not support all gcc flags
For example, -fpermissive is now ignored as this is not supported by clang
If an option is not supported but provided, it will be ignored
Clang Options: http://clang.llvm.org/docs/UsersManual.html#command-line-options
Clang Side-Effect: More strict Syntax Checking
Some existing working designs may fail
Not expected to occur often, but is possible
Example -fpermissive workaround: for memcpy(dest, src), if src is a volatile pointer, cast it to a const pointer to pass syntax checking
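A minimal sketch of this workaround in standard C++. The function name copy_from_volatile and the int element type are assumptions; the slide only describes the cast. memcpy takes a const void* source, and a volatile int* does not convert to it implicitly, which is what a strict front end rejects.

```cpp
#include <cstring>

// Hypothetical helper: copies count ints from a volatile source.
void copy_from_volatile(int *dest, volatile int *src, std::size_t count) {
    // Passing src directly is ill-formed because the conversion would
    // drop the volatile qualifier; cast it to a const pointer first.
    std::memcpy(dest, const_cast<const int *>(src), count * sizeof(int));
}
```

The cast removes the volatile qualifier for the duration of the copy, so this is only appropriate where it is acceptable for the copy to be treated as an ordinary memory access.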
Design Hubs: Easier Access to Documentation
DocNav Design Hubs
Improved Ease-of-Use
Find things faster
Open Docs at the exact page
Standard Introduction Docs and Videos
App Notes and Videos all grouped
High Level Synthesis
Getting Started Videos
Tutorials
Key Concepts
FAQs
These and the solution center will be updated in the coming weeks
Others such as Designing with Video etc. will be added
Ideas for topics are welcome
Thank You