Update
[Figure: Vivado HLS design flow. Inputs: C, C++ or SystemC, plus C libraries (floating point math.h, fixed point, OpenCV). Outputs: VHDL or Verilog, delivered to Vivado IP Integrator and the Vivado IP Catalog.]
C Video Libraries
Available within Vivado HLS header files
hls_video.h library
hls_opencv.h library
Window class
AXIvideo2Mat
Mat2AXIvideo
AXIvideo2cvMat
AXIvideo2IplImage
cvMat2hlsMat
IplImage2hlsMat
hlsMat2cvMat
hlsMat2IplImage
CvMat2AXIvideo
AXIvideo2CvMat
CvMat2hlsMat
hlsMat2CvMat
Video Functions
AbsDiff
AddS
AddWeighted
And
Avg
AvgSdv
Cmp
CmpS
CornerHarris
CvtColor
Dilate
Duplicate
EqualizeHist
Erode
FASTX
Filter2D
MaxS
Mean
Merge
Min
MinMaxLoc
GaussianBlur
MinS
Harris
Mul
HoughLines2
Not
Integral
PaintMask
InitUndistortRectifyMap
Range
Max
Reduce
Remap
Resize
Scale
Set
Sobel
Split
SubRS
SubS
Sum
Threshold
Zero
#include "hls_opencv.h"

// Top Level C Function
int main (int argc, char** argv) {
  // Load the input image and create an output image of the same size and type
  IplImage* src = cvLoadImage(INPUT_IMAGE);
  IplImage* dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);

  // Convert the input image to an AXI4 Stream video format
  AXI_STREAM src_axi, dst_axi;
  IplImage2AXIvideo(src, src_axi);

  // Call the function to be synthesized
  image_filter(src_axi, dst_axi, src->height, src->width);

  // Convert the output stream back to an image and save it
  AXIvideo2IplImage(dst_axi, dst);
  cvSaveImage(OUTPUT_IMAGE, dst);

  return 0;
}
C Function to Synthesize
HLS Video Library Functions
Drop-in replacements for OpenCV functions that provide high QoR

// HLS Video & AXI Struct Libraries
#include "hls_video.h"
#include "ap_axi_sdata.h"

// Top Level C Function for Synthesis
void image_filter(AXI_STREAM& inter_pix, AXI_STREAM& out_pix, int rows, int cols) {
  // Create AXI streaming interfaces for the core
  RGB_IMAGE img_0(rows, cols);
  ..etc..
  RGB_IMAGE img_5(rows, cols);
  RGB_PIXEL pix(50, 50, 50);
#pragma HLS dataflow
  hls::AXIvideo2Mat(inter_pix, img_0);
  hls::Sobel(img_0, img_1, 1, 0);
  hls::SubS(img_1, pix, img_2);
  hls::Scale(img_2, img_3, 2, 0);
  hls::Erode(img_3, img_4);
  hls::Dilate(img_4, img_5);
  hls::Mat2AXIvideo(img_5, out_pix);
}
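To make the middle of this chain concrete, the following is a rough software model of the SubS and Scale steps on a single 8-bit channel. It is a sketch under stated assumptions, not the library's implementation: the function names `saturate_u8` and `subs_then_scale` are hypothetical, and it assumes SubS computes `src - scalar`, Scale computes `src * scale + shift`, and both saturate to the 0..255 range.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Clamp an intermediate result to the 8-bit unsigned range
// (assumed saturation behavior, not taken from the slides).
static std::uint8_t saturate_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Hypothetical per-pixel model of SubS followed by Scale on one channel:
//   SubS:  mid = src - scalar
//   Scale: dst = mid * scale + shift
std::vector<std::uint8_t> subs_then_scale(const std::vector<std::uint8_t>& src,
                                          int scalar, int scale, int shift) {
    std::vector<std::uint8_t> dst;
    dst.reserve(src.size());
    for (std::uint8_t p : src) {
        std::uint8_t mid = saturate_u8(static_cast<int>(p) - scalar);
        dst.push_back(saturate_u8(static_cast<int>(mid) * scale + shift));
    }
    return dst;
}
```

With the values used on the slide (scalar 50, scale 2, shift 0), a pixel value of 100 would map to 100 under this model, and a value of 200 would saturate at 255.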
[Figure: three ways to connect an HLS accelerator to the Zynq PS. AXI4 Lite: GP port. AXI4 Stream: HP or ACP port through an AXI DMA. AXI4 Master: HP or ACP port.]
[Figure: IP Integrator supported flow. Vivado HLS IP is exported to the Vivado IP Catalog (Add to IP Catalog), then added as an IP block and connected up.]
HLS IP Integration
IP Integrator (IPI) Public Release 2013.2
HLS Output Fully Supported in IPI
Three Tutorials on using HLS IP inside IPI
Two connect HLS IP to the Zynq PS; One connects HLS IP with Xilinx IP
Files are in the Drivers sub-directory
Top-Level function
Latency and Interval
Latency and Interval for all instances at this level of hierarchy
All loops and sub-loops at this level of hierarchy
Analysis Perspective
A New Perspective for Design Analysis
Allows Interactive Analysis
Module Hierarchy
Hierarchical Summary and Navigation
Performance View
Scheduled operations.
Performance Profile
Latency and Interval summary for this block
Performance View
Hierarchical Navigation
Loop Hierarchy
Scheduled States
Resource Analysis
Resource View
Scheduled operations associated with resource: anything on the same row shares the same resource
Resource Profile
Assertion Support
Assertions are supported for Synthesis
Can be used to define bit-widths for synthesis
Replaces the need for a Tripcount directive
Without Assertions

SUM_X:for (i=0;i<=xlimit; i++) {
  X_accum += A[i];
  X[i] = X_accum;
}
SUM_Y:for (i=0;i<=ylimit; i++) {
  Y_accum += B[i];
  Y[i] = Y_accum;
}

* Loop Latency:
+----------+-----------+----------+
|Target II |Trip Count |Pipelined |
+----------+-----------+----------+
|- SUM_X   |1 ~ 256    |no        |
|- SUM_Y   |1 ~ 256    |no        |
+----------+-----------+----------+

With Assertions

assert(xlimit<32);
SUM_X:for (i=0;i<=xlimit; i++) {
  X_accum += A[i];
  X[i] = X_accum;
}
assert(ylimit<16);
SUM_Y:for (i=0;i<=ylimit; i++) {
  Y_accum += B[i];
  Y[i] = Y_accum;
}

Loop Latency:
+----------+-----------+----------+
|Target II |Trip Count |Pipelined |
+----------+-----------+----------+
|- SUM_X   |1 ~ 32     |no        |
|- SUM_Y   |1 ~ 16     |no        |
+----------+-----------+----------+

Index counter hardware is accurately sized
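The fragments above can be placed in a complete function to see where the assertion sits relative to the loop. This is a minimal sketch: the function name `sum_x`, the array sizes, and the argument types are assumptions chosen for illustration, and only the loop body and the `assert(xlimit<32)` line come from the slide.

```cpp
#include <cassert>

// Hypothetical wrapper around the SUM_X loop shown on the slide.
// The assert states that xlimit is below 32, so the loop runs at
// most 32 times and the loop counter can be sized accordingly.
void sum_x(const int A[32], int X[32], int xlimit) {
    assert(xlimit < 32);
    int X_accum = 0;
    SUM_X: for (int i = 0; i <= xlimit; i++) {
        X_accum += A[i];  // running sum of the inputs
        X[i] = X_accum;   // store the partial sum
    }
}
```

In C simulation the same assert also checks that the test bench never passes an out-of-range limit, so the stated bound and the tested behavior stay consistent.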
Improved Tutorials
Vivado HLS is now provided with 10 Tutorials
22 Labs which cover all aspects of Vivado HLS
Tutorial
Summary
Design
Introduction
FIR
DCT
C Validation
Interface Synthesis
Arbitrary Precision
Design Analysis
Design Optimization with Pipelining
RTL Verification
Creating IP for an IP Integrator Design
Creating IP for a Zynq Design
Creating IP for a System Generator Design
Filter Window
Sorter Design
Hamming Window
Matrix Multiplier
DUC
Windower, FFT IP Core, Sorter
Accelerator
YUV
SystemC RTL
Verilog and VHDL using the Xilinx Vivado (Xsim) simulator
Verilog and VHDL using the Mentor Graphics ModelSim simulator
Verilog and VHDL using the Xilinx Isim simulator.
Resource Specification
Targets the adder or subtractor to a DSP48 Resource
(* USE_DSP48 = "YES" *)
module adders_add_32ns_32ns_32_1_AddSub_DSP_0 (a, b, s);
endmodule

module adders_add_32ns_32ns_32_1( )
adders_add_32ns_32ns_32_1_AddSub_DSP_0 U1 (
  .a( din0 ),
  .b( din1 ),
  .s( dout ));
endmodule
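On the C side, this kind of mapping is requested with a RESOURCE directive on the variable that holds the result of the addition. The sketch below is illustrative only: the function `adders` and its arguments are hypothetical, and the core name `AddSub_DSP` is an assumption inferred from the RTL module name above.

```cpp
// Hypothetical three-input adder. A standard C++ compiler ignores the
// HLS pragma (possibly with an unknown-pragma warning), so the function
// behaves the same in C simulation with or without it.
int adders(int in1, int in2, int in3) {
    int sum = in1 + in2;
    // Assumption: AddSub_DSP is the core name, inferred from the RTL above.
#pragma HLS RESOURCE variable=sum core=AddSub_DSP
    return sum + in3;
}
```

Only the first addition is targeted here; the second is left for the tool to implement as it sees fit.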
Solution 2
High-Quality Implementation
Same hardware as implemented by RTL versions of this IP
Functionality fully described in Xilinx Documentation
LogiCORE IP Fast Fourier Transform v9.0 (document PG109)
LogiCORE IP FIR Compiler v7.1 (document PG149)
IP Examples
Examples Included in Vivado HLS Release
Access from the Welcome Screen
Or from C:\Xilinx\Vivado_HLS\2013.3\examples\design
Assuming the standard PC install path
Examples IP Designs
1024-point FFT and Inverse FFT (fixed point)
Single FFT 1024-point (fixed point)
FIR with 2 interleaved channels
3 FIRs connected in series (HB, HB, SRRC)
Updating coefficients using FIR CONFIG channel
SRRC (Square Root Raised Cosine) FIR filter
FFT Function
Using the FFT
#include "hls_fft.h"

hls::fft<STATIC_PARAM> (            // Static Parameterization Struct
    INPUT_DATA_ARRAY,               // Input data fixed or float
    OUTPUT_DATA_ARRAY,              // Output data fixed or float
    OUTPUT_STATUS,                  // Output Status
    INPUT_RUN_TIME_CONFIGURATION);  // Input Run Time Configuration
FIR Function
Using the FIR
#include "hls_fir.h"

// Create an instance of the FIR
static hls::FIR<STATIC_PARAM> fir1;  // Static parameterization

// Execute the FIR instance fir1
fir1.run(INPUT_DATA_ARRAY,           // Input Data
         OUTPUT_DATA_ARRAY);         // Output Data
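When validating a design that calls the FIR function, a C test bench may compare the output against a plain software reference. The following is one possible sketch of such a reference, a direct-form FIR in standard C++. The function name `fir_reference` and the use of integer samples are assumptions made for illustration; it does not model the FIR Compiler's quantization, rounding, or channel interleaving.

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR reference: y[n] = sum over k of h[k] * x[n - k],
// treating samples before the start of the input as zero.
std::vector<int> fir_reference(const std::vector<int>& x,
                               const std::vector<int>& h) {
    std::vector<int> y(x.size(), 0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        for (std::size_t k = 0; k < h.size() && k <= n; ++k) {
            y[n] += h[k] * x[n - k];
        }
    }
    return y;
}
```

Feeding an impulse through this reference returns the coefficients themselves, which is a convenient first check before comparing longer data sets.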
Recommendation
Use these IP in regions where dataflow optimization is used
This auto-converts the input and output arrays into streaming arrays
Alternatively, as a requirement:
The input and output arrays must be marked as streaming using the command set_directive_stream (pragma STREAM)
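As an illustration of the second option, the sketch below marks an intermediate array as streaming with the STREAM pragma inside a dataflow region. The function `two_stage`, the array size `N`, and the two simple stages are hypothetical stand-ins for the IP call and its neighbors; the point is only where the pragmas go and that each element is written once and read once, in order.

```cpp
const int N = 8;

// Hypothetical two-stage function. A standard C++ compiler ignores the
// HLS pragmas, so this also runs unchanged in C simulation.
void two_stage(const int in[N], int out[N]) {
#pragma HLS DATAFLOW
    int tmp[N];
    // Mark the array between the two stages as a stream (FIFO).
#pragma HLS STREAM variable=tmp
    STAGE_1: for (int i = 0; i < N; i++) {
        tmp[i] = in[i] * 2;   // each element written once, in order
    }
    STAGE_2: for (int i = 0; i < N; i++) {
        out[i] = tmp[i] + 1;  // each element read once, in the same order
    }
}
```

Random or repeated access to `tmp` would not fit a streaming implementation, which is why both loops walk the array sequentially.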
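+----------+------------------------------+----------------+----------------------+
|Function  |Type                          |Accuracy (ULP)  |Implementation Style  |
+----------+------------------------------+----------------+----------------------+
|cos       |ap_fixed<32,I>                |16              |Synthesized           |
|sin       |ap_fixed<32,I>                |16              |Synthesized           |
|sqrt      |ap_fixed<W,I>, ap_ufixed<W,I> |                |Synthesized           |
+----------+------------------------------+----------------+----------------------+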
The sqrt function supports any width, but the type must have a decimal point
It cannot be all integer bits or all fractional bits
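To put the ULP figures in context, the size of one ULP for a fixed-point type depends on how many fractional bits it has. The sketch below works through that arithmetic in plain C++. The helper names `ulp` and `error_bound` are hypothetical, and it assumes the accuracy column is a bound on absolute error measured in units of the least significant fractional bit.

```cpp
#include <cmath>

// Size of one ULP for a fixed-point type with W total bits and I integer
// bits: the weight of the least significant fractional bit, 2^-(W - I).
double ulp(int W, int I) {
    return std::ldexp(1.0, -(W - I));
}

// Absolute error corresponding to n_ulp units in the last place
// (assumes the accuracy figure is an absolute error bound).
double error_bound(int n_ulp, int W, int I) {
    return n_ulp * ulp(W, I);
}
```

For example, with 32 total bits and 16 integer bits, one ULP is 2^-16 and an accuracy of 16 ULP corresponds to 2^-12, roughly 0.000244.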
#else
// Or use the old method
#pragma HLS interface ap_fifo port=portA
#pragma HLS resource core=AXI4Stream variable=portA \
metadata="-bus_bundle Agroup"
#endif
Warning:
This applies if you use the method for adding AXI4 Streams from before 2013.3,
where you set the interface as a FIFO and then add an AXI resource
Recommendation
Change existing AXI4 Stream directives to use the INTERFACE directive
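A sketch of what the recommended form might look like follows. It assumes that `axis` is the INTERFACE mode used for AXI4 Stream ports from 2013.3 onward; the function `stream_copy`, its ports, and the fixed length of 4 are hypothetical and chosen only to keep the example small.

```cpp
// Hypothetical top-level function with one streamed input and one
// streamed output. A standard C++ compiler ignores the HLS pragmas.
void stream_copy(const int portA[4], int portB[4]) {
    // Assumption: "axis" is the INTERFACE mode for AXI4 Stream ports.
#pragma HLS INTERFACE axis port=portA
#pragma HLS INTERFACE axis port=portB
    COPY: for (int i = 0; i < 4; i++) {
        portB[i] = portA[i] + 1;  // sequential access, one element per iteration
    }
}
```

Compared with the older two-step form, a single directive per port states the protocol, so there is no separate FIFO interface and resource pair to keep in sync.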
Standard
Introduction
Docs and Videos
Thank You