VitisAccelerator Backend
========================

The ``VitisAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project targeting `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
The Vitis accelerator backend has been tested with the following boards:

To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or the `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and use any kernel it contains.

The ``VitisAccelerator`` backend automatically generates a kernel wrapper, a host code example, and a Makefile to build the project.

**Note:** The current implementation of the kernel wrapper code is oriented toward throughput benchmarking and not general inference uses (see :ref:`here<hardware_predict-method>`). It can nonetheless be further customized to fit specific applications.

Options
=======

As PCIe accelerators are not suitable for ultra-low latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:

* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
* ``num_thread``: Number of host threads used to exercise the kernels in the host application.
* ``batchsize``: Number of samples to be processed in a single kernel execution.

Additionally, the backend provides the following options to customize the implementation:

* ``board``: The target board, must match one entry in ``supported_boards.json``.
* ``clock_period``: The target clock period in ns.
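
As a sketch of how these options fit together, the snippet below collects them in a plain Python dictionary. The option names come from the list above, but the specific values (and the idea of forwarding them as keyword arguments to the hls4ml converter) are illustrative assumptions, not a documented interface.

```python
# Illustrative sketch only: option names follow the backend options listed
# above, but the values and the exact way they are passed to hls4ml are
# assumptions to verify against the backend documentation.
accelerator_config = {
    "board": "alveo-u55c",  # hypothetical entry; must exist in supported_boards.json
    "clock_period": 5,      # target clock period in ns
    "num_kernel": 4,        # kernel instances implemented in hardware
    "num_thread": 8,        # host threads exercising the kernels
    "batchsize": 8192,      # samples processed in a single kernel execution
}

# Under this sketch, the number of samples in flight across all kernels.
in_flight = accelerator_config["num_kernel"] * accelerator_config["batchsize"]
print(in_flight)  # 32768
```

Such a dictionary could then be unpacked into the model conversion call (e.g. ``**accelerator_config``), assuming the backend accepts these names as keyword arguments.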

The following example is a modified version of hls4ml example 7:

.. code-block:: python

    )
    hls_model.compile()
    hls_model.build()
    y = hls_model.hardware_predict(y)  # Limited to batchsize * num_kernel * num_thread for now

A specialized version of the ``predict`` method, for the ``VitisAccelerator`` backend after a successful build. Runs the project on the FPGA and obtains predictions for the supplied numpy array.
**Note:** The host code being run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications. Currently, the maximum number of input samples that can be processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size, the additional samples will be ignored.
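
Since samples beyond the ``batchsize * num_cu * num_buffer`` limit are ignored, one host-side workaround is to split a large input array into chunks below that limit and concatenate the per-chunk results. The helper below is a minimal sketch of that idea; ``predict_fn`` is a stand-in for the model's hardware predict call and is an assumption, not part of hls4ml's documented API.

```python
import numpy as np

def predict_in_chunks(predict_fn, X, batchsize, num_cu, num_buffer):
    """Run predict_fn over X in chunks no larger than the hardware limit.

    predict_fn stands in for the backend's hardware predict call
    (an illustrative assumption, not hls4ml's documented interface).
    """
    limit = batchsize * num_cu * num_buffer  # max samples per hardware run
    chunks = [predict_fn(X[i:i + limit]) for i in range(0, len(X), limit)]
    return np.concatenate(chunks)
```

The chunk size trades off against throughput: each call pays the host/device transfer and launch overhead again, so chunks should stay as close to the limit as possible.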
An optional ``target`` argument can be used to specify the target emulation mode (``hw``, ``sw_emu``, ``hw_emu``) to run the project on. The default is ``hw``.
.. code-block:: python
    # Suppose that you already have input array X
    # Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the
    # .xclbin file is successfully created, before using hardware_predict

    y = hls_model.hardware_predict(X)