VitisAccelerator
================

The ``VitisAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project targeting `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.

The Vitis accelerator backend has been tested with the following boards:

To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and use any kernel it contains.
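
As an illustration of this flow, the sketch below drives a built ``.xclbin`` from Python using the ``pyxrt`` bindings that ship with XRT, rather than the C++ APIs mentioned above. It is a minimal sketch, not the host code generated by the backend: the xclbin path, kernel name, argument order, and buffer sizes are all hypothetical placeholders.

.. code-block:: python

   import numpy as np
   import pyxrt  # Python bindings distributed with XRT

   # Hypothetical file and kernel names; the real ones depend on the generated project
   device = pyxrt.device(0)
   uuid = device.load_xclbin(pyxrt.xclbin('myproject_kernel.xclbin'))
   kernel = pyxrt.kernel(device, uuid, 'myproject_kernel')

   x = np.random.rand(8192, 16).astype(np.float32)  # one batch of input samples
   nbytes = x.nbytes

   # Allocate device buffers in the memory banks wired to the kernel arguments
   bo_in = pyxrt.bo(device, nbytes, pyxrt.bo.normal, kernel.group_id(0))
   bo_out = pyxrt.bo(device, nbytes, pyxrt.bo.normal, kernel.group_id(1))

   # Copy the input to the device, run one kernel execution, and read back the result
   bo_in.write(x.tobytes(), 0)
   bo_in.sync(pyxrt.xclBOSyncDirection.XCL_BO_SYNC_BO_TO_DEVICE, nbytes, 0)
   kernel(bo_in, bo_out).wait()
   bo_out.sync(pyxrt.xclBOSyncDirection.XCL_BO_SYNC_BO_FROM_DEVICE, nbytes, 0)
   y = np.frombuffer(bo_out.read(nbytes, 0), dtype=np.float32)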

The ``VitisAccelerator`` backend automatically generates a kernel wrapper, a host code example, and a Makefile to build the project.

**Note:** The current implementation of the kernel wrapper code is oriented toward throughput benchmarking rather than general inference use (see :ref:`here<hardware_predict-method>`). It can nonetheless be further customized to fit specific applications.

Options
=======

As PCIe accelerators are not suitable for ultra-low-latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:

* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
* ``num_thread``: Number of host threads used to exercise the kernels in the host application.
* ``batchsize``: Number of samples to be processed in a single kernel execution.

Additionally, the backend provides the following options to customize the implementation (see the usage sketch after this list):
* ``board``: The target board; must match an entry in ``supported_boards.json``.
* ``clock_period``: The target clock period in ns.
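
As an illustration, these options are passed as keyword arguments at conversion time. The following is a minimal sketch, assuming an existing Keras model ``model`` and configuration dictionary ``config``; the board name and numeric values are examples only:

.. code-block:: python

   import hls4ml

   hls_model = hls4ml.converters.convert_from_keras_model(
       model,
       hls_config=config,
       backend='VitisAccelerator',
       board='alveo-u55c',  # must match an entry in supported_boards.json
       clock_period=5,      # target clock period in ns
       num_kernel=4,        # kernel instances implemented in hardware
       num_thread=8,        # host threads exercising the kernels
       batchsize=8192,      # samples processed per kernel execution
   )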

The following example is a modified version of `hls4ml example 7 <https://github

.. code-block:: python

   )
   hls_model.compile()
   hls_model.build()
   y = hls_model.hardware_predict(y)  # Limited to batchsize * num_kernel * num_thread for now

.. _hardware_predict-method:

``hardware_predict`` method
===========================

A specialized version of the ``predict`` method, for the ``VitisAccelerator`` backend after a successful build. Runs the project on the FPGA and obtains predictions for the supplied numpy array.

**Note:** The host code being run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications. Currently, the maximum number of input samples that can be processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size, the additional samples will be ignored.

An optional ``target`` argument can be used to specify the target emulation mode (``hw``, ``sw_emu``, ``hw_emu``) to run the project on. The default is ``hw``. An emulation example follows the code block below.

.. code-block:: python

   # Suppose that you already have input array X
   # Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the
   # .xclbin file is successfully created, before using hardware_predict
   y = hls_model.hardware_predict(X)
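
For example, to validate the project in emulation before moving to the physical card (a sketch; it assumes the corresponding emulation target was built):

.. code-block:: python

   # Run the same prediction in hardware emulation instead of on the card
   y = hls_model.hardware_predict(X, target='hw_emu')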