Commit f69a950

Author: Quentin Berthet
Update documentation.
1 parent 92c0692, commit f69a950

2 files changed: +19 -14 lines changed

docs/backend/accelerator.rst

Lines changed: 7 additions & 4 deletions
@@ -80,7 +80,7 @@ The ``predict`` method will send the input data to the PL and return the output
 VitisAccelerator
 ================
 
-The ``VitsAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project for `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
+The ``VitsAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project targeting `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
 The Vitis accelerator backend has been tested with the following boards:
 
 * `Alveo u50 <https://www.xilinx.com/products/boards-and-kits/alveo/u50.html>`_
@@ -93,18 +93,20 @@ Kernel wrapper
 
 To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and use any kernel it contains.
 
-The ``VitisAccelerator`` backend generates automatically generate a kernel wrapper, an host code example, and a Makefile to build the project.
+The ``VitisAccelerator`` backend automatically generates a kernel wrapper, an host code example, and a Makefile to build the project.
+
+**Note:** The current implementation of the kernel wrapper code is oriented toward throughput benchmarking and not general inference uses (See :ref:`here<hardware_predict-method>`). It can nonetheless be further customized to fit specific applications.
 
 Options
 =======
 
 As PCIe accelerators are not suitable for ultra-low latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:
 
-* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
+* ``num_kernel``: Number of kernel instance to implement in the hardware architecture.
 * ``num_thread``: Number of host threads used to exercise the kernels in the host application.
 * ``batchsize``: Number of samples to be processed in a single kernel execution.
 
-Additionnaly, the backend propose the following options to customize the implementation:
+Additionaly, the backend proposes the following options to customize the implementation:
 
 * ``board``: The target board, must match one entry in ``supported_boards.json``.
 * ``clock_period``: The target clock period in ns.
@@ -158,3 +160,4 @@ The following example is a modified version of `hsl4ml example 7 <https://github
 )
 hls_model.compile()
 hls_model.build()
+y = hls_model.predict_hardware(y) # Limited to batchsize * num_kernel * num_thread for now

docs/ir/modelgraph.rst

Lines changed: 12 additions & 10 deletions
@@ -103,21 +103,23 @@ The trace method is an advanced version of the ``predict`` method. It's used to
 #We also support a similar function for keras
 keras_trace = hls4ml.model.profiling.get_ymodel_keras(keras_model, X)
 
-.. _hw_predict-method:
+----
 
-``hw_predict`` method
-======================
+.. _hardware_predict-method:
+
+``hardware_predict`` method
+===========================
 
-An specialized version of the ``predict`` method, for the VitisAccelerator backend after a successful build. Runs the project on the FPGA and obtains prediction for the supplied numpy array.
+A specialized version of the ``predict`` method, for the VitisAccelerator backend after a successful build. Runs the project on the FPGA and obtains prediction for the supplied numpy array.
 
-Note that the host code being run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications.
+**Note:** The host code being run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications. Currently, the maximum number of input samples that can be processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size, the additional samples will be ignored.
+
+An optional ``target`` argument can be used to specify the target emulation mode (``hw``, ``sw_emu``, ``hw_emu``) to run the project on. The default is ``hw``.
 
 .. code-block:: python
 
 # Suppose that you already have input array X
-# Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the .xclbin file is successfully created, before using hw_predict
-
-y = hls_model.hw_predict(X)
+# Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the
+# .xclbin file is successfully created, before using hardware_predict
 
-The maximum of number of input samples that can processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size the additional samples will be ignored.
-----
+y = hls_model.hardware_predict(X)
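
The sample limit described in the added ``modelgraph.rst`` note can be illustrated with a small plain-Python sketch. It only reproduces the stated arithmetic (``batchsize * num_cu * num_buffer``) and the stated behavior that extra samples are ignored. The helper names below are hypothetical and are not part of hls4ml, and the option values are made up for illustration. The comment added to ``accelerator.rst`` in the same commit words the limit as ``batchsize * num_kernel * num_thread``, so the parameter names here follow only one of the two wordings.

```python
# Hypothetical helpers, not part of hls4ml: they only mirror the limit
# stated in the documentation note (batchsize * num_cu * num_buffer).

def max_hardware_samples(batchsize, num_cu, num_buffer):
    """Maximum number of samples handled in one call, per the note."""
    return batchsize * num_cu * num_buffer


def split_processed_ignored(samples, batchsize, num_cu, num_buffer):
    """Split a sliceable sequence into the part within the limit and the rest."""
    limit = max_hardware_samples(batchsize, num_cu, num_buffer)
    return samples[:limit], samples[limit:]


# Made-up option values and input size, for illustration only.
samples = list(range(10000))
processed, ignored = split_processed_ignored(
    samples, batchsize=1024, num_cu=2, num_buffer=4
)
print(len(processed), len(ignored))
```

Under these assumptions, a caller with more samples than the limit would need to split the input into chunks of at most that size and call the prediction method once per chunk.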
