Commit 4171e2c

Author: Quentin Berthet
Update documentation.
1 parent 8841039 commit 4171e2c

File tree

2 files changed: +19 −14 lines


docs/advanced/vitis_accelerator.rst

Lines changed: 7 additions & 4 deletions
@@ -2,7 +2,7 @@
 VitisAccelerator Backend
 ========================

-The ``VitsAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project for `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
+The ``VitisAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project targeting `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.

 The Vitis accelerator backend has been tested with the following boards:

 * `Alveo u50 <https://www.xilinx.com/products/boards-and-kits/alveo/u50.html>`_
@@ -15,18 +15,20 @@ Kernel wrapper

 To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or the `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and use any kernel it contains.

-The ``VitisAccelerator`` backend generates automatically generate a kernel wrapper, an host code example, and a Makefile to build the project.
+The ``VitisAccelerator`` backend automatically generates a kernel wrapper, a host code example, and a Makefile to build the project.
+
+**Note:** The current implementation of the kernel wrapper code is oriented toward throughput benchmarking rather than general inference use (see :ref:`here<hardware_predict-method>`). It can nonetheless be further customized to fit specific applications.

 Options
 =======

 As PCIe accelerators are not suitable for ultra-low latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:

-* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
+* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
 * ``num_thread``: Number of host threads used to exercise the kernels in the host application.
 * ``batchsize``: Number of samples to be processed in a single kernel execution.

-Additionnaly, the backend propose the following options to customize the implementation:
+Additionally, the backend proposes the following options to customize the implementation:

 * ``board``: The target board, must match one entry in ``supported_boards.json``.
 * ``clock_period``: The target clock period in ns.
@@ -80,3 +82,4 @@ The following example is a modified version of `hsl4ml example 7 <https://github
 )
 hls_model.compile()
 hls_model.build()
+y = hls_model.hardware_predict(y) # Limited to batchsize * num_kernel * num_thread for now
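Taken together, the throughput options above bound how many samples a single hardware prediction call can handle. The sketch below is not part of the commit: the option values are hypothetical, and the commented-out converter call is an assumption about how such a configuration would be passed to hls4ml.

```python
# Sketch: choosing VitisAccelerator backend options for throughput.
# Option names follow the documentation above; the values are illustrative.
backend_config = {
    "board": "alveo-u55c",  # hypothetical entry; must match supported_boards.json
    "clock_period": 5,      # target clock period in ns
    "num_kernel": 4,        # kernel instances implemented in hardware
    "num_thread": 8,        # host threads exercising the kernels
    "batchsize": 8192,      # samples per kernel execution
}

# Assumed converter call (signature not confirmed by this commit):
# hls_model = hls4ml.converters.convert_from_keras_model(
#     model, backend="VitisAccelerator", **backend_config
# )

# Hardware prediction is currently limited to this many samples per call:
max_samples = (
    backend_config["batchsize"]
    * backend_config["num_kernel"]
    * backend_config["num_thread"]
)
print(max_samples)  # 262144
```

With these illustrative values, at most 262144 samples can be processed in one call; larger inputs would need to be split by the caller.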

docs/api/hls-model.rst

Lines changed: 12 additions & 10 deletions
@@ -101,21 +101,23 @@ The trace method is an advanced version of the ``predict`` method. It's used to
 #We also support a similar function for keras
 keras_trace = hls4ml.model.profiling.get_ymodel_keras(keras_model, X)

-.. _hw_predict-method:
+----

-``hw_predict`` method
-======================
+.. _hardware_predict-method:
+
+``hardware_predict`` method
+===========================

-An specialized version of the ``predict`` method, for the VitisAccelerator backend after a successful build. Runs the project on the FPGA and obtains prediction for the supplied numpy array.
+A specialized version of the ``predict`` method for the VitisAccelerator backend, available after a successful build. Runs the project on the FPGA and obtains predictions for the supplied numpy array.

-Note that the host code being run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications.
+**Note:** The host code run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications. Currently, the maximum number of input samples that can be processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size, the additional samples will be ignored.
+
+An optional ``target`` argument can be used to specify the emulation mode (``hw``, ``sw_emu``, or ``hw_emu``) in which to run the project. The default is ``hw``.

 .. code-block:: python

     # Suppose that you already have input array X
-    # Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the .xclbin file is successfully created, before using hw_predict
-
-    y = hls_model.hw_predict(X)
+    # Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the
+    # .xclbin file is successfully created, before using hardware_predict

-The maximum of number of input samples that can processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size the additional samples will be ignored.
-----
+    y = hls_model.hardware_predict(X)
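Because samples beyond the ``batchsize * num_cu * num_buffer`` limit are silently ignored, a caller can chunk the input before invoking ``hardware_predict``. The following numpy sketch is illustrative and not part of the documented API: the limit values and the per-chunk loop are assumptions.

```python
import numpy as np

# Illustrative limit values; in practice they follow from the backend configuration.
batchsize, num_cu, num_buffer = 8192, 4, 8
max_samples = batchsize * num_cu * num_buffer  # samples one call can process

# Example input larger than the per-call limit.
X = np.random.rand(300_000, 16).astype(np.float32)

# Split the input into chunks that each fit the limit, so no samples are dropped.
chunks = [X[i : i + max_samples] for i in range(0, len(X), max_samples)]

# Hypothetical per-chunk invocation (requires a successfully built .xclbin):
# y = np.concatenate([hls_model.hardware_predict(chunk) for chunk in chunks])

print(len(chunks), chunks[0].shape[0])  # 2 262144
```

Chunking this way trades a little host-side bookkeeping for the guarantee that every sample is actually processed.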
