VitisAccelerator Backend
========================

The ``VitisAccelerator`` backend leverages the `Vitis System Design Flow <https://www.xilinx.com/products/design-tools/vitis.html#design-flows>`_ to automate and simplify the creation of an hls4ml project targeting `AMD Alveo PCIe accelerators <https://www.amd.com/en/products/accelerators/alveo.html>`_.
The Vitis accelerator backend has been tested with the following boards:

To integrate with the Vitis System Design Flow and run on an accelerator, the generated ``hls4ml`` model must be encapsulated and built as a Vitis kernel (``.xo`` file) and linked into a binary file (``.xclbin``) during the implementation step. On the host side, standard C++ code using either `OpenCL <https://xilinx.github.io/XRT/master/html/opencl_extension.html>`_ or the `XRT API <https://xilinx.github.io/XRT/master/html/xrt_native_apis.html>`_ can be used to download the ``.xclbin`` file to the accelerator card and use any kernel it contains.

The ``VitisAccelerator`` backend automatically generates a kernel wrapper, a host code example, and a Makefile to build the project.

**Note:** The current implementation of the kernel wrapper code is oriented toward throughput benchmarking and not general inference uses (see :ref:`here<hardware_predict-method>`). It can nonetheless be further customized to fit specific applications.

Options
=======

As PCIe accelerators are not suitable for ultra-low latency applications, it is assumed that they are used for high-throughput applications. To accommodate this, the backend supports the following options to optimize the kernel for throughput:

* ``num_kernel``: Number of kernel instances to implement in the hardware architecture.
* ``num_thread``: Number of host threads used to exercise the kernels in the host application.
* ``batchsize``: Number of samples to be processed in a single kernel execution.

Additionally, the backend provides the following options to customize the implementation:

* ``board``: The target board, must match one entry in ``supported_boards.json``.
* ``clock_period``: The target clock period in ns.
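
As a sketch of how these options fit together, the snippet below collects them in a plain Python dictionary. The option names come from the list above, but the specific values (and the idea of forwarding them as keyword arguments to the hls4ml converter) are illustrative assumptions, not a documented interface.

```python
# Illustrative sketch only: option names follow the backend options listed
# above, but the values and the exact way they are passed to hls4ml are
# assumptions to verify against the backend documentation.
accelerator_config = {
    "board": "alveo-u55c",  # hypothetical entry; must exist in supported_boards.json
    "clock_period": 5,      # target clock period in ns
    "num_kernel": 4,        # kernel instances implemented in hardware
    "num_thread": 8,        # host threads exercising the kernels
    "batchsize": 8192,      # samples processed in a single kernel execution
}

# Under this sketch, the number of samples in flight across all kernels.
in_flight = accelerator_config["num_kernel"] * accelerator_config["batchsize"]
print(in_flight)  # 32768
```

Such a dictionary could then be unpacked into the model conversion call (e.g. ``**accelerator_config``), assuming the backend accepts these names as keyword arguments.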

The following example is a modified version of hls4ml example 7:

.. code-block:: python

    )
    hls_model.compile()
    hls_model.build()
    y = hls_model.hardware_predict(y)  # Limited to batchsize * num_kernel * num_thread for now

A specialized version of the ``predict`` method, for the ``VitisAccelerator`` backend after a successful build. Runs the project on the FPGA and obtains predictions for the supplied numpy array.
**Note:** The host code being run under the hood is an example written for generic benchmarking purposes, helpful for validating projects and gauging maximum throughput. It should be further adapted for more specific applications. Currently, the maximum number of input samples that can be processed is ``batchsize * num_cu * num_buffer``. If the input array exceeds that size, the additional samples will be ignored.
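
Since samples beyond the ``batchsize * num_cu * num_buffer`` limit are ignored, one host-side workaround is to split a large input array into chunks below that limit and concatenate the per-chunk results. The helper below is a minimal sketch of that idea; ``predict_fn`` is a stand-in for the model's hardware predict call and is an assumption, not part of hls4ml's documented API.

```python
import numpy as np

def predict_in_chunks(predict_fn, X, batchsize, num_cu, num_buffer):
    """Run predict_fn over X in chunks no larger than the hardware limit.

    predict_fn stands in for the backend's hardware predict call
    (an illustrative assumption, not hls4ml's documented interface).
    """
    limit = batchsize * num_cu * num_buffer  # max samples per hardware run
    chunks = [predict_fn(X[i:i + limit]) for i in range(0, len(X), limit)]
    return np.concatenate(chunks)
```

The chunk size trades off against throughput: each call pays the host/device transfer and launch overhead again, so chunks should stay as close to the limit as possible.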
An optional ``target`` argument can be used to specify the target emulation mode (``hw``, ``sw_emu``, ``hw_emu``) to run the project on. The default is ``hw``.
.. code-block:: python
    # Suppose that you already have input array X
    # Note that you have to do both hls_model.compile() and hls_model.build(), ensuring the
    # .xclbin file is successfully created, before using hardware_predict

    y = hls_model.hardware_predict(X)