
Commit 548c462

Restructure documentation

1 parent 4fc1ea9 commit 548c462

30 files changed: +3114 -233 lines
2 files renamed without changes.

docs/advanced/command.rst renamed to docs/api/command.rst

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ hls4ml config

    hls4ml config [-h] [-m MODEL] [-w WEIGHTS] [-o OUTPUT]

-This creates a conversion configuration file. Visit Configuration section of the :doc:`Setup <../setup>` page for more details on how to write a configuration file.
+This creates a conversion configuration file. Visit the Configuration section of the :doc:`Setup <../intro/setup>` page for more details on how to write a configuration file.

 **Arguments**

docs/api/concepts.rst

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
========
Concepts
========

How it Works
------------

.. image:: ../img/nn_map_paper_fig_2.png
   :width: 70%
   :align: center


Consider a multilayer neural network. At each neuron in a layer :math:`m` (containing :math:`N_m` neurons), we calculate an output value (part of the output vector :math:`\mathbf{x}_m` of said layer) as the sum of the previous layer's output values, each multiplied by an independent weight, plus a bias value. An activation function is then applied to the result to obtain the neuron's final output value. Representing the weights as an :math:`N_m \times N_{m-1}` matrix :math:`W_{m,m-1}`, the bias values as :math:`\mathbf{b}_m`, and the activation function as :math:`g_m`, we can express this compactly as:

.. math::

   \mathbf{x}_m = g_m (W_{m,m-1} \mathbf{x}_{m-1} + \mathbf{b}_m)

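As a minimal illustration of this formula, here is a NumPy sketch (illustrative only, not part of the ``hls4ml`` API; all names here are hypothetical):

.. code-block:: python

   import numpy as np

   def layer_output(x_prev, W, b, g=np.tanh):
       """Compute x_m = g_m(W_{m,m-1} x_{m-1} + b_m) for one layer."""
       return g(W @ x_prev + b)

   # A layer with N_m = 4 neurons fed by N_{m-1} = 3 previous outputs
   x_m = layer_output(np.ones(3), np.random.rand(4, 3), np.zeros(4))
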
With hls4ml, each layer of output values is calculated independently in sequence, using pipelining to speed up the process by accepting new inputs after an initiation interval.
The activations, if nontrivial, are precomputed.

To ensure optimal performance, the user can control aspects of their model, principally:

* **Size/Compression** - Though not explicitly part of the ``hls4ml`` package, this is an important optimization to efficiently use the FPGA resources
* **Precision** - Define the :doc:`precision <../advanced/profiling>` of the calculations in your model
* **Dataflow/Resource Reuse** - Control parallel or streaming model implementations with varying levels of pipelining
* **Quantization Aware Training** - Achieve the best performance at low precision with tools like QKeras, and benefit automatically during inference with ``hls4ml`` parsing of QKeras models

.. image:: ../img/reuse_factor_paper_fig_8.png
   :width: 70%
   :align: center


Often, these decisions will be hardware dependent to maximize performance.
Note that any simplification of the input network must be done before using ``hls4ml`` to generate HLS code, for optimal compression to provide a sizable speedup.
Also important is the use of fixed-point arithmetic in ``hls4ml``, which improves processing speed relative to floating-point implementations.
The ``hls4ml`` package also offers the functionality of configuring the binning and output bit width of the precomputed activation functions as necessary. With respect to parallelization and resource reuse, ``hls4ml`` offers a "reuse factor" parameter that determines the number of times each multiplier is used to compute a layer of neurons' values. A reuse factor of one therefore splits the computation so that each multiplier performs only one multiplication in the computation of a layer's output values, as shown above. Conversely, a reuse factor of four, in this case, uses a single multiplier four times sequentially. A low reuse factor achieves the lowest latency and highest throughput but uses the most resources, while a high reuse factor saves resources at the expense of longer latency and lower throughput.

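For example, precision and reuse factor can be set through the configuration dictionary (a minimal sketch, assuming a Keras model named ``model`` already exists):

.. code-block:: python

   import hls4ml

   # Generate a baseline configuration from the assumed Keras model
   config = hls4ml.utils.config_from_keras_model(model, granularity='model')
   config['Model']['Precision'] = 'ap_fixed<16,6>'  # 16 bits total, 6 integer bits
   config['Model']['ReuseFactor'] = 4               # each multiplier is used 4 times
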
Frontends and Backends
----------------------

``hls4ml`` has a concept of a **frontend** that parses the input NN into an internal model graph, and a **backend** that controls what type of output is produced from the graph. Frontends and backends can be independently chosen. Examples of frontends are the parsers for Keras or ONNX, and examples of backends are Vivado HLS, Intel HLS, and Vitis HLS. See :ref:`Status and Features` for the currently supported frontends and backends, or the dedicated sections for each frontend/backend.

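For instance, the Keras frontend and a chosen backend meet in a conversion call like the following (a sketch, reusing the assumed ``model`` and ``config`` from above):

.. code-block:: python

   # Parse the Keras model (frontend) and produce a Vitis HLS project (backend)
   hls_model = hls4ml.converters.convert_from_keras_model(
       model,
       hls_config=config,
       backend='Vitis',
       output_dir='my-hls-test',
   )
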
I/O Types
---------

``hls4ml`` supports multiple styles for handling data transfer to/from the network and between layers, known as the ``io_type``.

io_parallel
^^^^^^^^^^^
In this processing style, data is passed in parallel between the layers. Conceptually, this corresponds to a C/C++ array whose elements can all be accessed at any time. This style allows for maximum parallelism and is well suited for MLP networks and small CNNs that aim for the lowest latency. Due to the impact of parallel processing on FPGA resource utilization, synthesis may fail for larger networks.

io_stream
^^^^^^^^^
As opposed to the parallel processing style, in ``io_stream`` mode data is passed one "pixel" at a time. Each pixel is an array of channels, which are always sent in parallel. This method for sending data between layers is recommended for larger CNN and RNN networks. For one-dimensional ``Dense`` layers, all the inputs are streamed in parallel as a single array.

With the ``io_stream`` IO type, each layer is connected to the subsequent layer through first-in first-out (FIFO) buffers.
The implementation of the FIFO buffers contributes to the overall resource utilization of the design, impacting in particular the BRAM or LUT utilization.
Because neural networks can generally have complex architectures, it is hard to know a priori the correct depth of each FIFO buffer.
By default, ``hls4ml`` chooses the most conservative possible depth for each FIFO buffer, which can result in an unnecessary overutilization of resources.

To reduce the impact on the resources used for FIFO buffer implementation, we provide a FIFO depth optimization flow, described in the :ref:`FIFO Buffer Depth Optimization` section.
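
The IO style is selected at conversion time via the ``io_type`` argument (a sketch, again reusing the assumed ``model`` and ``config``):

.. code-block:: python

   # Stream data between layers through FIFOs instead of parallel arrays
   hls_model = hls4ml.converters.convert_from_keras_model(
       model,
       hls_config=config,
       io_type='io_stream',  # the default is 'io_parallel'
       output_dir='my-hls-test-stream',
   )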

Strategy
--------

**Strategy** in ``hls4ml`` refers to the implementation of the core matrix-vector multiplication routine, which can be latency-oriented, resource-saving oriented, or specialized. Different strategies will have an impact on the overall latency and resource consumption of each layer, and users are advised to choose based on their design goals. The availability of a particular strategy for a layer varies across backends; see the :doc:`Attributes <../ir/attributes>` section for a complete list of available strategies per-layer and per-backend.
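
The strategy can be set through the same configuration dictionary, either model-wide or per layer (a sketch; the layer name ``dense1`` is hypothetical):

.. code-block:: python

   # Model-wide strategy: trade latency for lower resource usage
   config['Model']['Strategy'] = 'Resource'

   # Or per layer, using a name-granularity configuration
   config = hls4ml.utils.config_from_keras_model(model, granularity='name')
   config['LayerName']['dense1']['Strategy'] = 'Latency'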

docs/api/configuration.rst

Lines changed: 1 addition & 1 deletion
@@ -232,7 +232,7 @@ More than one layer can have a configuration specified, e.g.:

     dense2:
       ...

-For more information on the optimization parameters and what they mean, you can visit the :doc:`Concepts <../concepts>` chapter.
+For more information on the optimization parameters and what they mean, you can visit the :doc:`Concepts <../api/concepts>` section.

 ----

docs/api/details.rst

Lines changed: 0 additions & 33 deletions
This file was deleted.

docs/advanced/accelerator.rst renamed to docs/backend/accelerator.rst

Lines changed: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
-=========================
-VivadoAccelerator Backend
-=========================
+=================
+VivadoAccelerator
+=================

-The ``VivadoAccelerator`` backend of ``hls4ml`` leverages the `PYNQ <http://pynq.io/>`_ software stack to easily deploy models on supported devices.
+The **VivadoAccelerator** backend of ``hls4ml`` leverages the `PYNQ <http://pynq.io/>`_ software stack to easily deploy models on supported devices.
 Currently ``hls4ml`` supports the following boards:

 * `pynq-z2 <https://www.xilinx.com/support/university/xup-boards/XUPPYNQ-Z2.html>`_ (part: ``xc7z020clg400-1``)

docs/backend/catapult.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
========
Catapult
========

Support for the Siemens Catapult HLS compiler was added in ``hls4ml`` version 1.0.0.

*TODO expand this section*

docs/advanced/oneapi.rst renamed to docs/backend/oneapi.rst

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
-==============
-oneAPI Backend
-==============
+======
+oneAPI
+======

 The ``oneAPI`` backend of hls4ml is designed for deploying NNs on Intel/Altera FPGAs. It will eventually
 replace the ``Quartus`` backend, which targeted Intel HLS. (Quartus continues to be used with IP produced by the

docs/backend/quartus.rst

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
=======
Quartus
=======

.. warning::
   The Quartus backend is deprecated and will be removed in a future version. Users should migrate to the oneAPI backend.

*TODO expand this section*
