296 commits
cf5c9a1
snapshot
jmitrevs Feb 7, 2024
81f3e53
bug fixes from attempting to run
jmitrevs Feb 8, 2024
9a74e46
fix some bugs from qonnx pytest
jmitrevs Feb 12, 2024
60a74bb
fix assertion of not matching the number of inputs when replacing node
jmitrevs Feb 12, 2024
a032a5d
Merge remote-tracking branch 'vloncar/auto_precision' into qonnx-1p0
jmitrevs Feb 29, 2024
88a8d35
update some precisions inference
jmitrevs Feb 29, 2024
0379db2
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs Feb 29, 2024
10a3c50
extract bitwidth from size 1 array in quant node
jmitrevs Feb 29, 2024
ab8d67b
update automatic onnx configuration
jmitrevs Mar 2, 2024
0a863ad
standardize on merge operators
jmitrevs Mar 2, 2024
bfe6a3f
snapshot of current work
jmitrevs Mar 8, 2024
25849ef
Fix bug in FuseBatchNormalization
jmitrevs Mar 10, 2024
4485bf3
fix issue with configuration setup of test
jmitrevs Mar 11, 2024
52067c3
fix bug in FuseConsecutiveBatchNormalization
jmitrevs Mar 11, 2024
24d6245
add missing header
jmitrevs Mar 11, 2024
835af4e
attempt to make qonnx tests match better
jmitrevs Mar 11, 2024
4a41b63
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs Mar 12, 2024
2bcec04
fix pre-commit
jmitrevs Mar 12, 2024
b3facd2
remove count, become more selective on when True is returned
jmitrevs Apr 17, 2024
b580866
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs Apr 19, 2024
105b38a
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs Apr 19, 2024
0d8108e
fix optimizer issue when quantizer is None
jmitrevs Apr 19, 2024
229b44a
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs May 3, 2024
d509976
Merge branch 'main' into hls4ml-optimization-api-part-2
jmitrevs May 3, 2024
1fa59dc
update pytest image to 0.5.6
jmitrevs May 16, 2024
3d8912d
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
jmduarte May 28, 2024
65857a4
Merge branch 'main' into qonnx-1p0
jmitrevs May 30, 2024
b565067
Merge branch 'main' into qonnx-1p0
jmitrevs May 31, 2024
f1a238d
Merge remote-tracking branch 'upstream/main' into hw_opt_p2
vloncar Jun 4, 2024
8a48417
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
jmduarte Jun 4, 2024
a181d97
add vitis
jmduarte Jun 10, 2024
2a78f93
Merge branch 'main' into qonnx-1p0
jmitrevs Jun 25, 2024
c5841a2
seperate out parse_qonnx flow
jmitrevs Jun 25, 2024
de790ca
Again allow for None in target shape--for pytorch
jmitrevs Jun 26, 2024
0ea246c
Refactor matrix-multiplication kernel as a function pointer
vloncar Jul 15, 2024
6189953
Merge branch 'main' into qonnx-1p0
jmitrevs Jul 17, 2024
2909d15
Following what seems to be done in the main branch
jmitrevs Jul 18, 2024
c9693da
update infer_precision based on changes in keras-config-auto
jmitrevs Jul 19, 2024
aaaa2fc
loosen batchnorm merging restrictions, fix ternary handling
jmitrevs Jul 19, 2024
a2b88f4
remove some backends from slow qonnx test
jmitrevs Jul 19, 2024
169d9e5
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs Aug 21, 2024
ef02b4f
move multi_dense to conv above inferming precision types
jmitrevs Aug 21, 2024
c3ffa7b
fix the default reuse factor
jmitrevs Aug 21, 2024
2ed0865
Reorganize codegen of unrolled implementation
vloncar Aug 22, 2024
10f648c
Merge remote-tracking branch 'upstream/main' into hw_opt_p2
vloncar Aug 22, 2024
fbc4107
Remove mentions of dense_resource_implementation
vloncar Aug 25, 2024
ecda5c9
Default to 'auto' for pipeline style and move check to an optimizer
vloncar Aug 25, 2024
ce8431d
Pimp the docs a bit
vloncar Aug 25, 2024
3591ae5
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs Sep 3, 2024
cc7652d
Pre-commit fix
jmitrevs Sep 3, 2024
b36fe4f
fix qonnx review suggestions
jmitrevs Sep 4, 2024
c37d953
fix qonnx review suggestions (part 2)
jmitrevs Sep 4, 2024
23825de
fix error message
jmitrevs Sep 4, 2024
cad06fa
change order of qonnx optimizers
jmitrevs Sep 9, 2024
5e9f4d6
Merge branch 'main' into qonnx-1p0
jmitrevs Sep 11, 2024
51c80f9
make the optimizer oder be more similar to main branch
jmitrevs Sep 12, 2024
8e6dd58
Merge remote-tracking branch 'upstream/main' into qonnx-1p0
jmitrevs Sep 12, 2024
ce09665
Merge branch 'main' into qonnx-1p0
jmitrevs Sep 13, 2024
8eaf10a
fix dimensions when moving scales
jmitrevs Sep 19, 2024
d80dc3b
Added support and some missing parts for `Depthwise` and `Pointwise` …
jmitrevs Sep 20, 2024
fae647d
add seperable conv to test
jmitrevs Sep 23, 2024
56c85a4
fix pointwise with naming, quant_opt
jmitrevs Sep 24, 2024
b0efdd6
fix ConstantBatchNormFusion
jmitrevs Sep 24, 2024
14da6f5
update broadcasting for moving scales for conv
jmitrevs Sep 25, 2024
0333d36
snapshot of current development
jmitrevs Sep 26, 2024
80184d2
snapshot working through scale downs
jmitrevs Sep 26, 2024
6bb0817
finish making the various cases
jmitrevs Sep 26, 2024
766a14c
accidentally reverted the example models
jmitrevs Sep 26, 2024
5ff1373
some bug fixes
jmitrevs Sep 26, 2024
65e0127
Merge pull request #10 from jmitrevs/qonnx-1p0-sepconv-dev
jmitrevs Sep 29, 2024
da4f9e5
Merge branch 'main' into qonnx-1p0
jmitrevs Sep 29, 2024
86abdd2
update qonnx sepconv test
jmitrevs Sep 29, 2024
ac8d9fd
Merge branch 'main' into hw_opt_p2
vloncar Oct 1, 2024
eff80aa
Merge remote-tracking branch 'upstream/main' into hw_opt_p2
vloncar Oct 1, 2024
d30773f
update qkeras in Jenkinsfile
jmitrevs Oct 1, 2024
6363702
Merge branch 'main' into qonnx-1p0
jmitrevs Oct 1, 2024
09c5d5b
intial depthconv2d implementation
laurilaatu Oct 2, 2024
c92091b
intial depthconv2d implementation
laurilaatu Oct 2, 2024
8403348
Merge remote-tracking branch 'refs/remotes/origin/oneapi_separablecon…
laurilaatu Oct 2, 2024
15abf5b
Merge branch 'main' into update_jenkins
JanFSchulte Oct 2, 2024
afed23b
Merge pull request #1072 from jmitrevs/update_jenkins
JanFSchulte Oct 2, 2024
accadaf
Merge branch 'main' into qonnx-1p0
jmitrevs Oct 2, 2024
c4af46a
Rename "unrolled" -> "resource_unrolled"
vloncar Oct 7, 2024
97c5347
Move optimization API to "dsp_aware_pruning" module (new optimization…
vloncar Oct 7, 2024
5fbdae8
Hardcode weights loading (ensures weights loading works from any dir)
vloncar Oct 8, 2024
c596f30
Rename to depthconv, add strides and add tests
laurilaatu Oct 9, 2024
bcd8c70
Remove class for DepthwiseConv2D
laurilaatu Oct 9, 2024
c8d7fc6
merge
jmduarte Oct 9, 2024
a6a5c7f
add flow
jmduarte Oct 10, 2024
170999f
div roundup
jmduarte Oct 10, 2024
8981112
Remove Separable convolution template
laurilaatu Oct 10, 2024
5ad1188
Remove layer optimizer for sepconv
laurilaatu Oct 11, 2024
308af4e
[pre-commit.ci] pre-commit autoupdate
pre-commit-ci[bot] Oct 14, 2024
b4111c6
Merge pull request #1075 from fastmachinelearning/pre-commit-ci-updat…
jmitrevs Oct 16, 2024
12a2d1e
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
jmduarte Oct 17, 2024
4ec6387
update
jmduarte Oct 17, 2024
cfbad0b
Merge branch 'main' into hls4ml-optimization-api-part-2
JanFSchulte Oct 22, 2024
352c124
Merge pull request #809 from bo3z/hls4ml-optimization-api-part-2
JanFSchulte Oct 22, 2024
aaab34a
fix softmax parsing in pytorch and add test
JanFSchulte Oct 22, 2024
a9bfc6a
Merge branch 'main' into softmaxfix_torch
JanFSchulte Oct 22, 2024
655aef6
precommit
JanFSchulte Oct 22, 2024
61695b6
precommit v2
JanFSchulte Oct 22, 2024
a306e3f
add small tweak to fix issue 1054
JanFSchulte Oct 22, 2024
583a8c2
In softmax, max axis -1 if it's a positive index that's identical
jmitrevs Oct 23, 2024
9cbf0f1
add more onnx tests, optimize the handling of some attributes, update…
jmitrevs Oct 23, 2024
6ca1055
Figure out the weights dir automatically from the location of build_l…
vloncar Oct 23, 2024
12034d3
Merge remote-tracking branch 'upstream/main' into weight_txt_path
vloncar Oct 23, 2024
3ec6c5a
update qonnx documentation
jmitrevs Oct 24, 2024
39d0e91
Merge pull request #1089 from vloncar/weight_txt_path
JanFSchulte Oct 24, 2024
10eb161
Merge branch 'main' into qonnx-1p0
jmitrevs Oct 24, 2024
210f8c2
quote the to handle special characters
jmitrevs Oct 24, 2024
03096cf
Merge pull request #1091 from fastmachinelearning/path_special_char_esc
JanFSchulte Oct 24, 2024
fc0417b
Merge branch 'main' into qonnx-1p0
JanFSchulte Oct 24, 2024
4518537
Beginnings of the oneAPI backend (#955)
jmitrevs Oct 25, 2024
f9a2412
update keras activation parsing, especially leaky relu (#1085)
jmitrevs Oct 25, 2024
ab45708
Merge pull request #1086 from JanFSchulte/softmaxfix_torch
jmitrevs Oct 25, 2024
c75b29d
[pre-commit.ci] pre-commit autoupdate
pre-commit-ci[bot] Oct 28, 2024
25b08cf
Merge pull request #1098 from fastmachinelearning/pre-commit-ci-updat…
JanFSchulte Oct 28, 2024
f8beb3a
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
jmduarte Oct 29, 2024
6ca2f1b
roundup
jmduarte Oct 29, 2024
352772d
restore example-models
jmduarte Oct 29, 2024
cfad81f
Fix wrong note in README.md
bo3z Oct 29, 2024
82d059b
Change indexing in filling result for io_parallel convolutions, Vitis…
jmitrevs Oct 30, 2024
2c17f66
Merge pull request #979 from jmitrevs/qonnx-1p0
vloncar Oct 31, 2024
1a93246
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
jmduarte Nov 1, 2024
d37a843
remove pointwise conv implementation option; make it default
jmduarte Nov 1, 2024
f5629db
remove pointwise conv implementation option; make it default
jmduarte Nov 1, 2024
f4ae08f
Restore tab
jmduarte Nov 1, 2024
ecd6b04
Add back nnet_helpers.h
jmduarte Nov 1, 2024
6f5cbd9
format
jmduarte Nov 1, 2024
3c5b633
Merge branch 'main' into oneapi_separableconv
laurilaatu Nov 1, 2024
d422659
Merge branch 'main' into update-readme
jmitrevs Nov 5, 2024
fabcf8c
update the project status
jmitrevs Nov 5, 2024
b844acf
restructure of existing documentation
jmitrevs Nov 5, 2024
88e84f3
add an internal layers section, and auto precision
jmitrevs Nov 5, 2024
6abc8ad
pre-commit fixes
jmitrevs Nov 5, 2024
54657f9
make auto default precision for pytorch parser
JanFSchulte Nov 6, 2024
bd28050
add max_precision to onnx parser
jmitrevs Nov 6, 2024
6b9bf0c
Loop unroll
laurilaatu Nov 7, 2024
97d7186
remove incorrect input from Constant nodes
jmitrevs Nov 8, 2024
7612b4f
Merge pull request #1119 from fastmachinelearning/make_Constant_witho…
JanFSchulte Nov 11, 2024
e2fd8a5
Merge pull request #1113 from fastmachinelearning/onnx_parser_max_pre…
JanFSchulte Nov 11, 2024
dab7b85
Merge branch 'main' into pytorch_auto
JanFSchulte Nov 11, 2024
01d4f79
more default settings suggested by Jovan
JanFSchulte Nov 11, 2024
d947cdb
Merge branch 'pytorch_auto' of https://github.com/JanFSchulte/hls4ml …
JanFSchulte Nov 11, 2024
efb4379
Add RF to config templates for "Merge" layers
vloncar Nov 11, 2024
b44426d
Merge pull request #1121 from vloncar/merge_template_missing_rf
JanFSchulte Nov 11, 2024
364a0b7
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
jmitrevs Nov 12, 2024
6aeafdd
Merge branch 'main' into pytorch_auto
JanFSchulte Nov 12, 2024
8ebeefe
jovan comments
jmduarte Nov 13, 2024
4099c8d
p-clang-format
jmduarte Nov 13, 2024
d999ad8
p-clang-format
jmduarte Nov 13, 2024
5e5b81f
Introduce optional description to layer attributes
vloncar Nov 13, 2024
5616e5a
Add doc for HGQ (#1117)
calad0i Nov 13, 2024
1214b65
Pre-commit fix
vloncar Nov 13, 2024
daae96d
fix
jmduarte Nov 14, 2024
6d84b80
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
jmduarte Nov 14, 2024
677c738
multi output and flatten@streaming fix
calad0i Oct 29, 2024
04cbe83
relaxing remove node shape check cond
calad0i Oct 29, 2024
e73b3d3
fix regression errors
calad0i Oct 29, 2024
b055233
catapult and oneapi tests
calad0i Oct 29, 2024
26b4c54
add catapult 3-clone
calad0i Oct 29, 2024
702e4eb
rm ill-condition
calad0i Oct 29, 2024
d016612
allow removing nodes w/o i or o
calad0i Nov 8, 2024
d1a3b75
chore
calad0i Nov 9, 2024
bf6fe7a
cosmatic
calad0i Nov 10, 2024
7b58c1d
typo and docstring
calad0i Nov 13, 2024
ef2e8f4
allow io_stream if used as model output
calad0i Nov 13, 2024
51cb83c
Merge pull request #1112 from JanFSchulte/pytorch_auto
jmitrevs Nov 15, 2024
68f80ea
remove incorrect setting of result_t
jmitrevs Nov 15, 2024
18fc2b8
remove additional incorrect result_t settings
jmitrevs Nov 15, 2024
e778ed3
Merge pull request #1130 from fastmachinelearning/result_t_bug_fix
bo3z Nov 17, 2024
9e3fc8d
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
bo3z Nov 17, 2024
09013a1
Merge remote-tracking branch 'origin' into oneapi_separableconv
laurilaatu Nov 18, 2024
21f21fc
Pre-commit format
laurilaatu Nov 18, 2024
9536248
Fix spelling
laurilaatu Nov 18, 2024
4eb0746
Fixes the problem if scale is a tensor. scale[0] does not return a sc…
jurevreca12 Nov 19, 2024
c320f50
Merge pull request #1132 from jurevreca12/fix-FuseQuantWithConstant
jmitrevs Nov 19, 2024
8ebdf22
Merge branch 'fastmachinelearning:main' into oneapi_separableconv
laurilaatu Nov 20, 2024
0fb0997
depthconv1d, channel order in loop, product
laurilaatu Nov 20, 2024
4b91d49
Merge branch 'main' into attrs_desc
vloncar Nov 20, 2024
d34876d
Gather result to accum
laurilaatu Nov 20, 2024
e813d41
Tweak writing of all attributes, allow writing only configurable attr…
vloncar Nov 20, 2024
8505e78
Added support for QONNX `Resize` node ingestion and tested with tiny …
nghielme Nov 21, 2024
7d9ec3a
Merge branch 'main' into oneapi_separableconv
laurilaatu Nov 22, 2024
d56dc73
vladimir comments
jmduarte Nov 22, 2024
dd021ec
fix n_in/n_out
jmduarte Nov 22, 2024
93acaa6
pre-commit
jmduarte Nov 22, 2024
0268c2f
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
JanFSchulte Nov 22, 2024
e845e02
Update install_requires for 1.0.0
vloncar Nov 22, 2024
9852bf0
Merge branch 'main' into update_setup_reqs_100
bo3z Nov 23, 2024
22878ce
Merge pull request #1136 from vloncar/update_setup_reqs_100
bo3z Nov 23, 2024
d1c10ca
Merge branch 'main' into oneapi_separableconv
laurilaatu Nov 23, 2024
1867dfc
fix resource strategy
jmduarte Nov 25, 2024
7570c11
Merge remote-tracking branch 'upstream/main' into update-docs
vloncar Nov 26, 2024
09bbefb
Typo fixes
vloncar Nov 26, 2024
42cb368
Add video tutorial link
bo3z Dec 3, 2024
4a1c25a
Merge branch 'main' into split_pointwise_conv_by_rf_codegen
JanFSchulte Dec 4, 2024
76d06e7
add warning when moving scale fales
jmitrevs Dec 4, 2024
915d2e1
better handle cases when there is no previous node
jmitrevs Dec 4, 2024
2fc8941
Merge pull request #881 from jmduarte/split_pointwise_conv_by_rf_codegen
JanFSchulte Dec 4, 2024
26f4eb2
Merge branch 'main' into update-readme
jmitrevs Dec 4, 2024
88c1fe7
Minor doc improvements to attributes (#57)
bo3z Dec 4, 2024
fedf790
respond to some review comments and update some descriptions
jmitrevs Dec 4, 2024
cf91c3b
Merge branch 'main' into attrs_desc
bo3z Dec 5, 2024
ce7f1f1
Merge pull request #1127 from vloncar/attrs_desc
bo3z Dec 5, 2024
f28f364
fix documentation of channels_last conversion for pytorch
JanFSchulte Dec 5, 2024
e55b29c
slightly expand discussion of channels_last in pytorch
JanFSchulte Dec 5, 2024
c8e1857
Merge pull request #1142 from fastmachinelearning/qonnx_warnings
JanFSchulte Dec 5, 2024
6de4043
Merge branch 'main' into oneapi_separableconv
laurilaatu Dec 5, 2024
99e3be0
update requirements
jmduarte Dec 5, 2024
96b530f
add pointwise documentation
jmduarte Dec 5, 2024
a7b6f79
update pointwise description
jmduarte Dec 5, 2024
135eaa2
Merge remote-tracking branch 'upstream/main' into update-readme
vloncar Dec 6, 2024
6af7fef
Add FAQ to docs and readme
vloncar Dec 6, 2024
eac61dd
Nicer link to the tutorial
vloncar Dec 6, 2024
c65e915
add doc strings to pytorch-specific padding calculation functions
JanFSchulte Dec 6, 2024
7cf4134
Merge branch 'update-readme' of https://github.com/fastmachinelearnin…
JanFSchulte Dec 6, 2024
4fc1ea9
clarify default for channels last conversion in pytorch
JanFSchulte Dec 6, 2024
91ee88e
fixes to parsing of pytorch models when using torch functionals
JanFSchulte Dec 6, 2024
2afae66
fix quotation marks
JanFSchulte Dec 6, 2024
a0a573e
fix quotation marks
JanFSchulte Dec 6, 2024
f377fe0
Merge pull request #1143 from JanFSchulte/parsing_fixes
jmitrevs Dec 6, 2024
548c462
Restructure documentation
vloncar Dec 6, 2024
4da52a4
bump version to 1.0.0
jmduarte Dec 6, 2024
6959c71
remove obsolete file references
jmitrevs Dec 6, 2024
47d7435
add a touch of text on the backends
jmitrevs Dec 6, 2024
05f8a45
expand pytorch frontend documentation
JanFSchulte Dec 8, 2024
6f971eb
Merge branch 'main' into update-readme
JanFSchulte Dec 9, 2024
536c069
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Dec 9, 2024
d9d09e0
typos in pytorch frontend documentation
JanFSchulte Dec 9, 2024
16c4055
Merge branch 'update-readme' of https://github.com/fastmachinelearnin…
JanFSchulte Dec 9, 2024
e69a392
improve description of brevtias -> QONNX -> hlsm4l workflow
JanFSchulte Dec 9, 2024
326b188
Merge branch 'main' into oneapi_separableconv
laurilaatu Dec 9, 2024
896951a
Add docs on BramFactor
vloncar Dec 9, 2024
cfcd46c
Merge pull request #1100 from fastmachinelearning/update-readme
JanFSchulte Dec 9, 2024
5dd7715
Temporary workaround for QKeras installation
vloncar Dec 9, 2024
cc4fbf9
Merge pull request #1145 from vloncar/qkeras_install_hook
JanFSchulte Dec 9, 2024
6617310
don't overwrite already set accum_t, fix pointwise output res
jmitrevs Dec 11, 2024
f211a0e
split hgq tests and isolate qkeras tests to make tests run in under 1h
JanFSchulte Dec 13, 2024
82ab6bf
pre-commit
JanFSchulte Dec 13, 2024
96da3fe
[pre-commit.ci] auto fixes from pre-commit hooks
pre-commit-ci[bot] Dec 13, 2024
8a018f1
remove unnecessary import
JanFSchulte Dec 13, 2024
46bdacc
update example-model
jmitrevs Dec 13, 2024
1d0cf1e
change order of optimizers
jmitrevs Dec 13, 2024
eabb785
fix example-models setting for long running pytetss
JanFSchulte Dec 13, 2024
fb12040
add pytorch to long tests
JanFSchulte Dec 14, 2024
0dd372a
Merge pull request #1146 from fastmachinelearning/fix_pointwise_res_type
JanFSchulte Dec 14, 2024
3c63e27
Merge pull request #1153 from JanFSchulte/split_pytests
jmitrevs Dec 14, 2024
c58db99
Merge branch 'main' into oneapi_separableconv
laurilaatu Dec 16, 2024
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -2,15 +2,15 @@ exclude: (^hls4ml\/templates\/(vivado|quartus)\/(ap_types|ac_types)\/|^test/pyte

repos:
- repo: https://github.com/psf/black
- rev: 24.8.0
+ rev: 24.10.0
hooks:
- id: black
language_version: python3
args: ['--line-length=125',
'--skip-string-normalization']

- repo: https://github.com/pre-commit/pre-commit-hooks
- rev: v4.6.0
+ rev: v5.0.0
hooks:
- id: check-added-large-files
- id: check-case-conflict
@@ -30,13 +30,13 @@ repos:
args: ["--profile", "black", --line-length=125]

- repo: https://github.com/asottile/pyupgrade
- rev: v3.17.0
+ rev: v3.19.0
hooks:
- id: pyupgrade
args: ["--py36-plus"]

- repo: https://github.com/asottile/setup-cfg-fmt
- rev: v2.5.0
+ rev: v2.7.0
hooks:
- id: setup-cfg-fmt

@@ -50,7 +50,7 @@ repos:
'--extend-ignore=E203,T201'] # E203 is not PEP8 compliant

- repo: https://github.com/mgedmin/check-manifest
- rev: "0.49"
+ rev: "0.50"
hooks:
- id: check-manifest
stages: [manual]
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -4,7 +4,7 @@ type: software
authors:
- given-names: "FastML Team"
title: "hls4ml"
- version: "v0.8.1"
+ version: "v1.0.0"
doi: 10.5281/zenodo.1201549
repository-code: "https://github.com/fastmachinelearning/hls4ml"
url: "https://fastmachinelearning.org/hls4ml"
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -16,7 +16,7 @@ pipeline {
sh '''#!/bin/bash --login
conda activate hls4ml-py310
conda install -y jupyterhub pydot graphviz pytest pytest-cov
- pip install pytest-randomly jupyter onnx>=1.4.0 matplotlib pandas seaborn pydigitalwavetools==1.1 pyyaml tensorflow==2.14 qonnx torch git+https://github.com/google/qkeras.git pyparsing
+ pip install pytest-randomly jupyter onnx>=1.4.0 matplotlib pandas seaborn pydigitalwavetools==1.1 pyyaml tensorflow==2.14 qonnx torch git+https://github.com/jmitrevs/qkeras.git@qrecurrent_unstack pyparsing
pip install -U ../ --user
./convert-keras-models.sh -x -f keras-models.txt
pip uninstall hls4ml -y'''
16 changes: 11 additions & 5 deletions README.md
@@ -15,7 +15,9 @@ If you have any questions, comments, or ideas regarding hls4ml or just want to s

# Documentation & Tutorial

- For more information visit the webpage: [https://fastmachinelearning.org/hls4ml/](https://fastmachinelearning.org/hls4ml/)
+ For more information visit the webpage: [https://fastmachinelearning.org/hls4ml/](https://fastmachinelearning.org/hls4ml/).
+
+ For introductory material on FPGAs, HLS, and ML inference using hls4ml, check out the [video](https://www.youtube.com/watch?v=2y3GNY4tf7A&ab_channel=SystemsGroupatETHZ%C3%BCrich).

Detailed tutorials on how to use `hls4ml`'s various functionalities can be found [here](https://github.com/hls-fpga-machine-learning/hls4ml-tutorial).

@@ -49,8 +51,8 @@ hls_model = hls4ml.converters.keras_to_hls(config)
hls4ml.utils.fetch_example_list()
```

- ### Building a project with Xilinx Vivado HLS (after downloading and installing from [here](https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html))
- Note: Vitis HLS is not yet supported. Vivado HLS versions between 2018.2 and 2020.1 are recommended.
+ ### Building a project
+ We will build the project using Xilinx Vivado HLS, which can be downloaded and installed from [here](https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html). Alongside Vivado HLS, hls4ml also supports Vitis HLS, Intel HLS, and Catapult HLS, and has experimental support for Intel oneAPI. The target backend can be changed using the `backend` argument when building the model.

```Python
# Use Vivado HLS to synthesize the model
@@ -61,15 +63,19 @@ hls_model.build()
hls4ml.report.read_vivado_report('my-hls-test')
```

# FAQ

A list of frequently asked questions and common HLS synthesis issues can be found [here](https://fastmachinelearning.org/hls4ml/faq.html).

# Citation
If you use this software in a publication, please cite the software
```bibtex
@software{fastml_hls4ml,
author = {{FastML Team}},
title = {fastmachinelearning/hls4ml},
- year = 2023,
+ year = 2024,
publisher = {Zenodo},
- version = {v0.8.1},
+ version = {v1.0.0},
doi = {10.5281/zenodo.1201549},
url = {https://github.com/fastmachinelearning/hls4ml}
}
22 changes: 22 additions & 0 deletions docs/advanced/auto.rst
@@ -0,0 +1,22 @@
=============================
Automatic precision inference
=============================

The automatic precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.infer_precision.InferPrecisionTypes`) attempts to infer the appropriate
widths for a given precision. It is initiated by setting a precision in the configuration as ``'auto'``. (Note, only layer-level precisions can be set to ``'auto'``,
not model-level.) Functions like :py:class:`~hls4ml.utils.config.config_from_keras_model`, :py:class:`~hls4ml.utils.config.config_from_onnx_model`,
and :py:class:`~hls4ml.utils.config.config_from_pytorch_model` automatically set most precisions to ``'auto'`` if the ``'name'`` granularity is used.

.. note::
It is recommended to pass the backend to the ``config_from_*`` functions so that they can properly extract all the configurable precisions.

The approach taken by the precision inference is to set the accumulator (the internal variable used to accumulate values in the matrix multiplications) and other precisions
so that they never truncate, using only the bitwidths of the inputs (not the values). This is quite conservative, especially in cases where post-training quantization is used, or
if the bit widths were set fairly loosely. The recommended action in that case is to edit the configuration and explicitly set some widths in it, potentially in an iterative process
after profiling the data. Another option is to pass a maximum precision using the ``max_precision`` parameter of the ``config_from_*`` functions. The automatic precision
inference will then never set a bitwidth larger than the bitwidth of ``max_precision``, or an integer part larger than the integer part of ``max_precision``.
(The bitwidth and integer parts of ``max_precision`` are treated separately.)

When manually setting bitwidths, the accumulator can overflow, and the precision may need to be reduced. For the accumulator, it is usually a bad idea to explicitly
enable rounding or saturation modes, since they dramatically increase the execution time. For other types (e.g. output types or weight types), however, rounding and saturation handling
can be enabled as needed.
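To see why the inferred widths grow quickly, consider the lossless accumulator width for a dense layer: the product and sum widths follow directly from the operand bitwidths. The helper below is an illustrative sketch of that bookkeeping, not the actual ``InferPrecisionTypes`` code:

```python
import math


def conservative_accum_precision(in_bits, in_int, w_bits, w_int, n_terms):
    # A product of fixed<in_bits, in_int> and fixed<w_bits, w_int> values
    # needs in_bits + w_bits total bits and in_int + w_int integer bits.
    # Summing n_terms such products grows the integer part (and hence the
    # total width) by a further ceil(log2(n_terms)) bits.
    growth = math.ceil(math.log2(n_terms)) if n_terms > 1 else 0
    return in_bits + w_bits + growth, in_int + w_int + growth


# A dense layer with 32 inputs and fixed<16,6> inputs and weights already
# requires a fixed<37,17> accumulator to never truncate.
print(conservative_accum_precision(16, 6, 16, 6, 32))
```

In practice such widths are often far larger than needed, which is why profiling and setting explicit widths, or capping the growth with ``max_precision``, is recommended.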
42 changes: 42 additions & 0 deletions docs/advanced/bramfactor.rst
@@ -0,0 +1,42 @@
==================================
Loading weights from external BRAM
==================================

.. note::
This feature is being evaluated for re-implementation. We welcome feedback from users on how to make the implementation more flexible.

``hls4ml`` can optionally store weights in BRAMs external to the design. This is supported in Vivado/Vitis and Catapult backends. It is the responsibility of the user to ensure the weights are properly loaded during the operation of the design.

The feature works as a threshold, exposed through the ``BramFactor`` config parameter. Layers whose number of weights exceeds this threshold will expose their weights through a BRAM interface. Consider the following code:

.. code-block:: Python

model = tf.keras.models.Sequential()
model.add(Dense(10, activation="relu", input_shape=(12,), name="dense_1"))
model.add(Dense(20, activation="relu", name="dense_2"))
model.add(Dense(5, activation="softmax", name="dense_3"))
model.compile(optimizer='adam', loss='mse')

config = hls4ml.utils.config_from_keras_model(model)
config["Model"]["Strategy"] = "Resource"
config["Model"]["BramFactor"] = 100

hls_model = hls4ml.converters.convert_from_keras_model(
model, hls_config=config, output_dir=output_dir, io_type=io_type, backend=backend
)

Having set ``BramFactor=100``, only layers with more than 100 weights will be exposed as external BRAM, in this case layers ``dense_1`` and ``dense_2``. ``BramFactor`` can currently only be set at the model level. The generated code will now have the weights as part of the interface.

.. code-block:: C++

void myproject(
hls::stream<input_t> &dense_1_input,
hls::stream<result_t> &layer7_out,
model_default_t w2[120],
model_default_t w4[200]
) {
#pragma HLS INTERFACE axis port=dense_1_input,layer7_out
#pragma HLS INTERFACE bram port=w2,w4
...

When integrating the design, users can use the exposed interface to implement a weight-reloading scheme.
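The threshold behavior can be sketched in a few lines of Python. The helper below is hypothetical (it is not hls4ml's internal code) and uses the weight counts of the three layers from the example above:

```python
def layers_exposed_as_bram(layer_weight_counts, bram_factor):
    # A layer's weights are moved to external BRAM only when the layer
    # has strictly more weights than the BramFactor threshold.
    return [name for name, count in layer_weight_counts.items()
            if count > bram_factor]


# Multiplicative weight counts for the model above: 12*10, 10*20, 20*5
counts = {'dense_1': 120, 'dense_2': 200, 'dense_3': 100}
print(layers_exposed_as_bram(counts, 100))  # ['dense_1', 'dense_2']
```

Note that ``dense_3``, with exactly 100 weights, stays internal, matching the "more than 100 weights" wording above.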
49 changes: 49 additions & 0 deletions docs/advanced/hgq.rst
@@ -0,0 +1,49 @@
===================================
High Granularity Quantization (HGQ)
===================================

.. image:: https://github.com/calad0i/HGQ/actions/workflows/sphinx-build.yml/badge.svg
:target: https://calad0i.github.io/HGQ/
.. image:: https://badge.fury.io/py/hgq.svg
:target: https://badge.fury.io/py/hgq
.. image:: https://img.shields.io/badge/arXiv-2405.00645-b31b1b.svg
:target: https://arxiv.org/abs/2405.00645

`High Granularity Quantization (HGQ) <https://github.com/calad0i/HGQ/>`_ is a library that implements a gradient-based, quantization-aware training algorithm with automatic bitwidth optimization for neural networks deployed on FPGAs. By leveraging gradients, it allows for bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.

.. image:: https://calad0i.github.io/HGQ/_images/overview.svg
:alt: Overview of HGQ
:align: center

Conversion of models made with the HGQ library is fully supported. The HGQ models are first converted to the proxy model format, which hls4ml can then parse bit-accurately. Below is an example of how to create a model with HGQ and convert it to an hls4ml model.

.. code-block:: Python

import keras
from HGQ.layers import HDense, HDenseBatchNorm, HQuantize
from HGQ import ResetMinMax, FreeBOPs

model = keras.models.Sequential([
HQuantize(beta=1.e-5),
HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
HDenseBatchNorm(32, beta=1.e-5, activation='relu'),
HDense(10, beta=1.e-5),
])

opt = keras.optimizers.Adam(learning_rate=0.001)
loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
callbacks = [ResetMinMax(), FreeBOPs()]

model.fit(..., callbacks=callbacks)

from HGQ import trace_minmax, to_proxy_model
from hls4ml.converters import convert_from_keras_model

trace_minmax(model, x_train, cover_factor=1.0)
proxy = to_proxy_model(model, aggressive=True)

model_hls = convert_from_keras_model(proxy, backend='vivado', output_dir=..., part=...)


An interactive example of HGQ can be found in the `kaggle notebook <https://www.kaggle.com/code/calad0i/small-jet-tagger-with-hgq-1>`_. Full documentation can be found at `calad0i.github.io/HGQ <https://calad0i.github.io/HGQ/>`_.
22 changes: 11 additions & 11 deletions docs/advanced/model_optimization.rst
@@ -13,11 +13,11 @@ The code block below showcases three use cases of the hls4ml Optimization API -
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import CategoricalAccuracy
from tensorflow.keras.losses import CategoricalCrossentropy
- from hls4ml.optimization.keras import optimize_model
- from hls4ml.optimization.keras.utils import get_model_sparsity
- from hls4ml.optimization.attributes import get_attributes_from_keras_model
- from hls4ml.optimization.objectives import ParameterEstimator
- from hls4ml.optimization.scheduler import PolynomialScheduler
+ from hls4ml.optimization.dsp_aware_pruning.keras import optimize_model
+ from hls4ml.optimization.dsp_aware_pruning.keras.utils import get_model_sparsity
+ from hls4ml.optimization.dsp_aware_pruning.attributes import get_attributes_from_keras_model
+ from hls4ml.optimization.dsp_aware_pruning.objectives import ParameterEstimator
+ from hls4ml.optimization.dsp_aware_pruning.scheduler import PolynomialScheduler
# Define baseline model and load data
# X_train, y_train = ...
# X_val, y_val = ...
@@ -75,7 +75,7 @@ To optimize GPU FLOPs, the code is similar to above:

.. code-block:: Python

from hls4ml.optimization.objectives.gpu_objectives import GPUFLOPEstimator
from hls4ml.optimization.dsp_aware_pruning.objectives.gpu_objectives import GPUFLOPEstimator

# Optimize model
# Note the change from ParameterEstimator to GPUFLOPEstimator
@@ -98,7 +98,7 @@ Finally, optimizing Vivado DSPs is possible, given a hls4ml config:
.. code-block:: Python

from hls4ml.utils.config import config_from_keras_model
- from hls4ml.optimization.objectives.vivado_objectives import VivadoDSPEstimator
+ from hls4ml.optimization.dsp_aware_pruning.objectives.vivado_objectives import VivadoDSPEstimator

# Note the change from optimize_model to optimize_keras_model_for_hls4ml
# The function optimize_keras_model_for_hls4ml acts as a wrapper for the function, parsing hls4ml config to model attributes
@@ -124,11 +124,11 @@ Finally, optimizing Vivado DSPs is possible, given a hls4ml config:
acc_optimized = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
print(f'Optimized Keras accuracy: {acc_optimized}')

- There are two more Vivado "optimizers" - VivadoFFEstimator, aimed at reducing register utilisation and VivadoMultiObjectiveEstimator, aimed at optimising BRAM and DSP utilisation.
- Note, to ensure DSPs are optimized, "unrolled" Dense multiplication must be used before synthesing HLS, by modifying the config:
+ There are two more Vivado "optimizers" - VivadoFFEstimator, aimed at reducing register utilization and VivadoMultiObjectiveEstimator, aimed at optimizing BRAM and DSP utilization.
+ Note, to ensure DSPs are optimized, "unrolled" Dense multiplication must be used before synthesizing HLS, by modifying the config:

.. code-block:: Python

hls_config = config_from_keras_model(optimized_model)
- hls_config['Model']['DenseResourceImplementation'] = 'Unrolled'
- # Any addition hls4ml config, such as strategy, reuse factor etc...
+ hls_config['Model']['Strategy'] = 'Unrolled'
+ # Any addition hls4ml config, reuse factor etc...
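The ``PolynomialScheduler`` imported in the example above ramps the target sparsity during pruning. A minimal sketch of such a polynomial schedule (hypothetical function and parameter names, not the actual hls4ml class) is:

```python
def polynomial_sparsity(step, total_steps, initial=0.0, final=0.75, power=3):
    # Sparsity ramps from `initial` to `final`, moving quickly at first and
    # flattening out, following the usual polynomial-decay pruning schedule.
    frac = min(step / total_steps, 1.0)
    return final + (initial - final) * (1.0 - frac) ** power


print(polynomial_sparsity(0, 100))    # 0.0
print(polynomial_sparsity(100, 100))  # 0.75
```

Gradually increasing sparsity this way gives the network time to recover accuracy between pruning steps.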
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/command.rst → docs/api/command.rst
@@ -50,7 +50,7 @@ hls4ml config

hls4ml config [-h] [-m MODEL] [-w WEIGHTS] [-o OUTPUT]

- This creates a conversion configuration file. Visit the Configuration section of the :doc:`Setup <setup>` page for more details on how to write a configuration file.
+ This creates a conversion configuration file. Visit the Configuration section of the :doc:`Setup <../intro/setup>` page for more details on how to write a configuration file.

**Arguments**
