Commit 4aff443
Merge branch 'master' into hls4ml-optimization-api-part-1
2 parents: 7c2d128 + 033d438

34 files changed: +653, −189 lines

.gitlab-ci.yml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ generator:
   stage: generate
   image: python:3.8-alpine
   tags:
-    - docker
+    - k8s-default
   before_script:
     - pip install pyyaml
   script:

.pre-commit-config.yaml

Lines changed: 4 additions & 4 deletions
@@ -2,15 +2,15 @@ exclude: (^hls4ml\/templates\/(vivado|quartus)\/(ap_types|ac_types)\/|^test/pyte

 repos:
   - repo: https://github.com/psf/black
-    rev: 23.7.0
+    rev: 23.11.0
     hooks:
       - id: black
         language_version: python3
         args: ['--line-length=125',
                '--skip-string-normalization']

   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+    rev: v4.5.0
     hooks:
       - id: check-added-large-files
       - id: check-case-conflict

@@ -30,13 +30,13 @@ repos:
     args: ["--profile", "black", --line-length=125]

   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.10.1
+    rev: v3.15.0
     hooks:
       - id: pyupgrade
         args: ["--py36-plus"]

   - repo: https://github.com/asottile/setup-cfg-fmt
-    rev: v2.4.0
+    rev: v2.5.0
     hooks:
       - id: setup-cfg-fmt

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ type: software
 authors:
   - given-names: "FastML Team"
 title: "hls4ml"
-version: "v0.7.1"
+version: "v0.8.0"
 doi: 10.5281/zenodo.1201549
 repository-code: "https://github.com/fastmachinelearning/hls4ml"
 url: "https://fastmachinelearning.org/hls4ml"

README.md

Lines changed: 17 additions & 2 deletions
@@ -1,4 +1,4 @@
-<p float="left">
+<p align="center">
    <img src="https://github.com/fastmachinelearning/fastmachinelearning.github.io/raw/master/images/hls4ml_logo.svg" alt="hls4ml" width="400"/>
 </p>

@@ -69,7 +69,7 @@ If you use this software in a publication, please cite the software
    title = {fastmachinelearning/hls4ml},
    year = 2023,
    publisher = {Zenodo},
-   version = {v0.7.1},
+   version = {v0.8.0},
    doi = {10.5281/zenodo.1201549},
    url = {https://github.com/fastmachinelearning/hls4ml}
 }

@@ -135,3 +135,18 @@ binary/ternary networks:
    year = "2021"
 }
 ```
+
+# Acknowledgments
+If you benefited from participating in our community, we ask that you please acknowledge the Fast Machine Learning collaboration, and particular individuals who helped you, in any publications.
+Please use the following text for this acknowledgment:
+> We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. This community and \<names of individuals\>, in particular, were important for the development of this project.
+
+# Funding
+We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for <a href="https://a3d3.ai">Accelerating AI Algorithms for Data Driven Discovery (A3D3)</a> under Cooperative Agreement No. <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997">OAC-2117997</a>, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (<a href="https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf">DE-FOA-0002501</a>), DOE Office of Science, Office of High Energy Physics Early Career Research Program (<a href="https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10">DE-SC0021187</a>, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. <a href="https://doi.org/10.3030/772369">772369</a>).
+
+<p align="center">
+  <img src="https://github.com/fastmachinelearning/hls4ml/assets/29201053/bd1217d4-9930-47b7-8917-ad3fc430c75d" alt="A3D3" width="130"/>
+  <img src="https://github.com/fastmachinelearning/hls4ml/assets/4932543/16e77374-9829-40a8-800e-8d12018a7cb3" alt="NSF" width="130"/>
+  <img src="https://github.com/fastmachinelearning/hls4ml/assets/4932543/de6ca6ea-4d1c-4c56-9d93-f759914bbbf9" alt="DOE" width="130"/>
+  <img src="https://github.com/fastmachinelearning/hls4ml/assets/4932543/7a369971-a381-4bb8-932a-7162b173cbac" alt="ERC" width="130"/>
+</p>

docs/api/configuration.rst

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ It looks like this:
    OutputPredictions: keras/KERAS_3layer_predictions.dat

    # Backend section (Vivado backend)
-   Part: xcku115-flvb2104-2-i
+   Part: xcvu13p-flga2577-2-e
    ClockPeriod: 5
    IOType: io_parallel # options: io_parallel/io_stream

@@ -97,7 +97,7 @@ There are a number of configuration options that you have. Let's go through the
 The backend-specific section of the configuration depends on the backend. You can get a starting point for the necessary settings using, for example `hls4ml.templates.get_backend('Vivado').create_initial_config()`.
 For Vivado backend the options are:

-* **Part**\ : the particular FPGA part number that you are considering, here it's a Xilinx Virtex-7 FPGA
+* **Part**\ : the particular FPGA part number that you are considering, here it's a Xilinx Virtex UltraScale+ VU13P FPGA
 * **ClockPeriod**\ : the clock period, in ns, at which your algorithm runs
 Then you have some optimization parameters for how your algorithm runs:
 * **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines the type of data structure used for inputs, intermediate activations between layers, and outputs. For ``io_parallel``, arrays are used that, in principle, can be fully unrolled and are typically implemented in RAMs. For ``io_stream``, HLS streams are used, which are a more efficient/scalable mechanism to represent data that are produced and consumed in a sequential manner. Typically, HLS streams are implemented with FIFOs instead of RAMs. For more information see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
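To make the backend section above concrete, here is a minimal sketch that mirrors the documented YAML keys as a plain Python dict (the key names and example values come from the docs above; the `validate_io_type` helper is illustrative, not part of the hls4ml API):

```python
# Sketch of the Vivado backend section from the docs above, expressed as a
# plain Python dict; the YAML keys map one-to-one.
vivado_backend_config = {
    'Part': 'xcvu13p-flga2577-2-e',  # Xilinx Virtex UltraScale+ VU13P part number
    'ClockPeriod': 5,                # target clock period, in ns
    'IOType': 'io_parallel',         # or 'io_stream' for FIFO-backed HLS streams
}


def validate_io_type(config):
    """Reject IOType values other than the two documented options."""
    if config['IOType'] not in ('io_parallel', 'io_stream'):
        raise ValueError(f"Unknown IOType: {config['IOType']}")
    return config['IOType']
```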

docs/reference.rst

Lines changed: 32 additions & 4 deletions
@@ -1,6 +1,6 @@
-============================
-Citation and Contributors
-============================
+===========================================
+Citation, Acknowledgments, and Contributors
+===========================================


 Citation

@@ -14,7 +14,7 @@ If you use this software in a publication, please cite the software
    title = {fastmachinelearning/hls4ml},
    year = 2023,
    publisher = {Zenodo},
-   version = {v0.7.1},
+   version = {v0.8.0},
    doi = {10.5281/zenodo.1201549},
    url = {https://github.com/fastmachinelearning/hls4ml}
 }

@@ -86,6 +86,34 @@ binary/ternary networks:
    year = "2021"
 }

+Acknowledgments
+===============
+If you benefited from participating in our community, we ask that you please acknowledge the Fast Machine Learning collaboration, and particular individuals who helped you, in any publications.
+Please use the following text for this acknowledgment:
+
+   We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. This community and \<names of individuals\>, in particular, were important for the development of this project.
+
+
+Funding
+=======
+We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for `Accelerating AI Algorithms for Data Driven Discovery (A3D3) <https://a3d3.ai>`_ under Cooperative Agreement No. `OAC-2117997 <https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997>`_, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (`DE-FOA-0002501 <https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf>`_), DOE Office of Science, Office of High Energy Physics Early Career Research Program (`DE-SC0021187 <https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10>`_, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. `772369 <https://doi.org/10.3030/772369>`_).
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/d4b6e2a3-3537-4413-9809-8153a7d624d6
+   :height: 200
+   :align: center
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/16e77374-9829-40a8-800e-8d12018a7cb3
+   :height: 200
+   :align: center
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/de6ca6ea-4d1c-4c56-9d93-f759914bbbf9
+   :height: 200
+   :align: center
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/7a369971-a381-4bb8-932a-7162b173cbac
+   :height: 200
+   :align: center
+
 Contributors
 ============
hls4ml/backends/fpga/passes/clone.py

Lines changed: 6 additions & 8 deletions
@@ -20,21 +20,19 @@ def initialize(self):
 class CloneFunctionTemplate(FunctionCallTemplate):
     def __init__(self):
         super().__init__(Clone, include_header=clone_include_list)
-        self.template = None  # to be filled once number of clones known

     def format(self, node):
         params = self._default_function_params(node)
         for i, _output in enumerate(node.outputs):
             params['output' + str(i + 1)] = node.variables[node.outputs[i]].name

-        if self.template is None:
-            self.template = (
-                'nnet::clone_stream<{input_t}, {output_t}, {size}>({input}, '
-                + ', '.join(['{output' + str(i + 1) + '}' for i in range(len(node.outputs))])
-                + ');'
-            )
+        template = (
+            'nnet::clone_stream<{input_t}, {output_t}, {size}>({input}, '
+            + ', '.join(['{output' + str(i + 1) + '}' for i in range(len(node.outputs))])
+            + ');'
+        )

-        return self.template.format(**params)
+        return template.format(**params)


 def register_clone(backend):
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+import warnings
+
+from hls4ml.model.layers import Layer, Softmax
+from hls4ml.model.optimizer import OptimizerPass
+
+
+class FixSoftmaxTableSize(OptimizerPass):
+    def match(self, node):
+        return isinstance(node, Softmax)
+
+    def transform(self, model, node: Layer):
+        inp_layer = node.get_input_node()  # type: ignore
+        if not isinstance(inp_layer, Layer):
+            raise RuntimeError(f'Softmax layer {node.name} does not have an input layer')
+
+        input_bw: int = inp_layer.get_attr('result_t').precision.width  # type: ignore
+        table_bw: int = node.get_attr('inv_table_t').precision.width  # type: ignore
+        table_size = int(node.get_attr('table_size'))  # type: ignore
+
+        backend = model.config.config['Backend']
+
+        # The Quartus backend needs one extra bit of headroom for the table;
+        # without it, simulation crashes with a segmentation fault.
+        backend_limitation = -1 if backend == 'Quartus' else 0
+
+        if 2 ** (min(input_bw, table_bw) + backend_limitation) < table_size:
+            # If the table size is too large w.r.t. the input and table bitwidths,
+            # reduce it to avoid undefined behavior when truncating indices from
+            # the fixed-point number.
+            node.set_attr('table_size', str(2 ** (min(input_bw, table_bw) + backend_limitation)))
+        if 2**input_bw < table_size:
+            warnings.warn(
+                (
+                    f"Softmax layer {node.name} table size is too large for input "
+                    f"bitwidth {input_bw}. Setting table size to {2**input_bw}. "
+                    "To avoid this warning, please increase the input bitwidth or "
+                    "decrease the table size."
+                ),
+                stacklevel=1,
+            )
+        if 2**table_bw < table_size:
+            warnings.warn(
+                (
+                    f"Softmax layer {node.name} table size is too large for table "
+                    f"bitwidth {table_bw}. Setting table size to {2**table_bw}. "
+                    "To avoid this warning, please increase the table bitwidth or "
+                    "decrease the table size."
+                ),
+                stacklevel=1,
+            )
+        if backend == 'Quartus':
+            warnings.warn(
+                (
+                    "The Quartus backend's table size is 2^min(input_bw-1, table_bw-1), "
+                    "half of the 2^min(input_bw, table_bw) used by other backends."
+                ),
+                stacklevel=1,
+            )
+        return False
+
+
+def register_softmax__table_size_fix(backend):
+    backend.register_pass('fix_softmax_table_size', FixSoftmaxTableSize)
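The clamping rule this pass applies can be isolated into a few lines. A minimal sketch, with the helper name and example bitwidths chosen for illustration (the real pass reads these from layer attributes):

```python
def clamped_table_size(input_bw, table_bw, table_size, backend):
    # Mirror of the clamping rule in FixSoftmaxTableSize above: the table is
    # indexed using the smaller of the input/table bitwidths, and the Quartus
    # backend needs one extra bit of headroom, halving its effective limit.
    backend_limitation = -1 if backend == 'Quartus' else 0
    limit = 2 ** (min(input_bw, table_bw) + backend_limitation)
    return min(table_size, limit)


# Example: a 4096-entry table with a 10-bit input and 18-bit table type is
# cut to 1024 entries on Vivado and 512 on Quartus.
```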

hls4ml/backends/fpga/passes/repack_stream.py

Lines changed: 2 additions & 0 deletions
@@ -59,6 +59,8 @@ def transform(self, model, node):

         # Insert new Repack node instead of Reshape
         repack_layer = model.make_node(Repack, 'repack_' + node.name, attrs, node.inputs.copy())
+        # As the result_t attribute is not honored by type conversion, set it manually here
+        repack_layer.attributes[repack_layer.name].type = node.attributes[node.name].type
         model.replace_node(node, repack_layer)

         return True
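The two added lines carry the Reshape node's output type over to the replacement Repack node by hand. A toy sketch of that pattern, using a hypothetical `Node` class (the real hls4ml nodes store a richer attribute object under the node's own name):

```python
class Node:
    # Toy stand-in for an hls4ml graph node: each node keeps a per-name
    # attribute dict, with the output type stored under the node's own name.
    def __init__(self, name, output_type):
        self.name = name
        self.attributes = {name: {'type': output_type}}


reshape = Node('reshape_1', 'layer4_t')
repack = Node('repack_reshape_1', None)

# Without this explicit copy, the replacement node's output type would stay
# unset, because type conversion does not honor the result_t attribute.
repack.attributes[repack.name]['type'] = reshape.attributes[reshape.name]['type']
```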

hls4ml/backends/quartus/quartus_backend.py

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ def _register_flows(self):
             'quartus:inplace_parallel_reshape',
             'quartus:inplace_stream_flatten',
             'quartus:skip_softmax',
+            'quartus:fix_softmax_table_size',
         ]
         optimization_flow = register_flow('optimize', optimization_passes, requires=[init_flow], backend=self.name)
