
Commit 1156ff9

Merge branch 'fastmachinelearning:hls4ml-optimization-api-part-1' into hls4ml-optimization-api-part-1
2 parents e044a12 + 4aff443 commit 1156ff9

24 files changed: +405 −93 lines

.pre-commit-config.yaml

Lines changed: 3 additions & 3 deletions
@@ -2,15 +2,15 @@ exclude: (^hls4ml\/templates\/(vivado|quartus)\/(ap_types|ac_types)\/|^test/pyte
 
 repos:
 - repo: https://github.com/psf/black
-  rev: 23.9.1
+  rev: 23.11.0
   hooks:
   - id: black
     language_version: python3
     args: ['--line-length=125',
            '--skip-string-normalization']
 
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.4.0
+  rev: v4.5.0
   hooks:
   - id: check-added-large-files
   - id: check-case-conflict
@@ -30,7 +30,7 @@ repos:
     args: ["--profile", "black", --line-length=125]
 
 - repo: https://github.com/asottile/pyupgrade
-  rev: v3.14.0
+  rev: v3.15.0
  hooks:
  - id: pyupgrade
    args: ["--py36-plus"]

CITATION.cff

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ type: software
 authors:
   - given-names: "FastML Team"
 title: "hls4ml"
-version: "v0.7.1"
+version: "v0.8.0"
 doi: 10.5281/zenodo.1201549
 repository-code: "https://github.com/fastmachinelearning/hls4ml"
 url: "https://fastmachinelearning.org/hls4ml"

README.md

Lines changed: 12 additions & 2 deletions
@@ -1,4 +1,4 @@
-<p float="left">
+<p align="center">
    <img src="https://github.com/fastmachinelearning/fastmachinelearning.github.io/raw/master/images/hls4ml_logo.svg" alt="hls4ml" width="400"/>
 </p>
 
@@ -69,7 +69,7 @@ If you use this software in a publication, please cite the software
    title = {fastmachinelearning/hls4ml},
    year = 2023,
    publisher = {Zenodo},
-   version = {v0.7.1},
+   version = {v0.8.0},
    doi = {10.5281/zenodo.1201549},
    url = {https://github.com/fastmachinelearning/hls4ml}
 }
@@ -140,3 +140,13 @@ binary/ternary networks:
 If you benefited from participating in our community, we ask that you please acknowledge the Fast Machine Learning collaboration, and particular individuals who helped you, in any publications.
 Please use the following text for this acknowledgment:
 > We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. This community and \<names of individuals\>, in particular, were important for the development of this project.
+
+# Funding
+We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for <a href="https://a3d3.ai">Accelerating AI Algorithms for Data Driven Discovery (A3D3)</a> under Cooperative Agreement No. <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997">OAC-2117997</a>, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (<a href="https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf">DE-FOA-0002501</a>), DOE Office of Science, Office of High Energy Physics Early Career Research Program (<a href="https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10">DE-SC0021187</a>, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. <a href="https://doi.org/10.3030/772369">772369</a>).
+
+<p align="center">
+   <img src="https://github.com/fastmachinelearning/hls4ml/assets/29201053/bd1217d4-9930-47b7-8917-ad3fc430c75d" alt="A3D3" width="130"/>
+   <img src="https://github.com/fastmachinelearning/hls4ml/assets/4932543/16e77374-9829-40a8-800e-8d12018a7cb3" alt="NSF" width="130"/>
+   <img src="https://github.com/fastmachinelearning/hls4ml/assets/4932543/de6ca6ea-4d1c-4c56-9d93-f759914bbbf9" alt="DOE" width="130"/>
+   <img src="https://github.com/fastmachinelearning/hls4ml/assets/4932543/7a369971-a381-4bb8-932a-7162b173cbac" alt="ERC" width="130"/>
+</p>

docs/api/configuration.rst

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ It looks like this:
    OutputPredictions: keras/KERAS_3layer_predictions.dat
 
    # Backend section (Vivado backend)
-   Part: xcku115-flvb2104-2-i
+   Part: xcvu13p-flga2577-2-e
    ClockPeriod: 5
    IOType: io_parallel # options: io_parallel/io_stream
@@ -97,7 +97,7 @@ There are a number of configuration options that you have. Let's go through the
 The backend-specific section of the configuration depends on the backend. You can get a starting point for the necessary settings using, for example `hls4ml.templates.get_backend('Vivado').create_initial_config()`.
 For Vivado backend the options are:
 
-* **Part**\ : the particular FPGA part number that you are considering, here it's a Xilinx Virtex-7 FPGA
+* **Part**\ : the particular FPGA part number that you are considering, here it's a Xilinx Virtex UltraScale+ VU13P FPGA
 * **ClockPeriod**\ : the clock period, in ns, at which your algorithm runs
 Then you have some optimization parameters for how your algorithm runs:
 * **IOType**\ : your options are ``io_parallel`` or ``io_stream`` which defines the type of data structure used for inputs, intermediate activations between layers, and outputs. For ``io_parallel``, arrays are used that, in principle, can be fully unrolled and are typically implemented in RAMs. For ``io_stream``, HLS streams are used, which are a more efficient/scalable mechanism to represent data that are produced and consumed in a sequential manner. Typically, HLS streams are implemented with FIFOs instead of RAMs. For more information see `here <https://docs.xilinx.com/r/en-US/ug1399-vitis-hls/pragma-HLS-stream>`__.
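
A minimal sketch (not part of this commit) of how the updated example maps onto the Python API quoted in the docs text above, `hls4ml.templates.get_backend('Vivado').create_initial_config()`; the exact keys returned may differ between hls4ml versions.

import hls4ml

# Start from the backend's default configuration, then apply the values
# shown in the updated YAML example above.
config = hls4ml.templates.get_backend('Vivado').create_initial_config()
config['Part'] = 'xcvu13p-flga2577-2-e'  # Virtex UltraScale+ VU13P
config['ClockPeriod'] = 5                # ns
config['IOType'] = 'io_parallel'         # or 'io_stream'
print(config)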

docs/reference.rst

Lines changed: 25 additions & 4 deletions
@@ -1,6 +1,6 @@
-============================
-Citation and Contributors
-============================
+===========================================
+Citation, Acknowledgments, and Contributors
+===========================================
 
 
 Citation
@@ -14,7 +14,7 @@ If you use this software in a publication, please cite the software
    title = {fastmachinelearning/hls4ml},
    year = 2023,
    publisher = {Zenodo},
-   version = {v0.7.1},
+   version = {v0.8.0},
    doi = {10.5281/zenodo.1201549},
    url = {https://github.com/fastmachinelearning/hls4ml}
 }
@@ -90,9 +90,30 @@ Acknowledgments
 ===============
 If you benefited from participating in our community, we ask that you please acknowledge the Fast Machine Learning collaboration, and particular individuals who helped you, in any publications.
 Please use the following text for this acknowledgment:
+
 We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. This community and \<names of individuals\>, in particular, were important for the development of this project.
 
 
+Funding
+=======
+We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for `Accelerating AI Algorithms for Data Driven Discovery (A3D3) <https://a3d3.ai>`_ under Cooperative Agreement No. `OAC-2117997 <https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117997>`_, U.S. Department of Energy (DOE) Office of Science, Office of Advanced Scientific Computing Research under the Real‐time Data Reduction Codesign at the Extreme Edge for Science (XDR) Project (`DE-FOA-0002501 <https://science.osti.gov/-/media/grants/pdf/foas/2021/SC_FOA_0002501.pdf>`_), DOE Office of Science, Office of High Energy Physics Early Career Research Program (`DE-SC0021187 <https://pamspublic.science.energy.gov/WebPAMSExternal/Interface/Common/ViewPublicAbstract.aspx?rv=df0ae4ab-a46e-481a-9acc-3856b6b041e5&rtc=24&PRoleId=10>`_, DE-0000247070), and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant No. `772369 <https://doi.org/10.3030/772369>`_).
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/d4b6e2a3-3537-4413-9809-8153a7d624d6
+   :height: 200
+   :align: center
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/16e77374-9829-40a8-800e-8d12018a7cb3
+   :height: 200
+   :align: center
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/de6ca6ea-4d1c-4c56-9d93-f759914bbbf9
+   :height: 200
+   :align: center
+
+.. image:: https://github.com/fastmachinelearning/hls4ml/assets/4932543/7a369971-a381-4bb8-932a-7162b173cbac
+   :height: 200
+   :align: center
+
 Contributors
 ============

hls4ml/backends/fpga/passes/clone.py

Lines changed: 6 additions & 8 deletions
@@ -20,21 +20,19 @@ def initialize(self):
 class CloneFunctionTemplate(FunctionCallTemplate):
     def __init__(self):
         super().__init__(Clone, include_header=clone_include_list)
-        self.template = None  # to be filled once number of clones known
 
     def format(self, node):
         params = self._default_function_params(node)
         for i, _output in enumerate(node.outputs):
             params['output' + str(i + 1)] = node.variables[node.outputs[i]].name
 
-        if self.template is None:
-            self.template = (
-                'nnet::clone_stream<{input_t}, {output_t}, {size}>({input}, '
-                + ', '.join(['{output' + str(i + 1) + '}' for i in range(len(node.outputs))])
-                + ');'
-            )
+        template = (
+            'nnet::clone_stream<{input_t}, {output_t}, {size}>({input}, '
+            + ', '.join(['{output' + str(i + 1) + '}' for i in range(len(node.outputs))])
+            + ');'
+        )
 
-        return self.template.format(**params)
+        return template.format(**params)
 
 
 def register_clone(backend):
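
A quick sketch (not part of the commit) of what the per-call template built in format() above expands to, for a hypothetical Clone node with two outputs; the output names are made up.

# Reproduce the string construction from format() for two outputs.
outputs = ['layer2_cpy1', 'layer2_cpy2']  # hypothetical output names
template = (
    'nnet::clone_stream<{input_t}, {output_t}, {size}>({input}, '
    + ', '.join(['{output' + str(i + 1) + '}' for i in range(len(outputs))])
    + ');'
)
print(template)
# nnet::clone_stream<{input_t}, {output_t}, {size}>({input}, {output1}, {output2});
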
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+import warnings
+
+from hls4ml.model.layers import Layer, Softmax
+from hls4ml.model.optimizer import OptimizerPass
+
+
+class FixSoftmaxTableSize(OptimizerPass):
+    def match(self, node):
+        return isinstance(node, Softmax)
+
+    def transform(self, model, node: Layer):
+        inp_layer = node.get_input_node()  # type: ignore
+        if not isinstance(inp_layer, Layer):
+            raise RuntimeError(f'Softmax layer {node.name} does not have an input layer')
+
+        input_bw: int = inp_layer.get_attr('result_t').precision.width  # type: ignore
+        table_bw: int = node.get_attr('inv_table_t').precision.width  # type: ignore
+        table_size = int(node.get_attr('table_size'))  # type: ignore
+
+        backend = model.config.config['Backend']
+
+        # The Quartus (Intel) backend needs one extra bit for the table;
+        # without it, simulation crashes with a segmentation fault.
+        backend_limitation = -1 if backend == 'Quartus' else 0
+
+        if 2 ** (min(input_bw, table_bw) + backend_limitation) < table_size:
+            # If the table size is too large w.r.t. the input and table bitwidths,
+            # reduce it to avoid undefined behavior when extracting table indices
+            # from the fixed-point number.
+            node.set_attr('table_size', str(2 ** (min(input_bw, table_bw) + backend_limitation)))
+            if 2**input_bw < table_size:
+                # The message is split across string literals to respect the
+                # 125-character line-length limit.
+                warnings.warn(
+                    (
+                        f"Softmax layer {node.name} table size is too large for input "
+                        f"bitwidth {input_bw}. Setting table size to {2**input_bw}. "
+                        "To avoid this warning, please increase the input bitwidth or "
+                        "decrease the table size."
+                    ),
+                    stacklevel=1,
+                )
+            if 2**table_bw < table_size:
+                warnings.warn(
+                    (
+                        f"Softmax layer {node.name} table size is too large for table "
+                        f"bitwidth {table_bw}. Setting table size to {2**table_bw}. "
+                        "To avoid this warning, please increase the table bitwidth or "
+                        "decrease the table size."
+                    ),
+                    stacklevel=1,
+                )
+            if backend == 'Quartus':
+                warnings.warn(
+                    (
+                        "The Quartus backend uses a table size of 2^min(input_bw, table_bw)/2"
+                        " instead of 2^min(input_bw, table_bw)."
+                    ),
+                    stacklevel=1,
+                )
+        return False
+
+
+def register_softmax__table_size_fix(backend):
+    backend.register_pass('fix_softmax_table_size', FixSoftmaxTableSize)
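
A standalone sketch of the capping rule added above, using assumed example bitwidths (not taken from the commit): the table is shrunk to 2^min(input_bw, table_bw), with one extra bit shaved off for Quartus.

# Assumed example values: 10-bit input, 18-bit inverse table, requested table of 2048 entries.
input_bw, table_bw, table_size = 10, 18, 2048

for backend in ('Vivado', 'Quartus'):
    backend_limitation = -1 if backend == 'Quartus' else 0
    cap = 2 ** (min(input_bw, table_bw) + backend_limitation)
    print(backend, min(table_size, cap))  # Vivado -> 1024, Quartus -> 512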

hls4ml/backends/fpga/passes/repack_stream.py

Lines changed: 2 additions & 0 deletions
@@ -59,6 +59,8 @@ def transform(self, model, node):
 
         # Insert new Repack node instead of Reshape
         repack_layer = model.make_node(Repack, 'repack_' + node.name, attrs, node.inputs.copy())
+        # As the result_t attribute is not honored by type conversion, set it manually here
+        repack_layer.attributes[repack_layer.name].type = node.attributes[node.name].type
         model.replace_node(node, repack_layer)
 
         return True

hls4ml/backends/quartus/quartus_backend.py

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ def _register_flows(self):
             'quartus:inplace_parallel_reshape',
             'quartus:inplace_stream_flatten',
             'quartus:skip_softmax',
+            'quartus:fix_softmax_table_size',
         ]
         optimization_flow = register_flow('optimize', optimization_passes, requires=[init_flow], backend=self.name)
 
hls4ml/backends/vivado/passes/convolution_templates.py

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@
     static const unsigned out_width = {out_width};
     static const unsigned reuse_factor = {reuse};
     static const unsigned n_zeros = {nzeros};
+    static const unsigned multiplier_limit =
+        DIV_ROUNDUP(kernel_size * n_chan * n_filt, reuse_factor) - n_zeros / reuse_factor;
     static const bool store_weights_in_bram = false;
     static const unsigned strategy = nnet::{strategy};
     static const nnet::conv_implementation implementation = nnet::conv_implementation::{implementation};
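
Illustrative arithmetic (not from the commit) for the multiplier_limit expression added above, with assumed layer parameters: a 3x3 kernel, 16 input channels, 32 filters, reuse_factor 8, and 1152 pruned (zero) weights.

def div_roundup(a, b):
    # Mirrors the DIV_ROUNDUP macro used in the HLS template.
    return (a + b - 1) // b

kernel_size, n_chan, n_filt = 9, 16, 32
reuse_factor, n_zeros = 8, 1152

multiplier_limit = div_roundup(kernel_size * n_chan * n_filt, reuse_factor) - n_zeros // reuse_factor
print(multiplier_limit)  # ceil(4608 / 8) - 1152 // 8 = 576 - 144 = 432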
