docs/advanced/model_optimization.rst (10 additions, 3 deletions)
@@ -1,13 +1,14 @@
-========================
-hls4ml Optimization API
-========================
+=================================
+Hardware-aware Optimization API
+=================================
 
 Pruning and weight sharing are effective techniques to reduce model footprint and computational requirements. The hls4ml Optimization API introduces hardware-aware pruning and weight sharing.
 By defining custom objectives, the algorithm solves a Knapsack optimization problem aimed at maximizing model performance, while keeping the target resource(s) at a minimum. Out-of-the-box objectives include network sparsity, GPU FLOPs, Vivado DSPs, memory utilization etc.
 
 The code block below showcases three use cases of the hls4ml Optimization API - network sparsity (unstructured pruning), GPU FLOPs (structured pruning) and Vivado DSP utilization (pattern pruning). First, we start with unstructured pruning:
 
 .. code-block:: Python
+
     from sklearn.metrics import accuracy_score
     from tensorflow.keras.optimizers import Adam
     from tensorflow.keras.metrics import CategoricalAccuracy
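The hunk ends with the imports, so the body of the unstructured-pruning example is not visible in this diff. For orientation, a minimal sketch of how such a run is typically wired up follows; the entry point ``optimize_model``, the ``ParameterEstimator`` objective, ``PolynomialScheduler`` and ``get_attributes_from_keras_model`` are assumptions taken from the hls4ml optimization module and do not appear in the diff, so verify them against the installed version.

.. code-block:: Python

    # Repeated from the snippet above so this sketch stands alone
    import numpy as np
    from sklearn.metrics import accuracy_score
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.metrics import CategoricalAccuracy
    from tensorflow.keras.losses import CategoricalCrossentropy

    # Assumed hls4ml imports, not shown in the diff
    from hls4ml.optimization.keras import optimize_model
    from hls4ml.optimization.objectives import ParameterEstimator
    from hls4ml.optimization.scheduler import PolynomialScheduler
    from hls4ml.optimization.attributes import get_attributes_from_keras_model

    # baseline_model and the (X_train, y_train), (X_val, y_val), (X_test, y_test)
    # splits are placeholders for any trained Keras classifier and its data
    batch_size = 128
    epochs = 10
    optimizer = Adam()
    loss_fn = CategoricalCrossentropy(from_logits=True)
    metric, increasing = CategoricalAccuracy(), True  # monitored metric and its direction
    rtol = 0.975                                      # tolerated relative loss in that metric

    # Ramp sparsity up to 50% over several steps instead of applying it in one shot
    scheduler = PolynomialScheduler(5, final_sparsity=0.5)
    model_attributes = get_attributes_from_keras_model(baseline_model)

    # ParameterEstimator counts weights, i.e. it targets plain network sparsity
    optimized_model = optimize_model(
        baseline_model, model_attributes, ParameterEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs,
        optimizer, loss_fn, metric, increasing, rtol
    )

    # Compare accuracy before and after pruning
    y_optimized = optimized_model.predict(X_test)
    acc = accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_optimized, axis=1))
    print(f'Optimized Keras accuracy: {acc}')

The idea, as the surrounding text describes, is that the scheduler gradually raises sparsity while the model is fine-tuned, and the tolerance returns the sparsest model whose monitored metric stays within ``rtol`` of the baseline.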
@@ -71,7 +72,9 @@ In a similar manner, it is possible to target GPU FLOPs or Vivado DSPs. However,
 Instead, it is the sparsity of the target resource. As an example: starting with a network utilizing 512 DSPs and a final sparsity of 50%, the optimized network will use 256 DSPs.
 
 To optimize GPU FLOPs, the code is similar to above:
+
 .. code-block:: Python
+
     from hls4ml.optimization.objectives.gpu_objectives import GPUFLOPEstimator
 
     # Optimize model
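The diff again skips the body of the GPU FLOPs example (the next hunk picks up at ``print(optimized_model.summary())``). Assuming it reuses the ``optimize_model`` call and the variables from the sketch above, the omitted part plausibly amounts to swapping in the FLOP objective:

.. code-block:: Python

    from hls4ml.optimization.objectives.gpu_objectives import GPUFLOPEstimator

    # Same call as before, only the objective changes; reducing FLOPs on a GPU
    # requires removing whole filters / neurons, i.e. structured pruning
    optimized_model = optimize_model(
        baseline_model, model_attributes, GPUFLOPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs,
        optimizer, loss_fn, metric, increasing, rtol
    )
    print(optimized_model.summary())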
@@ -91,7 +94,9 @@ To optimize GPU FLOPs, the code is similar to above:
     print(optimized_model.summary())
 
 Finally, optimizing Vivado DSPs is possible, given an hls4ml config:
+
 .. code-block:: Python
+
     from hls4ml.utils.config import config_from_keras_model
     from hls4ml.optimization.objectives.vivado_objectives import VivadoDSPEstimator
 
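Here, too, only the imports survive in the diff. A hedged sketch of the missing body: the DSP objective needs precision and reuse-factor information, so the model attributes would be built from both the Keras model and an hls4ml config. The helper name ``get_attributes_from_keras_model_and_hls4ml_config`` and the ``granularity='name'`` argument are assumptions, not shown in the diff.

.. code-block:: Python

    # Assumed helper; check the name against hls4ml.optimization.attributes
    from hls4ml.optimization.attributes import get_attributes_from_keras_model_and_hls4ml_config

    hls_config = config_from_keras_model(baseline_model, granularity='name')
    model_attributes = get_attributes_from_keras_model_and_hls4ml_config(baseline_model, hls_config)

    # VivadoDSPEstimator drives pattern pruning, so the reported sparsity refers
    # to DSP blocks rather than individual weights
    optimized_model = optimize_model(
        baseline_model, model_attributes, VivadoDSPEstimator, scheduler,
        X_train, y_train, X_val, y_val, batch_size, epochs,
        optimizer, loss_fn, metric, increasing, rtol
    )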
@@ -121,7 +126,9 @@ Finally, optimizing Vivado DSPs is possible, given a hls4ml config:
 
 There are two more Vivado "optimizers" - VivadoFFEstimator, aimed at reducing register utilisation, and VivadoMultiObjectiveEstimator, aimed at optimising BRAM and DSP utilisation.
 Note, to ensure DSPs are optimized, "unrolled" Dense multiplication must be used before synthesising HLS, by modifying the config:
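The config modification itself is cut off at the end of the diff. As a sketch of what such a tweak might look like, assuming recent hls4ml versions expose the unrolled Dense implementation through a model-level ``Strategy`` key (the exact key and value may differ between versions):

.. code-block:: Python

    from hls4ml.utils.config import config_from_keras_model
    from hls4ml.converters import convert_from_keras_model

    hls_config = config_from_keras_model(optimized_model, granularity='model')
    # Assumption: select the unrolled Dense multiplication at the model level
    hls_config['Model']['Strategy'] = 'Unrolled'

    hls_model = convert_from_keras_model(optimized_model, hls_config=hls_config)
    hls_model.compile()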