* To explore the application of pruning for on-device inference, see the
  [Pruning for on-device inference with XNNPACK](pruning_for_on_device_inference.ipynb).
* To see an example of structural pruning, see the
  [Structural pruning with sparsity 2 by 4](pruning_with_sparsity_2_by_4.ipynb).
## Overview
For background, see *To prune, or not to prune: exploring the efficacy of
pruning for model compression* [[paper](https://arxiv.org/pdf/1710.01878.pdf)].
## Structural pruning M by N
Structural pruning zeroes out model weights at the beginning of the training
process according to the following pattern: M weights are set to zero in each
block of N weights. Note that this pattern affects only the last dimension of
the weight tensor for the model that is converted by TensorFlow Lite. For
example, `Conv2D` layer weights in TensorFlow Lite have the structure
`[channel_out, height, width, channel_in]`, and `Dense` layer weights have the
structure `[channel_out, channel_in]`. The sparsity pattern is applied to the
weights in the last dimension, `channel_in`.
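As a concrete illustration, here is a minimal NumPy sketch (not the tfmot implementation) of zeroing the M smallest-magnitude weights in each block of N along the last dimension:

```python
import numpy as np

def prune_m_by_n(weights, m=2, n=4):
    """Zero the m smallest-magnitude weights in every contiguous block
    of n along the last dimension (illustrative sketch only)."""
    w = weights.reshape(-1, n).copy()
    # Per block, indices of the m entries with the smallest magnitude.
    idx = np.argsort(np.abs(w), axis=1)[:, :m]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(weights.shape)

# Dense-style weights: [channel_out, channel_in]; blocks run along channel_in.
w = np.array([[0.5, -0.1, 0.9, 0.2],
              [-0.3, 0.8, 0.05, -0.6]])
print(prune_m_by_n(w))  # zeros at the 2 smallest-magnitude slots per block of 4
```

With `m=2, n=4` this produces exactly the 2-by-4 pattern discussed below: two zeros in every group of four consecutive `channel_in` weights.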
Specialized hardware can take advantage of this type of sparsity, speeding up
inference by up to 2x. Because this sparsity pattern is more restrictive, the
accuracy achieved after fine-tuning is lower than with magnitude-based pruning.
Note that this pattern is valid only for the model that is converted to
TensorFlow Lite.
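Because the pattern only has to hold along the converted model's last weight dimension, a quick NumPy check (a sketch, assuming the weights are already laid out in TFLite order) could look like:

```python
import numpy as np

def satisfies_m_by_n(weights, m=2, n=4):
    """True if every length-n block along the last axis has >= m zeros."""
    blocks = weights.reshape(-1, n)
    return bool(np.all((blocks == 0).sum(axis=1) >= m))

# Conv2D weights in TFLite order: [channel_out, height, width, channel_in].
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 16))
mask = np.ones(w.size).reshape(-1, 4)
mask[:, :2] = 0.0                      # zero 2 of every 4 channel_in entries
pruned = w * mask.reshape(w.shape)

print(satisfies_m_by_n(w))       # False: dense random weights
print(satisfies_m_by_n(pruned))  # True: 2:4 pattern along channel_in
```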
If the model is quantized, accuracy can be improved using a
[collaborative optimization technique](https://blog.tensorflow.org/2021/10/Collaborative-Optimizations.html):
sparsity-preserving quantization aware training.
The table below compares accuracy (%) for 2-by-4 sparsity against
magnitude-based pruning at the same 50% target sparsity.
<figure>
<table>
  <tr>
    <th>Model</th>
    <th>Unpruned</th>
    <th>2/4 sparsity</th>
    <th>Magnitude-based sparsity, 50%</th>
  </tr>
  <tr>
    <td>Inception-V3</td>
    <td>77.82</td>
    <td>75.8</td>
    <td>77.47</td>
  </tr>
  <tr>
    <td>DS-CNN-L</td>
    <td>95.23</td>
    <td>94.33</td>
    <td>94.84</td>
  </tr>
  <tr>
    <td>MobileNet-V1</td>
    <td>70.97</td>
    <td>67.35</td>
    <td>69.46</td>
  </tr>
  <tr>
    <td>MobileNet-V2</td>
    <td>71.77</td>
    <td>66.75</td>
    <td>69.64</td>
  </tr>
</table>
</figure>
Note: DS-CNN-L is a keyword spotting model created for edge devices. It can be
found in [Arm’s ML Examples repository](https://github.com/ARM-software/ML-examples/tree/master/tflu-kws-cortex-m).
For an end-to-end example of how to apply this technique, see the tutorial
[Structural pruning with sparsity 2 by 4](pruning_with_sparsity_2_by_4.ipynb).