
Commit c4107d9

committed
Notebook and changes to the overview document on structural sparsity 2 by 4.
Change-Id: Ia9b61f00dbeb6a1ae89b152c937eab14f1a0cde0
1 parent 60a2228 commit c4107d9

File tree

2 files changed (+625, -0 lines changed)

tensorflow_model_optimization/g3doc/guide/pruning/index.md

Lines changed: 54 additions & 0 deletions
@@ -11,6 +11,8 @@ fits with your use case.
   [pruning comprehensive guide](comprehensive_guide.ipynb).
 * To explore the application of pruning for on-device inference, see the
   [Pruning for on-device inference with XNNPACK](pruning_for_on_device_inference.ipynb).
+* To see an example of structural pruning, see the
+  [Structural pruning with sparsity 2 by 4](pruning_with_sparsity_2_by_4.ipynb).
 
 ## Overview
 
@@ -126,3 +128,55 @@ pruning:
 
 For background, see *To prune, or not to prune: exploring the efficacy of
 pruning for model compression* [[paper](https://arxiv.org/pdf/1710.01878.pdf)].
+
+## Structural pruning M by N
+
+Structural pruning zeroes out model weights at the beginning of the training
+process according to the following pattern: M weights are set to zero in every
+block of N weights. Note that this pattern affects only the last dimension of
+the weight tensor of the model that is converted by TensorFlow Lite. For
+example, `Conv2D` layer weights in TensorFlow Lite have the structure
+`[channel_out, height, width, channel_in]`, and `Dense` layer weights have the
+structure `[channel_out, channel_in]`. The sparsity pattern is applied to the
+weights in the last dimension: `channel_in`.
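To make the pattern concrete, here is a minimal NumPy sketch (not the TFMOT implementation; the function name is illustrative) that zeroes the M smallest-magnitude weights in every block of N along the last dimension:

```python
import numpy as np

def prune_m_by_n(weights, m=2, n=4):
    """Zero the m smallest-magnitude weights in each block of n weights
    along the last dimension (illustrative sketch, not the TFMOT API)."""
    w = np.array(weights, dtype=float)  # copy so the input is untouched
    assert w.shape[-1] % n == 0, "last dimension must be divisible by n"
    blocks = w.reshape(-1, n)  # one row per block of n consecutive weights
    # Indices of the m smallest |w| within each block.
    drop = np.argsort(np.abs(blocks), axis=1)[:, :m]
    np.put_along_axis(blocks, drop, 0.0, axis=1)
    return blocks.reshape(w.shape)

# A Dense-style weight tensor [channel_out, channel_in] with channel_in = 4:
w = np.array([[0.4, -0.1, 0.7, 0.05],
              [-0.3, 0.9, 0.2, -0.6]])
print(prune_m_by_n(w))  # each row keeps its 2 largest-magnitude weights
```

Every block of 4 input-channel weights ends up with exactly 2 zeros, which is the 2-by-4 (50% sparsity) pattern described above.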
+Special hardware can benefit from this type of sparsity, and inference can be
+sped up by as much as 2x. Because this fixed sparsity pattern is more
+restrictive, the accuracy achieved after fine-tuning is lower than with
+magnitude-based pruning. Note that the pattern is enforced only in the model
+that is converted to TensorFlow Lite.
+If the model is quantized, the accuracy can be improved using a
+[collaborative optimization technique](https://blog.tensorflow.org/2021/10/Collaborative-Optimizations.html):
+sparsity-preserving quantization aware training.
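One way to sanity-check a weight tensor against the pattern is a small helper like the following (a sketch under the layout assumptions stated above; `satisfies_m_by_n` is my own name, not a TFMOT or TensorFlow Lite API):

```python
import numpy as np

def satisfies_m_by_n(weights, m=2, n=4):
    """True if every block of n weights along the last dimension
    contains at least m zeros (illustrative helper, not a library API)."""
    w = np.asarray(weights)
    if w.shape[-1] % n != 0:
        return False
    zeros_per_block = (w.reshape(-1, n) == 0).sum(axis=1)
    return bool((zeros_per_block >= m).all())

print(satisfies_m_by_n(np.array([0.4, 0.0, 0.7, 0.0])))  # two zeros per block
print(satisfies_m_by_n(np.array([0.4, 0.1, 0.7, 0.0])))  # only one zero
```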
+
+The table below compares 2-by-4 sparsity with magnitude-based pruning at the
+same target sparsity of 50% (accuracy, %).
+
+<figure>
+<table>
+  <tr>
+    <th>Model</th>
+    <th>Unpruned</th>
+    <th>2-by-4 sparsity</th>
+    <th>Magnitude-based sparsity, 50%</th>
+  </tr>
+  <tr>
+    <td>Inception-V3</td>
+    <td>77.82</td>
+    <td>75.8</td>
+    <td>77.47</td>
+  </tr>
+  <tr>
+    <td>DS-CNN-L</td>
+    <td>95.23</td>
+    <td>94.33</td>
+    <td>94.84</td>
+  </tr>
+  <tr>
+    <td>MobileNet-V1</td>
+    <td>70.97</td>
+    <td>67.35</td>
+    <td>69.46</td>
+  </tr>
+  <tr>
+    <td>MobileNet-V2</td>
+    <td>71.77</td>
+    <td>66.75</td>
+    <td>69.64</td>
+  </tr>
+</table>
+</figure>
+
+Note: DS-CNN-L is a keyword spotting model created for edge devices. It can be found
+in [Arm’s ML Examples repository](https://github.com/ARM-software/ML-examples/tree/master/tflu-kws-cortex-m).
+
+The tutorial [Structural pruning with sparsity 2 by 4](pruning_with_sparsity_2_by_4.ipynb)
+provides more information on this topic.
