
Commit a4a0b82

Addressed reviewer's comments.
Change-Id: I357df852aa7ef2efd0a09ec556be4cb996808aec
1 parent c4107d9 commit a4a0b82

File tree: 1 file changed (+36, -49 lines)
  • tensorflow_model_optimization/g3doc/guide/pruning


tensorflow_model_optimization/g3doc/guide/pruning/index.md

Lines changed: 36 additions & 49 deletions
@@ -43,6 +43,18 @@ It is on our roadmap to add support in the following areas:
 * [Minimal Subclassed model support](https://github.com/tensorflow/model-optimization/issues/155)
 * [Framework support for latency improvements](https://github.com/tensorflow/model-optimization/issues/173)
 
+## Structural pruning M by N
+
+Structural pruning zeroes out model weights at the beginning of the training
+process according to the following pattern: M weights are set to zero in each
+block of N weights. Note that this pattern affects only the last dimension of the weight tensor for the model that is converted by TensorFlow Lite. For example, `Conv2D` layer weights in TensorFlow Lite have the structure [channel_out, height, width, channel_in] and `Dense` layer weights have the structure [channel_out, channel_in]. The sparsity pattern is applied to the weights in the last dimension: channel_in.
+Specialized hardware can take advantage of this type of sparsity and speed up inference by up to 2x. Because this sparsity pattern is more restrictive, the accuracy achieved after fine-tuning is typically lower than with magnitude-based pruning.
+Note that this pattern is valid only for the model that is converted to TensorFlow Lite.
+If the model is quantized, accuracy can be improved with a [collaborative optimization technique](https://blog.tensorflow.org/2021/10/Collaborative-Optimizations.html): sparsity-preserving quantization-aware training.
+
+The tutorial [Structural pruning with sparsity 2 by 4](pruning_with_sparsity_2_by_4.ipynb)
+provides more information on this topic.
+
 ## Results
 
 ### Image Classification
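The M-by-N pattern added above can be sketched in a few lines of NumPy: in every block of N consecutive weights along the last dimension, zero the M entries with the smallest magnitude. This is an illustrative sketch only (`prune_m_by_n` is a hypothetical helper, not the tfmot implementation).

```python
import numpy as np

def prune_m_by_n(weights, m=2, n=4):
    """Zero the m smallest-magnitude weights in every block of n
    consecutive entries along the last dimension.
    Assumes the last dimension is divisible by n."""
    flat = weights.reshape(-1, n)
    # Indices of the m smallest-magnitude entries in each block.
    drop = np.argsort(np.abs(flat), axis=1)[:, :m]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (flat * mask).reshape(weights.shape)

# 2-by-4 sparsity: two of every four weights become zero.
w = np.array([[0.1, -0.9, 0.5, 0.05],
              [0.7, 0.2, -0.3, 0.8]])
pruned = prune_m_by_n(w)  # [[0.0, -0.9, 0.5, 0.0], [0.7, 0.0, 0.0, 0.8]]
```

In the real API the mask is maintained during training so the surviving weights can be fine-tuned; this sketch only shows the pattern itself.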
@@ -53,12 +65,14 @@ It is on our roadmap to add support in the following areas:
 <th>Model</th>
 <th>Non-sparse Top-1 Accuracy </th>
 <th>Sparse Accuracy </th>
+<th>Sparsity 2 by 4</th>
 <th>Sparsity </th>
 </tr>
 <tr>
 <td rowspan=3>InceptionV3</td>
 <td rowspan=3>78.1%</td>
 <td>78.0%</td>
+<td>75.8%</td>
 <td>50%</td>
 </tr>
 <tr>
@@ -68,7 +82,10 @@ It is on our roadmap to add support in the following areas:
 <td>74.6%</td><td>87.5%</td>
 </tr>
 <tr>
-<td>MobilenetV1 224</td><td>71.04%</td><td>70.84%</td><td>50%</td>
+<td>MobilenetV1 224</td><td>71.04%</td><td>70.84%</td><td>67.35%</td><td>50%</td>
+</tr>
+<tr>
+<td>MobilenetV2 224</td><td>71.77%</td><td>69.64%</td><td>66.75%</td><td>50%</td>
 </tr>
 </table>
 </figure>
@@ -115,68 +132,38 @@ The models were tested on Imagenet.
 The models use WMT16 German and English dataset with news-test2013 as the dev
 set and news-test2015 as the test set.
 
-## Examples
-
-In addition to the [Prune with Keras](pruning_with_keras.ipynb)
-tutorial, see the following examples:
-
-* Train a CNN model on the MNIST handwritten digit classification task with
-  pruning:
-  [code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/sparsity/keras/mnist/mnist_cnn.py)
-* Train a LSTM on the IMDB sentiment classification task with pruning:
-  [code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/sparsity/keras/imdb/imdb_lstm.py)
-
-For background, see *To prune, or not to prune: exploring the efficacy of
-pruning for model compression* [[paper](https://arxiv.org/pdf/1710.01878.pdf)].
-
-## Structural pruning M by N
-
-Structural pruning zeroes out model weights at the beginning of the training
-process according to the following pattern: M weights are set to zero in the
-block of N weights. It is important to notice that this pattern affects only the last dimension of the weight tensor for the model that is converted by TensorFlow Lite. For example, `Conv2D` layer weights in TensorFlow Lite have the structure [channel_out, height, width, channel_in] and `Dense` layer weights have the structure [channel_out, channel_in]. The sparsity pattern is applied to the weights in the last dimension: channel_in.
-Special hardware can benefit from this type of sparsity in the model and inference time can have a speedup up to 2x. Because this pattern lock in sparsity is more restrictive, the accuracy achieved after fine-tuning is worse than with the magnitude-based pruning.
-It is important to indicate that the pattern is valid only for the model that is converted to tflite.
-If the model is quantized, then the accuracy could be improved using [collaborative optimization technique](https://blog.tensorflow.org/2021/10/Collaborative-Optimizations.html): Sparsity preserving quantization aware training.
+### Keyword spotting model
 
-The table below provides some results for 2 by 4 sparsity in comparison with the magnitude based pruning with the same target sparsity 50%.
+DS-CNN-L is a keyword spotting model created for edge devices. It can be found
+in [Arm’s ML Examples repository](https://github.com/ARM-software/ML-examples/tree/master/tflu-kws-cortex-m).
 
 <figure>
 <table>
 <tr>
 <th>Model</th>
-<th>unpruned</th>
-<th>2/4 sparsity </th>
-<th>magnitude based sparsity, 50% </th>
-</tr>
-<tr>
-<td>Inception-V3</td>
-<td>77.82</td>
-<td>75.8</td>
-<td>77.47 </td>
+<th>Unpruned</th>
+<th>Sparsity 2 by 4 </th>
+<th>Sparsity, 50% </th>
 </tr>
 <tr>
 <td>DS-CNN-L</td>
 <td>95.23</td>
 <td>94.33</td>
 <td>94.84</td>
 </tr>
-<tr>
-<td>MobileNet-V1</td>
-<td>70.97</td>
-<td>67.35</td>
-<td>69.46</td>
-</tr>
-<tr>
-<td>MobileNet-V2</td>
-<td>71.77</td>
-<td>66.75</td>
-<td>69.64</td>
-</tr>
 </table>
 </figure>
 
-Note: DS-CNN-L is a keyword spotting model created for edge devices. It can be found
-in [Arm’s ML Examples repository](https://github.com/ARM-software/ML-examples/tree/master/tflu-kws-cortex-m).
+## Examples
 
-The tutorial [Structural pruning with sparsity 2 by 4](pruning_with_sparsity_2_by_4.ipynb)
-provides more information on this topic.
+In addition to the [Prune with Keras](pruning_with_keras.ipynb)
+tutorial, see the following examples:
+
+* Train a CNN model on the MNIST handwritten digit classification task with
+  pruning:
+  [code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/sparsity/keras/mnist/mnist_cnn.py)
+* Train a LSTM on the IMDB sentiment classification task with pruning:
+  [code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/sparsity/keras/imdb/imdb_lstm.py)
+
+For background, see *To prune, or not to prune: exploring the efficacy of
+pruning for model compression* [[paper](https://arxiv.org/pdf/1710.01878.pdf)].
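For contrast with the structural 2-by-4 pattern, the magnitude-based pruning used as the 50%-sparsity baseline in the tables simply zeroes the smallest-magnitude fraction of the weights, with no block constraint. A minimal NumPy sketch (`prune_by_magnitude` is a hypothetical helper, not the tfmot API; real pruning applies the mask gradually over training steps):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero the `sparsity` fraction of weights with the smallest
    magnitude (unstructured magnitude-based pruning)."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)

w = np.array([0.05, -0.4, 0.3, -0.01, 0.9, 0.2])
pruned = prune_by_magnitude(w, sparsity=0.5)
# -> [0.0, -0.4, 0.3, 0.0, 0.9, 0.0]
```

Because any weight may be dropped, this scheme usually preserves accuracy better than the M-by-N constraint, which matches the gap between the two sparsity columns in the tables above.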
