tensorflow_model_optimization/g3doc/guide/pruning/index.md (36 additions, 49 deletions)
It is on our roadmap to add support in the following areas:

* [Minimal Subclassed model support](https://github.com/tensorflow/model-optimization/issues/155)
* [Framework support for latency improvements](https://github.com/tensorflow/model-optimization/issues/173)
## Structural pruning M by N

Structural pruning zeroes out model weights at the beginning of the training process according to the following pattern: M weights are set to zero in each block of N weights. Note that this pattern affects only the last dimension of the weight tensor for the model that is converted by TensorFlow Lite. For example, `Conv2D` layer weights in TensorFlow Lite have the structure `[channel_out, height, width, channel_in]` and `Dense` layer weights have the structure `[channel_out, channel_in]`. The sparsity pattern is applied to the weights in the last dimension: `channel_in`.

Special hardware can benefit from this type of sparsity, and inference time can speed up by as much as 2x. Because this pattern-locked sparsity is more restrictive, the accuracy achieved after fine-tuning is worse than with magnitude-based pruning.
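The M-by-N constraint described above can be illustrated with a small sketch (plain Python, not the TFMOT API; `satisfies_m_by_n` and the example matrices are hypothetical):

```python
# Check that a 2-D weight matrix [channel_out, channel_in] satisfies
# an M-by-N sparsity pattern: every block of N consecutive weights
# along the last dimension (channel_in) contains at least M zeros.
def satisfies_m_by_n(weights, m=2, n=4):
    for row in weights:                      # iterate over channel_out
        for i in range(0, len(row), n):      # blocks along channel_in
            block = row[i:i + n]
            if sum(1 for w in block if w == 0.0) < m:
                return False
    return True

# A 2x8 weight matrix where every block of 4 has exactly 2 zeros.
pruned = [
    [0.0, 0.3, 0.0, -1.2,  0.7, 0.0, 0.0, 0.4],
    [0.5, 0.0, 0.0,  0.9,  0.0, 1.1, 0.0, 0.2],
]
dense = [[0.1] * 8, [0.2] * 8]               # no zeros at all

print(satisfies_m_by_n(pruned))  # True
print(satisfies_m_by_n(dense))   # False
```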
Note that the pattern is valid only for the model that is converted to TensorFlow Lite.

If the model is quantized, the accuracy can be improved using a [collaborative optimization technique](https://blog.tensorflow.org/2021/10/Collaborative-Optimizations.html): sparsity-preserving quantization aware training.

The tutorial [Structural pruning with sparsity 2 by 4](pruning_with_sparsity_2_by_4.ipynb) provides more information on this topic.
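Why quantization can coexist with sparsity: a symmetric integer quantizer maps the value 0.0 exactly to integer 0, so a pruned weight tensor keeps its zero pattern after quantization. A minimal sketch, assuming simple symmetric per-tensor int8 quantization (`quantize_int8` is a hypothetical helper, not the TFMOT implementation):

```python
# Symmetric per-tensor int8 quantization: scale is chosen so the
# largest magnitude maps to +/-127, and 0.0 always maps to 0,
# preserving the sparsity pattern of a pruned weight row.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

pruned_row = [0.0, 0.3, 0.0, -1.27]   # a 2-by-4 sparse block
q, scale = quantize_int8(pruned_row)
print(q)  # zeros stay zero: [0, 30, 0, -127]
```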
## Results

### Image Classification
<th>Model</th>
<th>Non-sparse Top-1 Accuracy</th>
<th>Sparse Accuracy</th>
<th>Sparsity 2 by 4</th>
<th>Sparsity</th>
</tr>
<tr>
<td rowspan=3>InceptionV3</td>
<td rowspan=3>78.1%</td>
<td>78.0%</td>
<td>75.8%</td>
<td>50%</td>
</tr>
<tr>
For background, see *To prune, or not to prune: exploring the efficacy of pruning for model compression* [[paper](https://arxiv.org/pdf/1710.01878.pdf)].
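The magnitude-based approach surveyed in that paper can be sketched in plain Python (an illustration of unstructured magnitude pruning, not the TFMOT implementation; `magnitude_prune` is a hypothetical helper):

```python
# Zero out the fraction of weights with the smallest absolute value,
# across the whole 2-D weight matrix (per-tensor magnitude pruning).
def magnitude_prune(weights, sparsity=0.5):
    flat = [(abs(w), i, j)
            for i, row in enumerate(weights)
            for j, w in enumerate(row)]
    flat.sort()                          # smallest magnitudes first
    k = int(len(flat) * sparsity)        # number of weights to zero
    pruned = [row[:] for row in weights]
    for _, i, j in flat[:k]:
        pruned[i][j] = 0.0
    return pruned

w = [[0.1, -2.0, 0.05, 1.5],
     [-0.2, 0.8, 0.02, -1.1]]
print(magnitude_prune(w))  # → [[0.0, -2.0, 0.0, 1.5], [0.0, 0.8, 0.0, -1.1]]
```

Unlike the M-by-N pattern above, this places no constraint on where the zeros fall, which is why it typically retains more accuracy at the same sparsity level.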
### Keyword spotting model

DS-CNN-L is a keyword spotting model created for edge devices. It can be found in [Arm’s ML Examples repository](https://github.com/ARM-software/ML-examples/tree/master/tflu-kws-cortex-m).

The table below compares 2 by 4 sparsity with magnitude-based pruning at the same target sparsity of 50%.
<figure>
<table>
<tr>
<th>Model</th>
<th>Unpruned</th>
<th>Sparsity 2 by 4</th>
<th>Sparsity, 50%</th>
</tr>
<tr>
<td>DS-CNN-L</td>
<td>95.23</td>
<td>94.33</td>
<td>94.84</td>
</tr>
</table>
</figure>
## Examples

In addition to the [Prune with Keras](pruning_with_keras.ipynb) tutorial, see the following examples:

* Train a CNN model on the MNIST handwritten digit classification task with
0 commit comments