Commit 6f88426

Update clustering overview document to address reviewers' comments
1 parent 69dc8eb commit 6f88426

File tree

1 file changed: +11, -14 lines

  • tensorflow_model_optimization/g3doc/guide/clustering

tensorflow_model_optimization/g3doc/guide/clustering/index.md

Lines changed: 11 additions & 14 deletions
@@ -1,16 +1,19 @@
# Weight clustering

-This document provides an overview on weight clustering to help you determine how it fits with your use case. To dive right into the code, see the [weight clustering end-to-end example](clustering_example.ipynb) and the [API docs](../../api_docs/python). For additional details on how to use the Keras API, a deep dive into weight clustering, and documentation on more advanced usage patterns, see the [weight clustering comprehensive guide](clustering_comprehensive_guide.ipynb).
+This document provides an overview on weight clustering to help you determine how it fits with your use case.
+
+- To dive right into an end-to-end example, see the [weight clustering example](clustering_example.ipynb).
+- To quickly find the APIs you need for your use case, see the [weight clustering comprehensive guide](clustering_comprehensive_guide.ipynb).

## Overview

Clustering, or weight sharing, reduces the number of unique weight values in a model, leading to benefits for deployment. It first groups the weights of each layer into *N* clusters, then shares the cluster's centroid value for all the weights belonging to the cluster.
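To make the clustering flow described above concrete, here is a minimal sketch of how a Keras model is typically wrapped with the TensorFlow Model Optimization clustering API; the toy model, the choice of 16 clusters, and the linear centroid initialization are illustrative assumptions, not recommendations from this guide.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A small toy model; any built and trained Keras model can be clustered.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

cluster_weights = tfmot.clustering.keras.cluster_weights
CentroidInitialization = tfmot.clustering.keras.CentroidInitialization

clustering_params = {
    # Each layer's weights are grouped into N = 16 clusters (illustrative value).
    'number_of_clusters': 16,
    # Centroids start evenly spaced between the smallest and largest weight values.
    'cluster_centroids_init': CentroidInitialization.LINEAR,
}

# Wrap the model so its weights are constrained to the cluster centroids.
clustered_model = cluster_weights(model, **clustering_params)

# Fine-tuning then proceeds with the usual compile/fit calls.
clustered_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
```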

-This technique brings improvements in terms of model compression. By reducing the number of unique weight values, weigth clustering renders the weights suitable for compression via Huffman coding and similar techniques. Future framework support will, therefore, be able to provide memory bandwith improvements. This can be critical for deploying deep learning models on embedded systems with limited resources.
+This technique brings improvements via model compression. Future framework support can unlock memory footprint improvements that can make a crucial difference for deploying deep learning models on embedded systems with limited resources.
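As a rough, standalone illustration of why fewer unique weight values compress better (not an experiment from this guide), the snippet below DEFLATE-compresses a random weight matrix before and after snapping it to 32 values; the exact ratio will vary from run to run.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)

# Snap every weight to the nearest of 32 evenly spaced "centroids"
# (a crude stand-in for clustering, just to show the effect on compressibility).
centroids = np.linspace(weights.min(), weights.max(), 32, dtype=np.float32)
clustered = centroids[np.abs(weights[..., None] - centroids).argmin(axis=-1)]

original_size = len(zlib.compress(weights.tobytes(), level=9))
clustered_size = len(zlib.compress(clustered.tobytes(), level=9))
print(f'compressed original:  {original_size} bytes')
print(f'compressed clustered: {clustered_size} bytes '
      f'(~{original_size / clustered_size:.1f}x smaller)')
```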

-We have seen up to 5x improvements in model compression with minimal loss of accuracy, as demonstrated by the [results](#results) presented below. The compression gains depend on the model and the accuracy targets in each specific use case. For example, for the MobileNetV2 image classification model, one can choose to reduce all non-depthwise convolutional layers to use just 32 unique weigth values and obtain a float32 tflite model that is approximately 4.8 times more compressible using ZIP Deflate algorithm than the original model. However, that will result in about 3% drop of the top-1 classification accuracy. On the other hand, the same model clustered less agressively, using 256 clusters for two internal layers and 32 clusters for the final convolutional layer, maintains virtually the same accuracy as the original model, yet still yields a respectable 1.8x improvement in compression ratio.
+We have experimented with clustering across vision and speech tasks. We've seen up to 5x improvements in model compression with minimal loss of accuracy, as demonstrated by the [results](#results) presented below.

-Clustering works well with TFLiteConverter, providing an easy path to produce deployment-ready models that can be easily compressed using either an off-the-shelf compression algorithm, similar to the ZIP Deflate we use for demonstration in this document, or a custom method optimized for a special target hardware. When converting the clustered model with TFLiteConverter, the actual number of unique weight values per tensor may increase. This happens for the models with batch normalization layers that are folded into the preceding convolutional layers during the conversion, and also due to different scale factors in the per-channel weight quantization scheme. Both techniques may alter the same weight value differently, depending on the channel it appears in and the associated batch-normalization and quantization parameters. While this side effect may result in a slightly lower compression ratio, the overall benefits of using clustering and post-training conversion and quantization are still tangible, as demonstrated by the examples in this document.
+Please note that clustering will provide reduced benefits for convolution and dense layers that precede a batch normalization layer, as well as in combination with per-axis post-training quantization.
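Before conversion for deployment, the clustering wrappers are normally stripped from the model. The sketch below shows that strip-and-convert sequence using the TF 2.x `from_keras_model` converter; it assumes a `clustered_model` like the one in the earlier sketch, and post-training quantization settings are left out for brevity.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Assumes `clustered_model` is a fine-tuned model produced with cluster_weights().
# Remove the clustering wrappers, leaving plain Keras layers whose weights
# now take only the clustered (centroid) values.
final_model = tfmot.clustering.keras.strip_clustering(clustered_model)

# Convert to TensorFlow Lite; post-training quantization could be enabled here,
# subject to the per-axis caveat noted above.
converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
tflite_model = converter.convert()

with open('clustered_model.tflite', 'wb') as f:
    f.write(tflite_model)
```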

### API compatibility matrix

@@ -78,22 +81,16 @@ The models were trained and tested on ImageNet.

The models were trained and tested on SpeechCommands v0.02.

-NOTE: *Size of compressed .tflite* refers to the size of the zipped .tflite file obtained from the model through the following process:
+NOTE: *Size of compressed .tflite* refers to the size of the zipped .tflite file obtained from the model via the following process:
1. Serialize the Keras model into .h5 file
2. Convert the .h5 file into .tflite using `TFLiteConverter.from_keras_model_file()`
3. Compress the .tflite file into a zip
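Expressed as code, those three steps look roughly like the following sketch; `keras_model` stands in for an already trained model, the file names are placeholders, and `from_keras_model_file()` is reached through `tf.compat.v1` when running under TF 2.x.

```python
import os
import zipfile
import tensorflow as tf

# 1. Serialize the Keras model into an .h5 file.
#    `keras_model` is assumed to be a trained (and stripped) Keras model.
keras_model.save('model.h5')

# 2. Convert the .h5 file into .tflite using the converter named in step 2.
converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file('model.h5')
with open('model.tflite', 'wb') as f:
    f.write(converter.convert())

# 3. Compress the .tflite file into a zip and report the compressed size.
with zipfile.ZipFile('model.zip', 'w', compression=zipfile.ZIP_DEFLATED) as z:
    z.write('model.tflite')
print('Size of compressed .tflite:', os.path.getsize('model.zip'), 'bytes')
```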

## Examples

-In addition to the [Clustering with Keras](clustering_with_keras.ipynb) tutorial, see the following examples:
+In addition to the [Weight clustering in Keras example](clustering_example.ipynb), see the following examples:

-* Cluster the weights of a CNN model trained on the MNIST handwritten digit classification databaset:
+* Cluster the weights of a CNN model trained on the MNIST handwritten digit classification dataset:
[code](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/examples/clustering/keras/mnist/mnist_cnn.py)

-## References
-
-The weight clustering implementation is based on the technique described in chapter 3, titled *Trained Quantization and Weight Sharing*, of the conference paper referenced below.
-
-1. **Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding** <br/>
-Song Han, Huizi Mao, William J. Dally <br/>
-[https://arxiv.org/abs/1510.00149](https://arxiv.org/abs/1510.00149). ICLR, 2016 <br/>
+The weight clustering implementation is based on the *Deep Compression: Compressing Deep Neural Networks With Pruning, Trained Quantization and Huffman Coding* [paper](https://arxiv.org/abs/1510.00149). See chapter 3, titled *Trained Quantization and Weight Sharing*.
