This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit a381910

Update 20201221-tfmot-compression-api.md
1. Change the API class name. (WeightCompressionAlgorithm -> WeightCompressor)
2. Change indentation. (Remove indent for Fenced Code Blocks)
3. Added more comments on `init_training_weights` method.
1 parent 3d7b32a commit a381910

1 file changed: +152 -144 lines changed

rfcs/20201221-tfmot-compression-api.md

Lines changed: 152 additions & 144 deletions
Original file line number | Diff line number | Diff line change
@@ -48,100 +48,100 @@ Our API also provides guidelines for testing and benchmark. For now, we only hav
4848
### Tutorials and Examples
4949
We provide a Colab tutorial for the [SVD](https://en.wikipedia.org/wiki/Singular_value_decomposition) compression algorithm that shows how to implement the SVD algorithm using the TFMOT compression API. This tutorial includes:
5050

51-
* Algorithm developer side.
52-
1. The algorithm developer implementing the SVD algorithm uses the `WeightCompressionAlgorithm` class.
53-
54-
```python
55-
class SVD(algorithm.WeightCompressionAlgorithm):
56-
"""SVD compression module config."""
57-
58-
def __init__(self, params):
59-
self.params = params
60-
61-
def init_training_weights(
62-
self, pretrained_weight: tf.Tensor):
63-
"""Init function from pre-trained model case."""
64-
rank = self.params.rank
65-
66-
# Dense Layer
67-
if len(pretrained_weight.shape) == 2:
68-
u, sv = tf_svd_factorization_2d(pretrained_weight, rank)
69-
else:
70-
raise NotImplementedError('Only for dimension=2 is supported.')
71-
72-
self.add_training_weight(
73-
name='u',
74-
shape=u.shape,
75-
dtype=u.dtype,
76-
initializer=tf.keras.initializers.Constant(u))
77-
self.add_training_weight(
78-
name='sv',
79-
shape=sv.shape,
80-
dtype=sv.dtype,
81-
initializer=tf.keras.initializers.Constant(sv))
82-
83-
def project_training_weights(self, u: tf.Tensor, sv: tf.Tensor) -> tf.Tensor:
84-
return tf.matmul(u, sv)
85-
86-
def get_compressible_weights(
87-
self, original_layer: tf.keras.layers.Layer) -> List[str]:
88-
rank = self.params.rank
89-
if isinstance(original_layer, tf.keras.layers.Dense):
90-
input_dim = original_layer.kernel.shape[0]
91-
output_dim = original_layer.kernel.shape[1]
92-
if input_dim * output_dim > (input_dim + output_dim) * rank:
93-
return ['kernel']
94-
return []
95-
```
96-
97-
1. Export the model developer API for the SVD algorithm.
98-
```python
99-
class SVDParams(object):
100-
"""Define container for parameters for SVD algorithm."""
101-
102-
def __init__(self, rank):
103-
self.rank = rank
104-
105-
def optimize(to_optimize: tf.keras.Model, params: SVDParams) -> tf.keras.Model:
106-
"""Model developer API for optimizing a model."""
107-
108-
def _optimize_layer(layer):
109-
# Require layer to be built so that the SVD-factorized weights
110-
# can be initialized from the weights.
111-
if not layer.built:
112-
raise ValueError(
113-
'Applying SVD currently requires passing in a built model')
114-
115-
return algorithm.create_layer_for_training(layer, algorithm=SVD(params))
116-
117-
return tf.keras.models.clone_model(
118-
to_optimize, clone_function=_optimize_layer)
119-
```
120-
121-
* Model developer side.
122-
1. The model developer uses the SVD algorithm.
123-
```python
124-
params = SVDParams(rank=32)
125-
compressed_model = optimize(model, params)
126-
127-
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
128-
compressed_model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
129-
130-
compressed_model.fit(x_train, y_train, epochs=2)
131-
compressed_model.evaluate(x_test, y_test, verbose=2)
132-
```
133-
1. Deploys their compressed model to TFLite model
134-
```python
135-
compressed_model.save('/tmp/model_svd_compressed')
136-
137-
def tflite_convert(saved_model_path, tflite_path):
138-
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
139-
converted = converter.convert()
140-
open(tflite_path, 'wb').write(converted)
141-
142-
tflite_convert('/tmp/model_svd_compressed',
143-
'/tmp/tflite/model_svd_compressed.tflite')
144-
```
51+
#### Algorithm developer side
52+
1. The algorithm developer implementing the SVD algorithm uses the `WeightCompressor` class.
53+
54+
```python
55+
class SVD(algorithm.WeightCompressor):
56+
"""SVD compression module config."""
57+
58+
def __init__(self, params):
59+
self.params = params
60+
61+
def init_training_weights(
62+
self, pretrained_weight: tf.Tensor):
63+
"""Init function from pre-trained model case."""
64+
rank = self.params.rank
65+
66+
# Dense Layer
67+
if len(pretrained_weight.shape) == 2:
68+
u, sv = tf_svd_factorization_2d(pretrained_weight, rank)
69+
else:
70+
raise NotImplementedError('Only for dimension=2 is supported.')
71+
72+
self.add_training_weight(
73+
name='u',
74+
shape=u.shape,
75+
dtype=u.dtype,
76+
initializer=tf.keras.initializers.Constant(u))
77+
self.add_training_weight(
78+
name='sv',
79+
shape=sv.shape,
80+
dtype=sv.dtype,
81+
initializer=tf.keras.initializers.Constant(sv))
82+
83+
def project_training_weights(self, u: tf.Tensor, sv: tf.Tensor) -> tf.Tensor:
84+
return tf.matmul(u, sv)
85+
86+
def get_compressible_weights(
87+
self, original_layer: tf.keras.layers.Layer) -> List[str]:
88+
rank = self.params.rank
89+
if isinstance(original_layer, tf.keras.layers.Dense):
90+
input_dim = original_layer.kernel.shape[0]
91+
output_dim = original_layer.kernel.shape[1]
92+
if input_dim * output_dim > (input_dim + output_dim) * rank:
93+
return ['kernel']
94+
return []
95+
```
96+
97+
2. Export the model developer API for the SVD algorithm.
98+
```python
99+
class SVDParams(object):
100+
"""Define container for parameters for SVD algorithm."""
101+
102+
def __init__(self, rank):
103+
self.rank = rank
104+
105+
def optimize(to_optimize: tf.keras.Model, params: SVDParams) -> tf.keras.Model:
106+
"""Model developer API for optimizing a model."""
107+
108+
def _optimize_layer(layer):
109+
# Require layer to be built so that the SVD-factorized weights
110+
# can be initialized from the weights.
111+
if not layer.built:
112+
raise ValueError(
113+
'Applying SVD currently requires passing in a built model')
114+
115+
return algorithm.create_layer_for_training(layer, algorithm=SVD(params))
116+
117+
return tf.keras.models.clone_model(
118+
to_optimize, clone_function=_optimize_layer)
119+
```
120+
121+
#### Model developer side
122+
1. The model developer uses the SVD algorithm.
123+
```python
124+
params = SVDParams(rank=32)
125+
compressed_model = optimize(model, params)
126+
127+
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
128+
compressed_model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
129+
130+
compressed_model.fit(x_train, y_train, epochs=2)
131+
compressed_model.evaluate(x_test, y_test, verbose=2)
132+
```
133+
2. The model developer deploys the compressed model as a TFLite model.
134+
```python
135+
compressed_model.save('/tmp/model_svd_compressed')
136+
137+
def tflite_convert(saved_model_path, tflite_path):
138+
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
139+
converted = converter.convert()
140+
open(tflite_path, 'wb').write(converted)
141+
142+
tflite_convert('/tmp/model_svd_compressed',
143+
'/tmp/tflite/model_svd_compressed.tflite')
144+
```
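The size check in `get_compressible_weights` above can be read as a parameter count comparison. The following is a minimal sketch of that arithmetic only, assuming `u` has shape `(input_dim, rank)` and `sv` has shape `(rank, output_dim)`, which is what `tf.matmul(u, sv)` needs to restore the kernel shape. The function name `factorization_saves_parameters` is hypothetical and not part of the proposed API.

```python
def factorization_saves_parameters(input_dim: int, output_dim: int, rank: int) -> bool:
    """Mirror of the size check in `get_compressible_weights` (illustrative only).

    Assumes `u` is (input_dim, rank) and `sv` is (rank, output_dim), so the
    factorized form stores (input_dim + output_dim) * rank values.
    """
    dense_params = input_dim * output_dim
    factored_params = (input_dim + output_dim) * rank
    return dense_params > factored_params


# A 784 x 128 kernel at rank 32: 100352 dense values vs. 29184 factored values.
print(factorization_saves_parameters(784, 128, 32))  # True
# A 32 x 10 kernel at rank 32: 320 dense values vs. 1344 factored values.
print(factorization_saves_parameters(32, 10, 32))  # False
```

Under this reading, a layer whose kernel is already small relative to the rank is left uncompressed, because factorizing it would store more values than the original kernel.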
145145

146146
We also want to provide examples of well-known compression algorithms. At a minimum, we have to provide the following algorithms:
147147
* [Weight clustering](https://arxiv.org/abs/1510.00149): The most famous compression algorithm, which can be used widely.
@@ -163,7 +163,7 @@ During the training phase, `project_training_weights` method is called for each
163163
The compressed model contains the `decompress_weights` function in its graph, so `decompress_weights` can be called at each inference step. To improve performance, we'll cache the decompressed weights, depending on flags, if there is enough space.
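The caching trade-off described here can be illustrated with a small pure-Python sketch. It is not the TFMOT implementation: `CachedWeight`, `toy_decompress`, and the `use_cache` flag are hypothetical names chosen for illustration, and plain lists stand in for tensors.

```python
from typing import Callable, List, Optional, Sequence


class CachedWeight:
    """Hypothetical wrapper that optionally caches a decompressed weight."""

    def __init__(self, decompress_fn: Callable[[Sequence[int]], List[float]],
                 compressed: Sequence[int], use_cache: bool):
        self._decompress_fn = decompress_fn
        self._compressed = compressed
        self._use_cache = use_cache
        self._cached: Optional[List[float]] = None
        self.decompress_calls = 0  # how often decompression actually runs

    def value(self) -> List[float]:
        # Reuse the cached result when caching is enabled and already filled.
        if self._use_cache and self._cached is not None:
            return self._cached
        self.decompress_calls += 1
        result = self._decompress_fn(self._compressed)
        if self._use_cache:
            self._cached = result
        return result


def toy_decompress(codes: Sequence[int]) -> List[float]:
    """Toy stand-in for `decompress_weights`: scale integer codes by 0.5."""
    return [0.5 * code for code in codes]


cached = CachedWeight(toy_decompress, [1, 2, 3], use_cache=True)
uncached = CachedWeight(toy_decompress, [1, 2, 3], use_cache=False)
for _ in range(3):  # three inference steps
    cached.value()
    uncached.value()
print(cached.decompress_calls, uncached.decompress_calls)  # 1 3
```

The cached variant trades memory (it keeps the decompressed weight alive) for fewer decompression calls, which is why the text makes caching conditional on available space.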
164164

165165
```python
166-
class WeightCompressionAlgorithm(metaclass=abc.ABCMeta):
166+
class WeightCompressor(metaclass=abc.ABCMeta):
167167
"""Interface for weight compression algorithm that acts on a per-layer basis.
168168
169169
This allows both options of either decompressing during inference or
@@ -191,7 +191,12 @@ class WeightCompressionAlgorithm(metaclass=abc.ABCMeta):
191191
@abc.abstractmethod
192192
def init_training_weights(
193193
self, pretrained_weight: tf.Tensor):
194-
"""Initialize training weights for the training model. It calls the `add_training_weight` method several times to add training weights.
194+
"""Initialize training weights for the compressible weight.
195+
196+
It calls the `add_training_weight` to add a training weight for a given
197+
`pretrained_weight`. A `pretrained_weight` can have multiple training
198+
weights. We initialize the training weights for each compressible
199+
weight by just calling this function for each.
195200
196201
Args:
197202
pretrained_weight: tf.Tensor of a pretrained weight of a layer that will
@@ -200,7 +205,11 @@ class WeightCompressionAlgorithm(metaclass=abc.ABCMeta):
200205

201206
def add_training_weight(
202207
self, *args, **kwargs):
203-
"""Add training weight for the training model. This method is called from `init_training_weights`.
208+
"""Add a training weight for the compressible weight.
209+
210+
When this method is called from the `init_training_weights`, this adds
211+
a training weight for the pretrained_weight that is the input of the
212+
`init_training_weights`.
204213
205214
Args:
206215
*args, **kwargs: args and kwargs for training_model.add_weight.
@@ -270,7 +279,7 @@ class WeightCompressionAlgorithm(metaclass=abc.ABCMeta):
270279
#### Model compression algorithm API
271280
Some compression algorithms require training weights or compressed weights that are shared across layers (e.g. the lookup table for weight clustering).
272281
We decided to support the per-layer-variable compression algorithm API first, because:
273-
* Most use cases can be covered by the WeightCompressionAlgorithm API.
282+
* Most use cases can be covered by the `WeightCompressor` class API.
274283
* It is hard to support a sequential model: weights shared across layers would have to be placed somewhere outside of the sequential model.
275284

276285
### User Impact
@@ -297,7 +306,7 @@ We’ll provide examples of compression algorithms using the API in this design,
297306
This API is a standalone project that only depends on tensorflow.
298307

299308
### Engineering Impact
300-
TF-MOT team will maintain this API code. For the initial release, we publicize the WeightCompressionAlgorithm class that the algorithm developers have to inherit this class to implement their own compression algorithm, WrapperLayer methods to access original layer, And model clone based default converter functions for model developer to help them implement their own algorithm specific APIs.
309+
The TF-MOT team will maintain this API code. For the initial release, we publish the `WeightCompressor` class, which algorithm developers must inherit from to implement their own compression algorithm; the WrapperLayer methods for accessing the original layer; and default converter functions based on model cloning, which help model developers implement their own algorithm-specific APIs.
301310

302311
### Platforms and Environments
303312
For the initial release, we target TF 2.0 Keras models. After compression, the model can be deployed to servers as a TF model, to mobile/embedded environments as a TFLite model, and to the web in tf.js format.
@@ -313,9 +322,9 @@ Compressed models can be converted to TF model, TFLite model, and tf.js format.
313322
This is an API design doc. Engineering details will be determined in the future.
314323
To better explain this API, here is step-by-step usage documentation:
315324

316-
### Step-by-step usage documentation of the WeightCompressionAlgorithm class methods.
325+
### Step-by-step usage documentation of the `WeightCompressor` class methods.
317326

318-
The WeightCompressionAlgorithm class has 5 abstract methods. Following explanation shows when these methods are called and used.
327+
The `WeightCompressor` class has 5 abstract methods. The following explanation shows when these methods are called and used.
319328

320329
#### User facing API Template
321330

@@ -332,7 +341,7 @@ def optimize_training(to_optimize: tf.keras.Model, params: CustomParams) -> tf.k
332341
'Applying compression currently requires passing in a built model')
333342

334343
return algorithm.create_layer_for_training(
335-
layer, algorithm=CustomAlgorithm(params))
344+
layer, algorithm=CustomWeightCompressor(params))
336345

337346
return tf.keras.models.clone_model(
338347
to_optimize, clone_function=_optimize_layer)
@@ -347,7 +356,7 @@ def optimize_inference(to_optimize: tf.keras.Model, params: CustomParams) -> tf.
347356
'Applying compression currently requires passing in a built model')
348357

349358
return algorithm.create_layer_for_inference(
350-
layer, algorithm=CustomAlgorithm(params))
359+
layer, algorithm=CustomWeightCompressor(params))
351360

352361
return tf.keras.models.clone_model(
353362
to_optimize, clone_function=_optimize_layer)
@@ -375,60 +384,60 @@ compressed_model.evaluate(x_test, y_test, verbose=2)
375384
Now we'll explain when each method is called, and how many times, for the model developer code above.
376385

377386
1. `get_compressible_weights`
378-
<p align="center">
379-
<img src=20201221-tfmot-compression-api/get_compressible_weights.png />
380-
</p>
387+
<p align="center">
388+
<img src=20201221-tfmot-compression-api/get_compressible_weights.png />
389+
</p>
381390

382-
```python
383-
training_model = optimize_training(model, params)
384-
```
391+
```python
392+
training_model = optimize_training(model, params)
393+
```
385394

386-
`get_compressible_weights` is called when we want to get a list of variables that we will apply compression.
387-
When we try to compress the pre-trained model, we just call this method for each layer in the pre-trained model. The number of the method calling is (# of layers).
395+
`get_compressible_weights` is called when we want to get the list of variables to which we will apply compression.
396+
When we compress the pre-trained model, we call this method once for each layer in the pre-trained model, so it is called (# of layers) times.
388397

389-
1. `init_training_weights`
390-
<p align="center">
391-
<img src=20201221-tfmot-compression-api/init_training_weights.png />
392-
</p>
398+
2. `init_training_weights`
399+
<p align="center">
400+
<img src=20201221-tfmot-compression-api/init_training_weights.png />
401+
</p>
393402

394-
```python
395-
training_model = optimize_training(model, params)
396-
```
403+
```python
404+
training_model = optimize_training(model, params)
405+
```
397406

398-
`init_training_weights` is called when we initialize the cloned training model from the pre-trained model. `optimize_training` method basically clones the model to create a training model for compression, wrapping compressible layers by the training wrapper to create training weights. The number of the method calling is (# of compressible weights).
407+
`init_training_weights` is called when we initialize the cloned training model from the pre-trained model. The `optimize_training` method clones the model to create a training model for compression, wrapping each compressible layer in the training wrapper to create its training weights. The method is called (# of compressible weights) times.
399408

400-
1. `project_training_weights`
401-
<p align="center">
402-
<img src=20201221-tfmot-compression-api/project_training_weights.png />
403-
</p>
409+
3. `project_training_weights`
410+
<p align="center">
411+
<img src=20201221-tfmot-compression-api/project_training_weights.png />
412+
</p>
404413

405-
```python
406-
training_model.fit(x_train, y_train, epochs=2)
407-
```
414+
```python
415+
training_model.fit(x_train, y_train, epochs=2)
416+
```
408417

409-
`project_training_weights` is called when the training model for the compression algorithm is training. Usually this method function is a part of the training model. It recovers the original weight from the training weights, and should be differentiable. This method enables you to use the original graph to compute the model output, but train the training weights of the training model. For each training step, this method is called for every compressible weight. The number of the method calling is (# of compressible weights) * (training steps).
418+
`project_training_weights` is called while the training model for the compression algorithm is being trained. Usually this method is part of the training model. It recovers the original weight from the training weights and should be differentiable. This lets you use the original graph to compute the model output while training the training weights of the training model. At each training step, this method is called for every compressible weight, so it is called (# of compressible weights) * (# of training steps) times.
410419

411-
1. `compress_training_weights`
412-
<p align="center">
413-
<img src=20201221-tfmot-compression-api/compress_training_weights.png />
414-
</p>
420+
4. `compress_training_weights`
421+
<p align="center">
422+
<img src=20201221-tfmot-compression-api/compress_training_weights.png />
423+
</p>
415424

416-
```python
417-
compressed_model = optimize_inference(training_model, params)
418-
```
425+
```python
426+
compressed_model = optimize_inference(training_model, params)
427+
```
419428

420-
`compress_training_weights` is called when we convert the training model to the compressed model. The number of the method calling is (# of compressible weights).
429+
`compress_training_weights` is called when we convert the training model to the compressed model. The method is called (# of compressible weights) times.
421430

422-
1. `decompress_weights`
423-
<p align="center">
424-
<img src=20201221-tfmot-compression-api/decompress_weights.png />
425-
</p>
431+
5. `decompress_weights`
432+
<p align="center">
433+
<img src=20201221-tfmot-compression-api/decompress_weights.png />
434+
</p>
426435

427-
```python
428-
compressed_model.evaluate(x_test, y_test, verbose=2)
429-
```
436+
```python
437+
compressed_model.evaluate(x_test, y_test, verbose=2)
438+
```
430439

431-
`decompress_weights` is called when we do inference on a compressed model. Usually this method function is a part of a compressed model. This method decompresses the weight that can be used on the original graph for each compressible weight. Basically the number of this method called is (# of compressible weights) * (# of inference). To improve performance, the output value of this method can be cached.
440+
`decompress_weights` is called when we run inference on a compressed model. Usually this method is part of the compressed model. For each compressible weight, it decompresses the weight into a form that can be used in the original graph. The method is therefore called (# of compressible weights) * (# of inference steps) times. To improve performance, its output can be cached.
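The call counts given for the five methods above can be collected into one small bookkeeping sketch. It only reproduces the arithmetic stated in the text and assumes no caching of decompressed weights; the function name `expected_call_counts` and its parameters are hypothetical and not part of the proposed API.

```python
from typing import Dict, Sequence


def expected_call_counts(compressible_weights_per_layer: Sequence[int],
                         training_steps: int,
                         inference_steps: int) -> Dict[str, int]:
    """Expected number of calls per method, following the counts in the text.

    `compressible_weights_per_layer[i]` is the number of compressible weights
    in layer i (0 for layers that are not compressed).
    """
    num_layers = len(compressible_weights_per_layer)
    num_weights = sum(compressible_weights_per_layer)
    return {
        'get_compressible_weights': num_layers,
        'init_training_weights': num_weights,
        'project_training_weights': num_weights * training_steps,
        'compress_training_weights': num_weights,
        # Without caching, every inference step decompresses every weight.
        'decompress_weights': num_weights * inference_steps,
    }


# Three layers, two of which have one compressible weight each.
counts = expected_call_counts([1, 0, 1], training_steps=100, inference_steps=10)
for name, count in counts.items():
    print(name, count)
```

For this example the counts are 3, 2, 200, 2, and 20 respectively; with caching enabled, the last figure would drop toward one call per compressible weight.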
432441

433442
## Questions and Discussion Topics
434443

@@ -444,4 +453,3 @@ Note that every trainable variable that they want to train should be in training
444453
### Error message & Debugging tools.
445454

446455
It's not easy to find bugs there. Usually we get TensorFlow error messages with huge stack traces. We have to provide some error messages for this API layer.
447-
