## Example model

- The `examples` directory contains an implementation of the image compression
- model described in:
+ The [examples directory](examples/) contains an implementation of the
+ image compression model described in:

> J. Ballé, V. Laparra, E. P. Simoncelli:
> "End-to-end optimized image compression"
@@ -43,228 +43,14 @@ python bls2017.py [options] compress original.png compressed.bin
python bls2017.py [options] decompress compressed.bin reconstruction.png
```

- ## Entropy bottleneck layer
+ ## Documentation

- This layer exposes a high-level interface to model the entropy (the amount of
- information conveyed) of the tensor passing through it. During training, this
- can be used to impose a (soft) entropy constraint on its activations, limiting
- the amount of information flowing through the layer. Note that this is distinct
- from other types of bottlenecks, which reduce the dimensionality of the space,
- for example. Dimensionality reduction does not limit the amount of information,
- and does not enable efficient data compression per se.
+ Refer to [the API documentation](docs/api_docs/python/tfc.md) for a full
+ description of the Keras layers and TensorFlow ops this package implements.
-
- After training, this layer can be used to compress any input tensor to a string,
- which may be written to a file, and to decompress a file which it previously
- generated back to a reconstructed tensor (possibly on a different machine having
- access to the same model checkpoint). For this, it uses the range coder
- documented in the next section. The entropies estimated during training or
- evaluation are approximately equal to the average length of the strings in bits.
-
- The layer implements a flexible probability density model to estimate entropy,
- which is described in the appendix of the paper (please cite the paper if you
- use this code for scientific work):
-
- > J. Ballé, D. Minnen, S. Singh, S. J. Hwang, N. Johnston:
- > "Variational image compression with a scale hyperprior"
- > https://arxiv.org/abs/1802.01436
-
- The layer assumes that the input tensor is at least 2D, with a batch dimension
- at the beginning and a channel dimension as specified by `data_format`. The
- layer trains an independent probability density model for each channel, but
- assumes that across all other dimensions, the inputs are i.i.d. (independent and
- identically distributed). Because the entropy (and hence, average codelength) is
- a function of the densities, this assumption may have a direct effect on the
- compression performance.
-
- Because data compression always involves discretization, the outputs of the
- layer are generally only approximations of its inputs. During training,
- discretization is modeled using additive uniform noise to ensure
- differentiability. The entropies computed during training are differential
- entropies. During evaluation, the data is actually quantized, and the
- entropies are discrete (Shannon entropies). To make sure the approximated
- tensor values are good enough for practical purposes, the training phase must
- be used to balance the quality of the approximation with the entropy, by
- adding an entropy term to the training loss, as in the following example.
-
- ### Training
-
- Here, we use the entropy bottleneck to compress the latent representation of
- an autoencoder. The data vectors `x` in this case are 4D tensors in
- `'channels_last'` format (for example, 16x16 pixel grayscale images).
-
- Note that `forward_transform` and `backward_transform` are placeholders and can
- be any appropriate artificial neural network. We've found that it generally
- helps *not* to use batch normalization, and to sandwich the bottleneck between
- two linear transforms or convolutions (i.e. to have no nonlinearities directly
- before and after).
-
- ```python
- # Build autoencoder.
- x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
- y = forward_transform(x)
- entropy_bottleneck = EntropyBottleneck()
- y_, likelihoods = entropy_bottleneck(y, training=True)
- x_ = backward_transform(y_)
-
- # Information content (= predicted codelength) in bits of each batch element
- # (note that taking the natural logarithm and dividing by `log(2)` is
- # equivalent to taking base-2 logarithms):
- bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)
-
- # Squared difference of each batch element:
- squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))
-
- # The loss is a weighted sum of mean squared error and entropy (average
- # information content), where the weight controls the trade-off between
- # approximation error and entropy.
- main_loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)
-
- # Minimize loss and auxiliary loss, and execute update op.
- main_optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
- main_step = main_optimizer.minimize(main_loss)
- # 1e-3 is a good starting point for the learning rate of the auxiliary loss,
- # assuming Adam is used.
- aux_optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
- aux_step = aux_optimizer.minimize(entropy_bottleneck.losses[0])
- step = tf.group(main_step, aux_step, entropy_bottleneck.updates[0])
- ```
-
- Note that the layer always produces exactly one auxiliary loss and one update
- op, which are only significant for compression and decompression. To use the
- compression feature, the auxiliary loss must be minimized during or after
- training. After that, the update op must be executed at least once. Here, we
- simply attach them to the main training step.
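
A minimal sketch of how the combined `step` op might be driven, assuming the
graph built in the snippet above; the `num_steps` count, the
`next_training_batch()` helper, and the `"model.ckpt"` path are hypothetical
placeholders:

```python
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for _ in range(num_steps):  # `num_steps` is a hypothetical step count.
    # Runs the main step, the auxiliary step, and the entropy bottleneck's
    # update op together, as grouped into `step` above.
    sess.run(step, feed_dict={x: next_training_batch()})
  # Save only after the auxiliary loss has been minimized and the update op
  # has run, so the checkpoint can later be used for compression.
  tf.train.Saver().save(sess, "model.ckpt")  # Hypothetical checkpoint path.
```
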
- ### Evaluation
-
- ```python
- # Build autoencoder.
- x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
- y = forward_transform(x)
- y_, likelihoods = EntropyBottleneck()(y, training=False)
- x_ = backward_transform(y_)
-
- # Information content (= predicted codelength) in bits of each batch element:
- bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)
-
- # Squared difference of each batch element:
- squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))
-
- # The loss is a weighted sum of mean squared error and entropy (average
- # information content), where the weight controls the trade-off between
- # approximation error and entropy.
- loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)
- ```
-
- To be able to compress the bottleneck tensor and decompress it in a different
- session, or on a different machine, you need three items:
-
- - The compressed representations stored as strings.
- - The shape of the bottleneck for these string representations as a `Tensor`,
-   as well as the number of channels of the bottleneck at graph construction
-   time.
- - The checkpoint of the trained model that was used for compression. Note:
-   It is crucial that the auxiliary loss produced by this layer is minimized
-   during or after training, and that the update op is run after training and
-   minimization of the auxiliary loss, but *before* the checkpoint is saved.
-
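
As a rough sketch (the file name and the `packed` / `bottleneck_shape` values,
obtained by running the compression graph below, are hypothetical), the first
two items could be persisted like this:

```python
import pickle

# Write the compressed strings and the bottleneck shape to disk...
with open("compressed.pkl", "wb") as f:
  pickle.dump({"strings": packed, "shape": bottleneck_shape}, f)

# ...and read them back on the decoding side.
with open("compressed.pkl", "rb") as f:
  blob = pickle.load(f)
packed, bottleneck_shape = blob["strings"], blob["shape"]
```
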
- ### Compression
-
- ```python
- x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
- y = forward_transform(x)
- strings = EntropyBottleneck().compress(y)
- shape = tf.shape(y)[1:]
- ```
-
- ### Decompression
-
- ```python
- strings = tf.placeholder(tf.string, shape=[None])
- shape = tf.placeholder(tf.int32, shape=[3])
- entropy_bottleneck = EntropyBottleneck(dtype=tf.float32)
- y_ = entropy_bottleneck.decompress(strings, shape, channels=5)
- x_ = backward_transform(y_)
- ```
- Here, we assumed that the tensor produced by the forward transform has 5
- channels.
-
- The above four use cases can also be implemented within the same session (i.e.
- on the same `EntropyBottleneck` instance), for testing purposes, etc., by
- calling the object more than once.
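
A rough sketch of that combined use, assuming the same `forward_transform` /
`backward_transform` placeholders, 5 bottleneck channels as above, a checkpoint
saved at a hypothetical path `"model.ckpt"`, and a NumPy array `images` of
shape `[N, 16, 16, 1]` (both names are placeholders):

```python
x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
y = forward_transform(x)
entropy_bottleneck = EntropyBottleneck()
strings = entropy_bottleneck.compress(y)
shape = tf.shape(y)[1:]
# Decompress with the same instance, in the same graph.
y_ = entropy_bottleneck.decompress(strings, shape, channels=5)
x_ = backward_transform(y_)

with tf.Session() as sess:
  tf.train.Saver().restore(sess, "model.ckpt")  # Hypothetical checkpoint path.
  packed, reconstructed = sess.run([strings, x_], feed_dict={x: images})
```
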
-
-
- ## Range encoder and decoder
-
- This package contains a range encoder and a range decoder, which can encode
- integer data into strings using cumulative distribution functions (CDF). It is
- used by the higher-level entropy bottleneck class described in the previous
- section.
-
- ### Data and CDF values
-
- The data to be encoded should be non-negative integers in the half-open
- interval `[0, m)`. Then a CDF is represented as an integral vector of length
- `m + 1`, where `CDF(i) = f(Pr(X < i) * 2^precision)` for i = 0, 1, ..., m, and
- `precision` is an attribute in the range `0 < precision <= 16`. The function
- `f` maps real values into integers, e.g., round or floor. Note that to encode
- a number `i`, `CDF(i + 1) - CDF(i)` must not be zero.
-
- Note that we used `Pr(X < i)`, not `Pr(X <= i)`, and therefore `CDF(0) = 0`
- always.
-
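
A small numeric sketch of such a CDF vector (the symbol probabilities and the
`precision` value below are made up for illustration):

```python
import numpy as np

# Assume m = 4 symbols with probabilities 0.5, 0.25, 0.125, 0.125 and
# precision = 3, so probabilities are scaled by 2^3 = 8.
probs = np.array([0.5, 0.25, 0.125, 0.125])
precision = 3
cdf = np.round(np.insert(np.cumsum(probs), 0, 0.0) * 2 ** precision)
cdf = cdf.astype(np.int32)
# cdf is [0, 4, 6, 7, 8]: length m + 1, CDF(0) = 0, and CDF(i + 1) - CDF(i) > 0
# for every symbol i, as required.
```
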
- ### RangeEncode: data shapes and CDF shapes
-
- For each data element, its CDF has to be provided. Therefore the shape of the
- CDF should be `data.shape + (m + 1,)` in NumPy-like notation. For example, if
- `data` is a 2-D tensor of shape (10, 10) and its elements are in `[0, 64)`,
- then the CDF tensor should have shape (10, 10, 65).
-
- This may make the CDF tensor too large, and in many applications all data
- elements may have the same probability distribution. To handle this,
- `RangeEncode` supports limited broadcasting of the CDF into the data.
- Broadcasting is limited in the following sense:
-
- - All CDF axes but the last one are broadcast into the data, but not the other
-   way around,
- - The number of CDF axes is not extended, i.e., `CDF.ndim == data.ndim + 1`.
-
- In the previous example where data has shape (10, 10), the following are
- acceptable CDF shapes:
-
- - (10, 10, 65)
- - (1, 10, 65)
- - (10, 1, 65)
- - (1, 1, 65)
-
- ### RangeDecode
-
- `RangeEncode` encodes neither the data shape nor a termination character.
- Therefore the decoder needs to know how many symbols are encoded into the
- string, and `RangeDecode` takes the encoded data shape as its second argument.
- The same shape restrictions as for the `RangeEncode` inputs apply here.
-
- ### Example
-
- ```python
- data = tf.random_uniform((128, 128), 0, 10, dtype=tf.int32)
-
- histogram = tf.bincount(data, minlength=10, maxlength=10)
- cdf = tf.cumsum(histogram, exclusive=False)
- # CDF should have length m + 1.
- cdf = tf.pad(cdf, [[1, 0]])
- # CDF axis count must be one more than data.
- cdf = tf.reshape(cdf, [1, 1, -1])
-
- # Note that data has 2^14 elements, and therefore the sum of CDF is 2^14.
- data = tf.cast(data, tf.int16)
- encoded = coder.range_encode(data, cdf, precision=14)
- decoded = coder.range_decode(encoded, tf.shape(data), cdf, precision=14)
-
- # data and decoded should be the same.
- sess = tf.Session()
- x, y = sess.run((data, decoded))
- assert np.all(x == y)
- ```

+ There's also an introduction to our `EntropyBottleneck` class
+ [here](docs/entropy_bottleneck.md), and a description of the range coding ops
+ [here](docs/range_coding.md).

## Authors
Johannes Ballé (github: [jonycgn](https://github.com/jonycgn)),