
Commit ce8c10e

Author: Johannes Ballé (committed)
Refactored documentation.
1 parent 490eef0 commit ce8c10e

File tree: 3 files changed (+228, -222 lines)

README.md

Lines changed: 8 additions & 222 deletions
@@ -20,8 +20,8 @@ done
 
 ## Example model
 
-The `examples` directory contains an implementation of the image compression
-model described in:
+The [examples directory](examples/) contains an implementation of the image
+compression model described in:
 
 > J. Ballé, V. Laparra, E. P. Simoncelli:
 > "End-to-end optimized image compression"
@@ -43,228 +43,14 @@ python bls2017.py [options] compress original.png compressed.bin
 python bls2017.py [options] decompress compressed.bin reconstruction.png
 ```
 
-## Entropy bottleneck layer
+## Documentation
 
-This layer exposes a high-level interface to model the entropy (the amount of
-information conveyed) of the tensor passing through it. During training, this
-can be used to impose a (soft) entropy constraint on its activations, limiting
-the amount of information flowing through the layer. Note that this is distinct
-from other types of bottlenecks, which reduce the dimensionality of the space,
-for example. Dimensionality reduction does not limit the amount of information,
-and does not enable efficient data compression per se.
+Refer to [the API documentation](docs/api_docs/python/tfc.md) for a full
+description of the Keras layers and TensorFlow ops this package implements.
 
-After training, this layer can be used to compress any input tensor to a string,
-which may be written to a file, and to decompress a file which it previously
-generated back to a reconstructed tensor (possibly on a different machine having
-access to the same model checkpoint). For this, it uses the range coder
-documented in the next section. The entropies estimated during training or
-evaluation are approximately equal to the average length of the strings in bits.
-
-The layer implements a flexible probability density model to estimate entropy,
-which is described in the appendix of the paper (please cite the paper if you
-use this code for scientific work):
-
-> J. Ballé, D. Minnen, S. Singh, S. J. Hwang, N. Johnston:
-> "Variational image compression with a scale hyperprior"
-> https://arxiv.org/abs/1802.01436
-
-The layer assumes that the input tensor is at least 2D, with a batch dimension
-at the beginning and a channel dimension as specified by `data_format`. The
-layer trains an independent probability density model for each channel, but
-assumes that across all other dimensions, the inputs are i.i.d. (independent and
-identically distributed). Because the entropy (and hence, average codelength) is
-a function of the densities, this assumption may have a direct effect on the
-compression performance.
-
-Because data compression always involves discretization, the outputs of the
-layer are generally only approximations of its inputs. During training,
-discretization is modeled using additive uniform noise to ensure
-differentiability. The entropies computed during training are differential
-entropies. During evaluation, the data is actually quantized, and the
-entropies are discrete (Shannon entropies). To make sure the approximated
-tensor values are good enough for practical purposes, the training phase must
-be used to balance the quality of the approximation with the entropy, by
-adding an entropy term to the training loss, as in the following example.
-
-### Training
-
-Here, we use the entropy bottleneck to compress the latent representation of
-an autoencoder. The data vectors `x` in this case are 4D tensors in
-`'channels_last'` format (for example, 16x16 pixel grayscale images).
-
-Note that `forward_transform` and `backward_transform` are placeholders and can
-be any appropriate artificial neural network. We've found that it generally helps
-*not* to use batch normalization, and to sandwich the bottleneck between two
-linear transforms or convolutions (i.e. to have no nonlinearities directly
-before and after).
-
-```python
-# Build autoencoder.
-x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
-y = forward_transform(x)
-entropy_bottleneck = EntropyBottleneck()
-y_, likelihoods = entropy_bottleneck(y, training=True)
-x_ = backward_transform(y_)
-
-# Information content (= predicted codelength) in bits of each batch element
-# (note that taking the natural logarithm and dividing by `log(2)` is
-# equivalent to taking base-2 logarithms):
-bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)
-
-# Squared difference of each batch element:
-squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))
-
-# The loss is a weighted sum of mean squared error and entropy (average
-# information content), where the weight controls the trade-off between
-# approximation error and entropy.
-main_loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)
-
-# Minimize loss and auxiliary loss, and execute update op.
-main_optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
-main_step = main_optimizer.minimize(main_loss)
-# 1e-3 is a good starting point for the learning rate of the auxiliary loss,
-# assuming Adam is used.
-aux_optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
-aux_step = aux_optimizer.minimize(entropy_bottleneck.losses[0])
-step = tf.group(main_step, aux_step, entropy_bottleneck.updates[0])
-```
-
-Note that the layer always produces exactly one auxiliary loss and one update
-op, which are only significant for compression and decompression. To use the
-compression feature, the auxiliary loss must be minimized during or after
-training. After that, the update op must be executed at least once. Here, we
-simply attach them to the main training step.
-
-### Evaluation
-
-```python
-# Build autoencoder.
-x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
-y = forward_transform(x)
-y_, likelihoods = EntropyBottleneck()(y, training=False)
-x_ = backward_transform(y_)
-
-# Information content (= predicted codelength) in bits of each batch element:
-bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)
-
-# Squared difference of each batch element:
-squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))
-
-# The loss is a weighted sum of mean squared error and entropy (average
-# information content), where the weight controls the trade-off between
-# approximation error and entropy.
-loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)
-```
-
-To be able to compress the bottleneck tensor and decompress it in a different
-session, or on a different machine, you need three items:
-
-- The compressed representations stored as strings.
-- The shape of the bottleneck for these string representations as a `Tensor`,
-  as well as the number of channels of the bottleneck at graph construction
-  time.
-- The checkpoint of the trained model that was used for compression. Note:
-  It is crucial that the auxiliary loss produced by this layer is minimized
-  during or after training, and that the update op is run after training and
-  minimization of the auxiliary loss, but *before* the checkpoint is saved.
-
-### Compression
-
-```python
-x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
-y = forward_transform(x)
-strings = EntropyBottleneck().compress(y)
-shape = tf.shape(y)[1:]
-```
-
-### Decompression
-
-```python
-strings = tf.placeholder(tf.string, shape=[None])
-shape = tf.placeholder(tf.int32, shape=[3])
-entropy_bottleneck = EntropyBottleneck(dtype=tf.float32)
-y_ = entropy_bottleneck.decompress(strings, shape, channels=5)
-x_ = backward_transform(y_)
-```
-Here, we assumed that the tensor produced by the forward transform has 5
-channels.
-
-The above four use cases can also be implemented within the same session (i.e.
-on the same `EntropyBottleneck` instance), for testing purposes, etc., by
-calling the object more than once.
-
-
-## Range encoder and decoder
-
-This package contains a range encoder and a range decoder, which can encode
-integer data into strings using cumulative distribution functions (CDF). It is
-used by the higher-level entropy bottleneck class described in the previous
-section.
-
-### Data and CDF values
-
-The data to be encoded should be non-negative integers in the half-open interval
-`[0, m)`. A CDF is then represented as an integral vector of length `m + 1`,
-where `CDF(i) = f(Pr(X < i) * 2^precision)` for `i = 0, 1, ..., m`, and `precision`
-is an attribute in the range `0 < precision <= 16`. The function `f` maps real
-values to integers, e.g., round or floor. Note that in order to encode a
-number `i`, `CDF(i + 1) - CDF(i)` must not be zero.
-
-Note that we use `Pr(X < i)`, not `Pr(X <= i)`, and therefore `CDF(0) = 0` always.
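A minimal sketch, outside the diff itself, of how such a fixed-precision CDF could be
built from a per-symbol probability vector (`make_cdf` and the example `probs` values
are illustrative, not part of the package):

```python
import numpy as np

def make_cdf(probs, precision=16):
  """Builds an integer CDF of length m + 1 from a length-m probability vector."""
  probs = np.asarray(probs, dtype=np.float64)
  probs = probs / probs.sum()                  # Normalize so Pr sums to one.
  cdf = np.zeros(len(probs) + 1, dtype=np.int64)
  cdf[1:] = np.round(np.cumsum(probs) * 2 ** precision).astype(np.int64)
  # CDF(0) = 0 holds by construction; every encodable symbol also needs a
  # nonzero interval width, i.e. CDF(i + 1) - CDF(i) > 0.
  assert np.all(np.diff(cdf) > 0), "some symbol has zero probability mass"
  return cdf

cdf = make_cdf([0.1, 0.2, 0.3, 0.4])  # -> [0, 6554, 19661, 39322, 65536]
```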
-
-### RangeEncode: data shapes and CDF shapes
-
-For each data element, its CDF has to be provided. Therefore the shape of the CDF
-tensor should be `data.shape + (m + 1,)` in NumPy-like notation. For example, if
-`data` is a 2-D tensor of shape (10, 10) and its elements are in `[0, 64)`, then
-the CDF tensor should have shape (10, 10, 65).
-
-This may make the CDF tensor too large, and in many applications all data elements
-share the same probability distribution. To handle this, `RangeEncode`
-supports limited broadcasting of the CDF into the data. Broadcasting is limited in
-the following sense:
-
-- All CDF axes but the last one are broadcast into the data, but not the other way
-  around.
-- The number of CDF axes is not extended, i.e., `CDF.ndim == data.ndim + 1` is
-  required.
-
-In the previous example where data has shape (10, 10), the following are
-acceptable CDF shapes:
-
-- (10, 10, 65)
-- (1, 10, 65)
-- (10, 1, 65)
-- (1, 1, 65)
-
-### RangeDecode
-
-`RangeEncode` encodes neither the data shape nor a termination character. Therefore
-the decoder needs to know how many symbols are encoded into the string, and
-`RangeDecode` takes the encoded data shape as its second argument. The same
-shape restrictions as for the `RangeEncode` inputs apply here.
-
-### Example
-
-```python
-data = tf.random_uniform((128, 128), 0, 10, dtype=tf.int32)
-
-histogram = tf.bincount(data, minlength=10, maxlength=10)
-cdf = tf.cumsum(histogram, exclusive=False)
-# CDF should have length m + 1.
-cdf = tf.pad(cdf, [[1, 0]])
-# CDF axis count must be one more than data.
-cdf = tf.reshape(cdf, [1, 1, -1])
-
-# Note that data has 2^14 elements, so the final CDF value equals 2^14
-# (matching precision=14).
-data = tf.cast(data, tf.int16)
-encoded = coder.range_encode(data, cdf, precision=14)
-decoded = coder.range_decode(encoded, tf.shape(data), cdf, precision=14)
-
-# data and decoded should be the same.
-sess = tf.Session()
-x, y = sess.run((data, decoded))
-assert np.all(x == y)
-```
+There's also an introduction to our `EntropyBottleneck` class
+[here](docs/entropy_bottleneck.md), and a description of the range coding ops
+[here](docs/range_coding.md).
 
 ## Authors
 Johannes Ballé (github: [jonycgn](https://github.com/jonycgn)),

docs/entropy_bottleneck.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# Entropy bottleneck layer

This layer exposes a high-level interface to model the entropy (the amount of
information conveyed) of the tensor passing through it. During training, this
can be used to impose a (soft) entropy constraint on its activations, limiting
the amount of information flowing through the layer. Note that this is distinct
from other types of bottlenecks, which reduce the dimensionality of the space,
for example. Dimensionality reduction does not limit the amount of information,
and does not enable efficient data compression per se.

After training, this layer can be used to compress any input tensor to a string,
which may be written to a file, and to decompress a file which it previously
generated back to a reconstructed tensor (possibly on a different machine having
access to the same model checkpoint). For this, it uses the range coder
documented in [range_coding.md](range_coding.md). The entropies estimated during
training or evaluation are approximately equal to the average length of the
strings in bits.

The layer implements a flexible probability density model to estimate entropy,
which is described in the appendix of the paper (please cite the paper if you
use this code for scientific work):

> J. Ballé, D. Minnen, S. Singh, S. J. Hwang, N. Johnston:
> "Variational image compression with a scale hyperprior"
> https://arxiv.org/abs/1802.01436

The layer assumes that the input tensor is at least 2D, with a batch dimension
at the beginning and a channel dimension as specified by `data_format`. The
layer trains an independent probability density model for each channel, but
assumes that across all other dimensions, the inputs are i.i.d. (independent and
identically distributed). Because the entropy (and hence, average codelength) is
a function of the densities, this assumption may have a direct effect on the
compression performance.
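For illustration (a sketch only; `y_nhwc` and `y_nchw` stand for arbitrary latent
tensors built upstream), the channel dimension the density models attach to follows
the `data_format` argument:

```python
# Default: channels last, e.g. a [batch, height, width, channels] tensor,
# so one density model is trained per entry of the last dimension.
eb = EntropyBottleneck(data_format="channels_last")
y_tilde, likelihoods = eb(y_nhwc, training=True)

# Channels first, e.g. a [batch, channels, height, width] tensor.
eb = EntropyBottleneck(data_format="channels_first")
y_tilde, likelihoods = eb(y_nchw, training=True)
```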

Because data compression always involves discretization, the outputs of the
layer are generally only approximations of its inputs. During training,
discretization is modeled using additive uniform noise to ensure
differentiability. The entropies computed during training are differential
entropies. During evaluation, the data is actually quantized, and the
entropies are discrete (Shannon entropies). To make sure the approximated
tensor values are good enough for practical purposes, the training phase must
be used to balance the quality of the approximation with the entropy, by
adding an entropy term to the training loss, as in the following example.

## Training

Here, we use the entropy bottleneck to compress the latent representation of
an autoencoder. The data vectors `x` in this case are 4D tensors in
`'channels_last'` format (for example, 16x16 pixel grayscale images).

Note that `forward_transform` and `backward_transform` are placeholders and can
be any appropriate artificial neural network. We've found that it generally helps
*not* to use batch normalization, and to sandwich the bottleneck between two
linear transforms or convolutions (i.e. to have no nonlinearities directly
before and after).

```python
# Build autoencoder.
x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
y = forward_transform(x)
entropy_bottleneck = EntropyBottleneck()
y_, likelihoods = entropy_bottleneck(y, training=True)
x_ = backward_transform(y_)

# Information content (= predicted codelength) in bits of each batch element
# (note that taking the natural logarithm and dividing by `log(2)` is
# equivalent to taking base-2 logarithms):
bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)

# Squared difference of each batch element:
squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))

# The loss is a weighted sum of mean squared error and entropy (average
# information content), where the weight controls the trade-off between
# approximation error and entropy.
main_loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)

# Minimize loss and auxiliary loss, and execute update op.
main_optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
main_step = main_optimizer.minimize(main_loss)
# 1e-3 is a good starting point for the learning rate of the auxiliary loss,
# assuming Adam is used.
aux_optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
aux_step = aux_optimizer.minimize(entropy_bottleneck.losses[0])
step = tf.group(main_step, aux_step, entropy_bottleneck.updates[0])
```

Note that the layer always produces exactly one auxiliary loss and one update
op, which are only significant for compression and decompression. To use the
compression feature, the auxiliary loss must be minimized during or after
training. After that, the update op must be executed at least once. Here, we
simply attach them to the main training step.
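As a rough sketch of driving the graph above in a TensorFlow 1.x session
(`num_steps` and `next_training_batch` are hypothetical placeholders, not part of
this package):

```python
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for i in range(num_steps):
    batch = next_training_batch()  # NumPy array of shape [batch_size, 16, 16, 1].
    # `step` minimizes the main and auxiliary losses and runs the update op.
    _, loss_value = sess.run([step, main_loss], feed_dict={x: batch})
    if i % 100 == 0:
      print("step", i, "main loss", loss_value)
```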

## Evaluation

```python
# Build autoencoder.
x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
y = forward_transform(x)
y_, likelihoods = EntropyBottleneck()(y, training=False)
x_ = backward_transform(y_)

# Information content (= predicted codelength) in bits of each batch element:
bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)

# Squared difference of each batch element:
squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))

# The loss is a weighted sum of mean squared error and entropy (average
# information content), where the weight controls the trade-off between
# approximation error and entropy.
loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)
```

To be able to compress the bottleneck tensor and decompress it in a different
session, or on a different machine, you need three items:

- The compressed representations stored as strings.
- The shape of the bottleneck for these string representations as a `Tensor`,
  as well as the number of channels of the bottleneck at graph construction
  time.
- The checkpoint of the trained model that was used for compression. Note:
  It is crucial that the auxiliary loss produced by this layer is minimized
  during or after training, and that the update op is run after training and
  minimization of the auxiliary loss, but *before* the checkpoint is saved.
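For example (a minimal sketch; the checkpoint path is illustrative), with the
training setup above, where the auxiliary step and the update op are already part
of `step`, the checkpoint is simply written after training finishes:

```python
saver = tf.train.Saver()
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  # ... run `step` (which includes the auxiliary step and the update op)
  # until training converges ...
  # Saving only now ensures the tables filled in by the update op end up
  # in the checkpoint.
  saver.save(sess, "checkpoints/model.ckpt")
```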

## Compression

```python
x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
y = forward_transform(x)
strings = EntropyBottleneck().compress(y)
shape = tf.shape(y)[1:]
```
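One way to run this graph and persist what the decoder needs (sketch only;
`images`, the checkpoint path, and the `.npz` container are illustrative choices):

```python
saver = tf.train.Saver()
with tf.Session() as sess:
  saver.restore(sess, "checkpoints/model.ckpt")
  strings_value, shape_value = sess.run(
      [strings, shape], feed_dict={x: images})  # `images`: [N, 16, 16, 1] array.
# One byte string per batch element, plus the bottleneck shape for decoding.
np.savez("compressed.npz", strings=strings_value, shape=shape_value)
```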

## Decompression

```python
strings = tf.placeholder(tf.string, shape=[None])
shape = tf.placeholder(tf.int32, shape=[3])
entropy_bottleneck = EntropyBottleneck(dtype=tf.float32)
y_ = entropy_bottleneck.decompress(strings, shape, channels=5)
x_ = backward_transform(y_)
```
Here, we assumed that the tensor produced by the forward transform has 5
channels.

The above four use cases can also be implemented within the same session (i.e.
on the same `EntropyBottleneck` instance), for testing purposes, etc., by
calling the object more than once.
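To close the loop, a sketch of the decoding side, assuming the strings and shape
were saved as in the compression sketch above and the same checkpoint is available:

```python
saver = tf.train.Saver()
with tf.Session() as sess:
  saver.restore(sess, "checkpoints/model.ckpt")
  archive = np.load("compressed.npz", allow_pickle=True)
  reconstruction = sess.run(
      x_, feed_dict={strings: archive["strings"], shape: archive["shape"]})
```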
