Commit f09afc8

ssjhv authored and copybara-github committed
Changed m in the UnboundedIndexRangeEncode documentation to match m in the RangeEncode op, so that the meaning of m is consistent in two different ops. In addition, clarified the overflow code.

PiperOrigin-RevId: 300490583
Change-Id: I09feca4b2c4b3fe46a713bc4e4221f5a3ecea8cf
1 parent 8c79d5c commit f09afc8

1 file changed: +30 -28 lines changed

tensorflow_compression/cc/ops/range_coding_ops.cc

Lines changed: 30 additions & 28 deletions
@@ -136,41 +136,43 @@ REGISTER_OP("UnboundedIndexRangeEncode")
 .Doc(R"doc(
 Range encodes unbounded integer `data` using an indexed probability table.

-For each value in `data`, the corresponding value in `index` determines which
-probability model in `cdf` is used to encode it. The data can be arbitrary
-signed integers, where the integer intervals determined by `offset` and
-`cdf_size` are modeled using the cumulative distribution functions (CDF) in
-`cdf`. Everything else is encoded with a variable length code.
-
-The argument `cdf` is a 2-D tensor and its each row contains a CDF. The argument
-`cdf_size` is a 1-D tensor, and its length should be the same as the number of
-rows of `cdf`. The values in `cdf_size` denotes the length of CDF vector in the
-corresponding row of `cdf`.
-
-For i = 0,1,..., let `m = cdf_size[i]`. Then for j = 0,1,...,m-1,
+Arguments `data` and `index` should have the same shape. `data` contains the
+values to be encoded. For each value in `data`, the corresponding value in
+`index` determines which row in `cdf` should be used to encode the value in
+`data`. `index` also determines which element in `offset` vector determines the
+integer interval the cdf applies to. Naturally, the elements of `index` should
+be in the half-open interval `[0, cdf.shape[0])`.
+
+The argument `cdf` is a 2-D tensor and each of its rows contains a CDF. The
+argument `cdf_size` is a 1-D tensor, and its length should be the same as the
+number of rows of `cdf`. The values in `cdf_size` denote the length of CDF
+vector in the corresponding row of `cdf`.
+
+For i = 0,1,..., let `m = cdf_size[i] - 1`, i.e., all the "regular" data values
+associated with `index == i` should be in the half-open interval
+`[offset[i], offset[i] + m)`. (More details below about regular and non-regular
+values.) Then

 ```
-cdf[..., 0] / 2^precision = Pr(X < 0) = 0
-cdf[..., 1] / 2^precision = Pr(X < 1) = Pr(X <= 0)
-cdf[..., 2] / 2^precision = Pr(X < 2) = Pr(X <= 1)
+cdf[..., 0] / 2^precision = Pr(0 <= X - offset[i] < 0) = 0
+cdf[..., 1] / 2^precision = Pr(0 <= X - offset[i] < 1)
+cdf[..., 2] / 2^precision = Pr(0 <= X - offset[i] < 2)
 ...
-cdf[..., m-1] / 2^precision = Pr(X < m-1) = Pr(X <= m-2).
+cdf[..., m-1] / 2^precision = Pr(0 <= X - offset[i] < m-1).
+cdf[..., m] / 2^precision = 1.
 ```

-We require that `1 < m <= cdf.shape[1]` and that all elements of `cdf` be in the
+We require that `1 < m < cdf.shape[-1]` and that all elements of `cdf` be in the
 closed interval `[0, 2^precision]`.

-Arguments `data` and `index` should have the same shape. `data` contains the
-values to be encoded. `index` denotes which row in `cdf` should be used to
-encode the corresponding value in `data`, and which element in `offset`
-determines the integer interval the cdf applies to. Naturally, the elements of
-`index` should be in the half-open interval `[0, cdf.shape[0])`.
-
-When a value from `data` is in the interval `[offset[i], offset[i] + m - 2)`,
-then the value is range encoded using the CDF values. The last entry in each
-CDF (the one at `m - 1`) is an overflow code. When a value from `data` is
-outside of the given interval, the overflow value is encoded, followed by a
-variable-length encoding of the actual data value.
+Note that the last CDF entry is the probability that `X - offset[i]` is any
+value, including the events `X - offset[i] < 0` and `m - 1 <= X - offset[i]`.
+When a value from `data` is regular and is in the interval
+`[offset[i], offset[i] + m - 1)`, then the value minus `offset[i]` is range
+encoded using the CDF values. The maximum value in each CDF (`m - 1`) is an
+overflow code. When a value from `data` is outside of the previous interval, the
+overflow code is range encoded, followed by a variable-length encoding of the
+actual data value.

 The encoded output contains neither the shape information of the encoded data
 nor a termination symbol. Therefore the shape of the encoded data must be
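
The revised convention can be illustrated with a small plain-Python sketch. It is not the op's implementation and does no range coding or variable-length coding. It only builds one quantized CDF row and decides whether a value is regular or needs the overflow code. The helper names `build_cdf_row` and `classify` and the example frequencies are hypothetical. The sketch assumes the reading of the new text in which a row holds `m + 1` entries, symbols `0 .. m-2` are regular, and symbol `m - 1` is the overflow code, so regular values lie in `[offset, offset + m - 1)`.

```python
from typing import List, Tuple


def build_cdf_row(counts: List[int], precision: int) -> List[int]:
    """Accumulate integer symbol frequencies into one quantized CDF row.

    `counts` is a hypothetical list of m frequencies that sum to 2**precision;
    the last frequency is the mass assigned to the overflow code. The returned
    row has m + 1 entries, starting at 0 and ending at 2**precision.
    """
    if sum(counts) != 1 << precision:
        raise ValueError("counts must sum to 2**precision")
    row = [0]
    for count in counts:
        row.append(row[-1] + count)
    return row


def classify(value: int, offset: int, cdf_size: int) -> Tuple[int, bool]:
    """Return (symbol, is_overflow) for one data value.

    Assumes m = cdf_size - 1 and that symbol m - 1 is the overflow code.
    """
    m = cdf_size - 1
    overflow_symbol = m - 1
    shifted = value - offset
    if 0 <= shifted < overflow_symbol:
        return shifted, False  # regular: range encode value - offset
    return overflow_symbol, True  # overflow code, then the actual value


row = build_cdf_row([4, 8, 2, 2], precision=4)
print(row)  # [0, 4, 12, 14, 16]
print(classify(1, offset=-1, cdf_size=len(row)))  # (2, False)
print(classify(2, offset=-1, cdf_size=len(row)))  # (3, True)
```

With `cdf_size = 5` the sketch gives `m = 4`: values -1, 0, and 1 map to symbols 0, 1, and 2, and every other value (on either side of the interval) maps to the overflow symbol 3.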
