@@ -136,41 +136,43 @@ REGISTER_OP("UnboundedIndexRangeEncode")
.Doc(R"doc(
Range encodes unbounded integer `data` using an indexed probability table.

- For each value in `data`, the corresponding value in `index` determines which
- probability model in `cdf` is used to encode it. The data can be arbitrary
- signed integers, where the integer intervals determined by `offset` and
- `cdf_size` are modeled using the cumulative distribution functions (CDF) in
- `cdf`. Everything else is encoded with a variable length code.
-
- The argument `cdf` is a 2-D tensor and its each row contains a CDF. The argument
- `cdf_size` is a 1-D tensor, and its length should be the same as the number of
- rows of `cdf`. The values in `cdf_size` denotes the length of CDF vector in the
- corresponding row of `cdf`.
-
- For i = 0,1,..., let `m = cdf_size[i]`. Then for j = 0,1,...,m-1,
+ Arguments `data` and `index` should have the same shape. `data` contains the
+ values to be encoded. For each value in `data`, the corresponding value in
+ `index` determines which row in `cdf` should be used to encode the value in
+ `data`. `index` also determines which element in `offset` vector determines the
+ integer interval the cdf applies to. Naturally, the elements of `index` should
+ be in the half-open interval `[0, cdf.shape[0])`.
+
+ The argument `cdf` is a 2-D tensor and each of its rows contains a CDF. The
+ argument `cdf_size` is a 1-D tensor, and its length should be the same as the
+ number of rows of `cdf`. The values in `cdf_size` denote the length of CDF
+ vector in the corresponding row of `cdf`.
+
+ For i = 0,1,..., let `m = cdf_size[i] - 1`, i.e., all the "regular" data values
+ associated with `index == i` should be in the half-open interval
+ `[offset[i], offset[i] + m)`. (More details below about regular and non-regular
+ values.) Then

```
- cdf[..., 0] / 2^precision = Pr(X < 0) = 0
- cdf[..., 1] / 2^precision = Pr(X < 1) = Pr(X <= 0 )
- cdf[..., 2] / 2^precision = Pr(X < 2) = Pr(X <= 1 )
+ cdf[..., 0] / 2^precision = Pr(0 <= X - offset[i] < 0) = 0
+ cdf[..., 1] / 2^precision = Pr(0 <= X - offset[i] < 1 )
+ cdf[..., 2] / 2^precision = Pr(0 <= X - offset[i] < 2 )
...
- cdf[..., m-1] / 2^precision = Pr(X < m-1) = Pr(X <= m-2).
+ cdf[..., m-1] / 2^precision = Pr(0 <= X - offset[i] < m-1).
+ cdf[..., m] / 2^precision = 1.
```

- We require that `1 < m <= cdf.shape[1]` and that all elements of `cdf` be in the
+ We require that `1 < m < cdf.shape[- 1]` and that all elements of `cdf` be in the
closed interval `[0, 2^precision]`.

- Arguments `data` and `index` should have the same shape. `data` contains the
- values to be encoded. `index` denotes which row in `cdf` should be used to
- encode the corresponding value in `data`, and which element in `offset`
- determines the integer interval the cdf applies to. Naturally, the elements of
- `index` should be in the half-open interval `[0, cdf.shape[0])`.
-
- When a value from `data` is in the interval `[offset[i], offset[i] + m - 2)`,
- then the value is range encoded using the CDF values. The last entry in each
- CDF (the one at `m - 1`) is an overflow code. When a value from `data` is
- outside of the given interval, the overflow value is encoded, followed by a
- variable-length encoding of the actual data value.
+ Note that the last CDF entry is the probability that `X - offset[i]` is any
+ value, including the events `X - offset[i] < 0` and `m - 1 <= X - offset[i]`.
+ When a value from `data` is regular and is in the interval
+ `[offset[i], offset[i] + m - 1)`, then the value minus `offset[i]` is range
+ encoded using the CDF values. The maximum value in each CDF (`m - 1`) is an
+ overflow code. When a value from `data` is outside of the previous interval, the
+ overflow code is range encoded, followed by a variable-length encoding of the
+ actual data value.

The encoded output contains neither the shape information of the encoded data
nor a termination symbol. Therefore the shape of the encoded data must be
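The regular versus overflow rule described in the added docstring lines can be sketched in plain Python. This is only an illustration under stated assumptions, not the op's actual implementation: the helper names `to_symbol` and `symbols_for` are hypothetical, the regular interval is taken to be `[offset[i], offset[i] + m - 1)` with `m = cdf_size[i] - 1` (following the later paragraph of the new docstring), and the variable-length code that follows an overflow symbol is left out because the docstring does not specify it.

```python
def to_symbol(value, offset, cdf_size):
    """Map one data value to (symbol, is_overflow) for a single CDF row.

    Assumption: m = cdf_size - 1 symbols exist, and the last one (m - 1)
    is the overflow code, as the updated docstring describes.
    """
    m = cdf_size - 1
    if m <= 1:
        raise ValueError("the docstring requires 1 < m")
    shifted = value - offset
    if 0 <= shifted < m - 1:
        # Regular value: the shifted value itself is the range-coded symbol.
        return shifted, False
    # Non-regular value: emit the overflow code; the real op would then
    # append a variable-length encoding of the value (not modeled here).
    return m - 1, True


def symbols_for(data, index, offset, cdf_size):
    """Apply to_symbol elementwise; data and index must have equal length."""
    if len(data) != len(index):
        raise ValueError("data and index should have the same shape")
    return [to_symbol(v, offset[i], cdf_size[i]) for v, i in zip(data, index)]


# Hypothetical example: two CDF rows with different offsets and sizes.
offset = [-2, 0]
cdf_size = [5, 3]
data = [-2, 0, 1, 0, 5]
index = [0, 0, 0, 1, 1]
result = symbols_for(data, index, offset, cdf_size)
```

With these made-up inputs, row 0 has `m = 4`, so the values -2, -1 and 0 are regular and anything else maps to the overflow symbol 3. Row 1 has `m = 2`, so only the value 0 is regular.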