Commit 4eb5b39

Author: Kavya Srinet
Editing the documentation for seq_decoder, and fixing typos
1 parent f3631a4 commit 4eb5b39

1 file changed: doc/design/ops/sequence_decoder.md (+48 −64 lines)
@@ -1,35 +1,28 @@
# Design: Sequence Decoder Generating LoDTensors
In tasks such as machine translation and visual captioning, a [sequence decoder](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) is necessary to generate sequences, one word at a time.

This documentation describes how to implement the sequence decoder as an operator.

## Beam Search based Decoder
The [beam search algorithm](https://en.wikipedia.org/wiki/Beam_search) is necessary when generating sequences. It is a heuristic search algorithm that explores the paths by expanding the most promising node in a limited set.

In the old version of PaddlePaddle, the C++ class `RecurrentGradientMachine` implements the general sequence decoder based on beam search. Due to the complexity involved, the implementation relies on a lot of special data structures that are hard for users to customize.

There are a lot of heuristic tricks in sequence generation tasks, so the flexibility of the sequence decoder is very important to users.

During the refactoring of PaddlePaddle, some new concepts were proposed, such as [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and [TensorArray](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md), that can better support sequence usage and help make the implementation of the beam search based sequence decoder **more transparent and modular**.

For example, the RNN states, candidate IDs, and probabilities of beam search can all be represented as `LoDTensors`; the selected candidates' IDs in each time step can be stored in a `TensorArray` and `Pack`ed into the translated sentences.

## Changing LoD's absolute offsets to relative offsets
The current `LoDTensor` is designed to store levels of variable-length sequences. It stores several arrays of integers, each of which represents a level.

The integers in each level represent the begin and end (not inclusive) offsets of a sequence **in the underlying tensor**; let's call this format the **absolute-offset LoD** for clarity.

The absolute-offset LoD can retrieve any sequence very quickly but fails to represent empty sequences. For example, a two-level LoD is as follows

```python
[[0, 3, 9]
 [0, 2, 3, 3, 3, 9]]
```
@@ -41,10 +34,9 @@ The first level tells that there are two sequences:
while on the second level, there are several empty sequences that both begin and end at `3`.
It is impossible to tell how many empty second-level sequences exist in the first-level sequences.
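
To make the two properties above concrete, here is a minimal sketch in plain Python, with lists standing in for the real `LoDTensor`, so every name is illustrative rather than part of the Paddle API: retrieving a sequence from an absolute-offset LoD is a single slice of the underlying tensor.

```python
# Plain-Python stand-in for an absolute-offset LoDTensor.
lod = [[0, 3, 9],
       [0, 2, 3, 3, 3, 9]]
data = list("abcdefghi")  # the underlying tensor holds 9 elements

def get_sequence(level, i):
    """Retrieve the i-th sequence of a level: one slice, hence 'very quickly'."""
    begin, end = lod[level][i], lod[level][i + 1]
    return data[begin:end]

print(get_sequence(0, 1))  # the 2nd top-level sequence: elements 3..9
print(get_sequence(1, 2))  # [] -- an empty second-level sequence [3, 3)
# Nothing in `lod` says which first-level sequence the empty [3, 3) pairs
# belong to -- that is exactly the ambiguity described above.
```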

There are many scenarios that rely on the representation of empty sequences; for example, in machine translation or visual captioning, one instance may have no translation, or the candidate set of a prefix may be empty.

So let's introduce another format of LoD: it stores **the offsets of the lower level sequences** and is called the **relative-offset** LoD.

For example, to represent the same sequences of the above data
@@ -54,19 +46,18 @@ For example, to represent the same sequences of the above data
[0, 2, 3, 3, 3, 9]]
```

the first level represents that there are two sequences,
their offsets in the second-level LoD are `[0, 3)` and `[3, 5)`.

The second level is the same as in the absolute-offset example because the lower level is a tensor.
It is easy to find out that the second sequence in the first-level LoD has two empty sequences.
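
To see how the two formats relate, here is a small conversion sketch in plain Python. The first level of the relative-offset example is elided in the diff above, so the value `[0, 3, 5]` used below is an assumption consistent with the offsets `[0, 3)` and `[3, 5)` just mentioned; converting it reproduces the absolute-offset LoD `[[0, 3, 9], [0, 2, 3, 3, 3, 9]]` from the previous section.

```python
def relative_to_absolute(lod):
    """Convert a relative-offset LoD to an absolute-offset LoD.

    The lowest level already stores absolute offsets into the tensor;
    each higher level indexes into the offset array of the level below.
    """
    absolute = [list(lod[-1])]
    for level in reversed(lod[:-1]):
        below = absolute[0]
        absolute.insert(0, [below[i] for i in level])
    return absolute

# The first level [0, 3, 5] is an assumed value (it is elided above).
relative = [[0, 3, 5],
            [0, 2, 3, 3, 3, 9]]
print(relative_to_absolute(relative))  # [[0, 3, 9], [0, 2, 3, 3, 3, 9]]
```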

The following examples are based on the relative-offset LoD.

## Usage in a simple machine translation model
Let's start from a simple machine translation model that is simplified from the [machine translation chapter](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation) to draw a blueprint of what a sequence decoder can do and how to use it.

The model has an encoder that learns the semantic vector from a sequence, and a decoder which uses the sequence decoder to generate new sentences.

**Encoder**
```python
@@ -117,7 +108,7 @@ def generate():
# which means there are 2 sentences to translate
# - the first sentence has 1 translation prefix, the offsets are [0, 1)
# - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
# the target_word.lod is
# [[0, 1, 6]
#  [0, 2, 4, 7, 9, 12]]
# which means 2 sentences to translate, which have 1 and 5 prefixes respectively
@@ -154,92 +145,85 @@ def generate():

translation_ids, translation_scores = decoder()
```
`decoder.beam_search` is an operator that, given the candidates and the scores of the translations including those candidates, returns the result of the beam search algorithm.

In this way, users can customize anything on the input or output of beam search, for example (a small sketch of the first idea follows this list):

1. Make the corresponding elements in `topk_generated_scores` zero or some small values; `beam_search` will then discard those candidates.
2. Remove some specific candidates in `selected_ids`.
3. Get the final `translation_ids` and remove the translation sequences in it.
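
As a concrete illustration of the first pruning trick, here is a framework-free sketch in plain Python; `topk_ids`, `topk_generated_scores`, and `BANNED_IDS` are illustrative stand-ins, not the actual operator inputs.

```python
# Hypothetical pruning step run just before beam_search: zeroing a candidate's
# score makes beam search discard it. Plain lists stand in for LoDTensors.
BANNED_IDS = {4}  # e.g. forbid one specific token id

topk_ids = [[2, 4, 7], [1, 4, 9]]             # top-k candidate ids per prefix
topk_generated_scores = [[0.50, 0.30, 0.20],
                         [0.60, 0.25, 0.15]]  # matching candidate scores

for prefix, ids in enumerate(topk_ids):
    for k, token in enumerate(ids):
        if token in BANNED_IDS:
            # a zero score means beam search will never keep this candidate
            topk_generated_scores[prefix][k] = 0.0

print(topk_generated_scores)  # [[0.5, 0.0, 0.2], [0.6, 0.0, 0.15]]
```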

The implementation of the sequence decoder can reuse the C++ class [RNNAlgorithm](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30),
so the python syntax is quite similar to that of an [RNN](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop).

Both of them are two-level `LoDTensors`:

- The first level represents the `batch_size` (source) sentences.
- The second level represents the candidate ID set for each translation prefix.

For example, there may be 3 source sentences to translate, with 2, 3, and 1 candidates respectively.

Unlike an RNN, in the sequence decoder the previous state and the current state have different LoDs and shapes, so a `lod_expand` operator is used to expand the LoD of the previous state to fit the current state.

For example, the previous state:

* LoD is `[0, 1, 3][0, 2, 5, 6]`
* content of the tensor is `a1 a2 b1 b2 b3 c1`

the current state is stored in `encoder_ctx_expanded`:

* LoD is `[0, 2, 7][0, 3, 5, 8, 9, 11, 11]`
* the content is
  - a1 a1 a1 (a1 has 3 candidates, so the state should be copied 3 times, once for each candidate)
  - a2 a2
  - b1 b1 b1
  - b2
  - b3 b3
  - None (c1 has 0 candidates, so c1 is dropped)

A benefit of the relative-offset LoD is that the empty candidate set can be represented naturally.
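
The expansion above is easy to emulate. Below is a plain-Python sketch of the assumed `lod_expand` semantics, using toy lists rather than the real operator; it reproduces the expanded content and the lowest-level offsets `[0, 3, 5, 8, 9, 11, 11]` shown above, including the trailing empty sequence for `c1`.

```python
# Assumed lod_expand semantics on toy data: repeat each previous-state entry
# once per candidate of its prefix; a prefix with zero candidates contributes
# an empty sequence (equal consecutive offsets).
prev_state = ["a1", "a2", "b1", "b2", "b3", "c1"]
num_candidates = [3, 2, 3, 1, 2, 0]  # candidates per prefix

expanded, offsets = [], [0]
for state, n in zip(prev_state, num_candidates):
    expanded.extend([state] * n)     # copy the state n times
    offsets.append(offsets[-1] + n)  # lowest-level LoD of the result

print(expanded)  # ['a1', 'a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b1', 'b2', 'b3', 'b3']
print(offsets)   # [0, 3, 5, 8, 9, 11, 11]
```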

The states at each time step can be stored in a `TensorArray`, and `Pack`ed into a final `LoDTensor`. The corresponding syntax is:

```python
decoder.output(selected_ids)
decoder.output(selected_generation_scores)
```

The `selected_ids` are the candidate ids for the prefixes; they will be `Pack`ed by `TensorArray` into a two-level `LoDTensor`, where the first level represents the source sequences and the second level represents the generated sequences.

Packing the `selected_scores` will produce a `LoDTensor` that stores the scores of each translation candidate.

Packing the `selected_generation_scores` will produce a `LoDTensor` whose tails (the last element of each sequence) are the probabilities of the complete translations.
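
For intuition, here is a toy sketch of what an LoD-aware `Pack` has to produce; the per-step data layout and all names are assumptions for illustration, not the real `TensorArray` interface.

```python
# Toy Pack: gather each source sentence's selected ids across time steps into
# one generated sequence, then flatten into (data, offsets) -- a stand-in for
# a one-level LoDTensor. The real Pack handles full two-level LoDs.
steps = [          # selected ids per time step, one sublist per source sentence
    [[2], [7]],    # t = 0
    [[5], [1]],    # t = 1
    [[9], []],     # t = 2: the second sentence has finished
]

sequences = [[] for _ in steps[0]]
for step in steps:
    for src, ids in enumerate(step):
        sequences[src].extend(ids)

data, offsets = [], [0]
for seq in sequences:
    data.extend(seq)
    offsets.append(offsets[-1] + len(seq))

print(data)     # [2, 5, 9, 7, 1]
print(offsets)  # [0, 3, 5]
```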

## LoD and shape changes during decoding
<p align="center">
  <img src="./images/LOD-and-shape-changes-during-decoding.jpg"/>
</p>

According to the image above, the only phase that changes the LoD is beam search.

## Beam search design
The beam search algorithm will be implemented as one method of the sequence decoder and has 3 inputs:

1. `topk_ids`, the top K candidate ids for each prefix.
2. `topk_scores`, the corresponding scores for `topk_ids`.
3. `generated_scores`, the scores of the prefixes.

All of these are LoDTensors, so that the sequence affiliation is clear. Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix.

It will return three variables (a schematic sketch follows this list):

1. `selected_ids`, the final candidates the beam search function selected for the next step.
2. `selected_scores`, the scores for those candidates.
3. `generated_scores`, the updated scores for each prefix (with the new candidates appended).
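
The following plain-Python sketch shows the shape of this step under simple assumptions: flat lists instead of LoDTensors, additive log-probability scores, and per-prefix selection as described above. It is an illustration, not the operator's actual implementation.

```python
# Schematic beam-search step: for each prefix, rank its candidates by the
# updated prefix score and keep the best `beam_size` of them.
def beam_search_step(topk_ids, topk_scores, generated_scores, beam_size):
    selected_ids, selected_scores, new_generated_scores = [], [], []
    for ids, scores, prefix_score in zip(topk_ids, topk_scores, generated_scores):
        # Scores are assumed to be log-probabilities, so they add.
        ranked = sorted(zip(ids, scores),
                        key=lambda pair: prefix_score + pair[1],
                        reverse=True)[:beam_size]
        selected_ids.append([i for i, _ in ranked])
        selected_scores.append([s for _, s in ranked])
        new_generated_scores.append([prefix_score + s for _, s in ranked])
    return selected_ids, selected_scores, new_generated_scores

# One prefix with 3 candidates, beam size 2.
print(beam_search_step([[4, 9, 2]], [[-0.1, -2.3, -0.7]], [-1.0], 2))
# -> ([[4, 2]], [[-0.1, -0.7]], [[-1.1, -1.7]])
```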

## Introducing the LoD-based `Pack` and `Unpack` methods in `TensorArray`
The `selected_ids`, `selected_scores` and `generated_scores` are LoDTensors that exist at each time step,
so it is natural to store them in arrays.

Currently, PaddlePaddle has a module called `TensorArray` which can store an array of tensors. It is better to store the results of beam search in a `TensorArray`.

The `Pack` and `UnPack` methods in `TensorArray` are used to pack the tensors in the array into a `LoDTensor` and to split a `LoDTensor` into an array of tensors. They need some extensions to support packing and unpacking arrays of `LoDTensors`.
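
As a rough mental model of the pair, here is a toy round trip with a single LoD level, where a "LoDTensor" is just `(data, offsets)`; the real methods, and the proposed extensions, operate on full multi-level `LoDTensors`.

```python
# Toy UnPack/Pack pair over a one-level (data, offsets) stand-in; illustrative
# only, not the actual TensorArray interface.
def unpack(data, offsets):
    """Split a one-level LoDTensor into an array of plain tensors."""
    return [data[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

def pack(tensors):
    """Pack an array of plain tensors back into a one-level LoDTensor."""
    data, offsets = [], [0]
    for t in tensors:
        data.extend(t)
        offsets.append(offsets[-1] + len(t))
    return data, offsets

data, offsets = [2, 5, 9, 7, 1], [0, 3, 5]
assert pack(unpack(data, offsets)) == (data, offsets)  # the round trip is lossless
```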
