Commit 56dd7f0

Merge pull request #11837 from panyx0718/fix
cherry-pick: Merge pull request #11712 from kexinzhao/fix_lod_name
2 parents 691b27c + 0449570 commit 56dd7f0

16 files changed: +236 -154 lines

doc/fluid/design/concepts/lod_tensor.md

Lines changed: 20 additions & 0 deletions
@@ -173,6 +173,7 @@ are transformed into offsets of elements/words as follows:
 
 ## Slicing of LoD Tensors
 
+
 When we use the above 2-level LoD Tensor as the input to a nested-RNN, we need to retrieve certain sequences. Here we define the sequence identified by branch <i,j,...> as the **<i,j,...>-slice**.
 
 For example, the <2>-slice of above example is
@@ -189,3 +190,22 @@ and the <2,0>-slice of above slice is
 10 12
 ||
 ```
+
+## Length Representation vs Offset Representation
+
+The offset representation is an implementation-oriented decision and it makes understanding the idea behind LoDTensor difficult.
+Hence, we encapsulate this implementation detail in C++ and expose the original length representation in our Python API.
+Specifically, we call this length representation `recursive_sequence_lengths` and users can use the following code to set or get the `recursive_sequence_lengths` of a LoDTensor in Python:
+```Python
+# length representation of lod called recursive_sequence_lengths
+recursive_seq_lens = [[3, 1, 2], [2, 2, 1, 3, 1, 2]]
+# Create a LoDTensor that has the above recursive_sequence_lengths info.
+# This recursive_sequence_lengths will be converted to an offset representation of LoD in the C++ implementation under the hood.
+tensor = fluid.LoDTensor(lod)
+
+# Set/Change the recursive_sequence_lengths info of LoDTensor
+tensor.set_recursive_sequence_lengths([[3, 1, 2]])
+# Get the recursive_sequence_lengths info of a LoDTensor (the offset-based LoD representation stored in C++ will be converted
+# back to length-based recursive_sequence_lengths), new_recursive_seq_lens = [[3, 1, 2]]
+new_recursive_seq_lens = tensor.recursive_sequence_lengths()
+```
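The conversion between the two representations is mechanical. Below is a minimal, framework-free sketch of it (plain Python, not part of this commit; the helper names are illustrative only):

```python
# Convert a length-based LoD (recursive_sequence_lengths) to the
# offset-based LoD stored internally, and back. Illustrative helpers,
# not part of the Fluid API.

def lengths_to_offsets(recursive_seq_lens):
    offsets = []
    for level in recursive_seq_lens:
        level_offsets = [0]
        for length in level:
            level_offsets.append(level_offsets[-1] + length)
        offsets.append(level_offsets)
    return offsets

def offsets_to_lengths(offset_lod):
    return [[level[i + 1] - level[i] for i in range(len(level) - 1)]
            for level in offset_lod]

# The 2-level example from the document above:
recursive_seq_lens = [[3, 1, 2], [2, 2, 1, 3, 1, 2]]
offsets = lengths_to_offsets(recursive_seq_lens)
# offsets == [[0, 3, 4, 6], [0, 2, 4, 5, 8, 9, 11]]
assert offsets_to_lengths(offsets) == recursive_seq_lens
```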

python/paddle/fluid/lod_tensor.py

Lines changed: 31 additions & 26 deletions
@@ -18,15 +18,16 @@
 __all__ = ['create_lod_tensor', 'create_random_int_lodtensor']
 
 
-def create_lod_tensor(data, lod, place):
+def create_lod_tensor(data, recursive_seq_lens, place):
     """
     Create a lod tensor from a numpy array, a list, or an existing lod tensor.
 
     Create a lod tensor by doing the following:
 
-    1. Check that the length-based input lod is valid.
+    1. Check that the length-based level of detail (LoD) also known as
+       recursive_sequence_lengths of the input is valid.
 
-    2. Convert the length-based lod to a offset-based LoD.
+    2. Convert recursive_sequence_lengths to a offset-based LoD.
 
     3. Copy the data from a numpy array, a list or a existing lod tensor to
     CPU or GPU device (based on input place).
@@ -37,45 +38,47 @@ def create_lod_tensor(data, lod, place):
 
     Suppose we want LoDTensor to hold data for sequences of word, where each
     word is represented by an integer. If we want to create a LoDTensor to
-    represent two sentences, one of 2 words, and one of 3 words.
+    represent two sentences, one of 2 words, and one of 3 words.
 
     Then :code:`data` can be a numpy array of integers with shape (5, 1).
-    :code:`lod` will be [[2, 3]], indicating the length(# of words) in each
-    sentence. This length-based input lod [[2, 3]] will be converted to
-    offset-based lod [[0, 2, 5]] inside the function call.
+    :code:`recursive_seq_lens` will be [[2, 3]], indicating the length(# of words) in each
+    sentence. This length-based :code:`recursive_seq_lens` [[2, 3]] will be converted to
+    offset-based LoD [[0, 2, 5]] inside the function call.
 
     Please reference :ref:`api_guide_low_level_lod_tensor` for more details
     regarding LoD.
 
     Args:
         data(numpy.ndarray|list|LoDTensor): a numpy array or a LoDTensor or a
-            list holding the data to be copied.
-        lod(list): a list of lists indicating the length-based LoD info
-            specified by the user.
+            list holding the data to be copied.
+        recursive_seq_lens(list): a list of lists indicating the length-based level of detail
+            info specified by the user.
         place(Place): CPU or GPU place indicating where the data in the new
            LoDTensor will be stored.
 
     Returns:
-        A fluid LoDTensor object with tensor data and lod info.
+        A fluid LoDTensor object with tensor data and recursive_seq_lens info.
     """
     if isinstance(data, core.LoDTensor):
-        return create_lod_tensor(np.array(data), lod, place)
+        return create_lod_tensor(np.array(data), recursive_seq_lens, place)
     elif isinstance(data, list):
         # When input data is a list, it only deal with the case where the base element
         # is an index of shape [1] and dtype int64 (e.g., word id). Hence, the generated
         # LoDTensor will be of shape [n, 1] and dtype int64, where `n` is the total number
         # of words or other indexes in the sequence.
-        new_lod = []
+        new_recursive_seq_lens = []
         for seq in data:
-            new_lod.append(len(seq))
-        assert [new_lod] == lod, "data and lod do not match"
+            new_recursive_seq_lens.append(len(seq))
+        assert [
+            new_recursive_seq_lens
+        ] == recursive_seq_lens, "data and recursive_seq_lens do not match"
         flattened_data = np.concatenate(data, axis=0).astype("int64")
         flattened_data = flattened_data.reshape([len(flattened_data), 1])
-        return create_lod_tensor(flattened_data, lod, place)
+        return create_lod_tensor(flattened_data, recursive_seq_lens, place)
     elif isinstance(data, np.ndarray):
         tensor = core.LoDTensor()
         tensor.set(data, place)
-        tensor.set_recursive_sequence_lengths(lod)
+        tensor.set_recursive_sequence_lengths(recursive_seq_lens)
         assert tensor.has_valid_recursive_sequence_lengths(
         ), "the provided lod info is invalid"
         return tensor
@@ -84,7 +87,8 @@ def create_lod_tensor(data, lod, place):
             "data should be either a LoDTensor, a Numpy array or a list")
 
 
-def create_random_int_lodtensor(lod, base_shape, place, low, high):
+def create_random_int_lodtensor(recursive_seq_lens, base_shape, place, low,
+                                high):
     """
     Create a LoDTensor containing random integers.
 
@@ -95,7 +99,7 @@ def create_random_int_lodtensor(lod, base_shape, place, low, high):
     The function does the following:
 
     1. Calculate the overall shape of the LoDTensor based on the length-based
-    :code:`lod` input and the shape of the basic element in
+    :code:`recursive_seq_lens` input and the shape of the basic element in
     :code:`base_shape`.
 
     2. Create a numpy array of this shape.
@@ -105,12 +109,13 @@ def create_random_int_lodtensor(lod, base_shape, place, low, high):
     Suppose we want LoDTensor to hold data for sequences of word, where each
     word is represented by an integer. If we want to create a LoDTensor to
     represent two sentences, one of 2 words, and one of 3 words. Then
-    'base_shape' is [1], input length-based 'lod' is [[2, 3]]. Then the overall
-    shape of the LoDTensor would be [5, 1], holding 5 words for two sentences.
+    'base_shape' is [1], input length-based 'recursive_seq_lens' is [[2, 3]].
+    Then the overall shape of the LoDTensor would be [5, 1], holding 5 words
+    for two sentences.
 
     Args:
-        lod(list): a list of lists indicating the length-based LoD info
-            specified by the user.
+        recursive_seq_lens(list): a list of lists indicating the length-based
+            level of detail info specified by the user.
         base_shape(list): the shape of the basic element to be held by the
            LoDTensor.
         place(Place): CPU or GPU place indicating where the data in the new
@@ -119,11 +124,11 @@ def create_random_int_lodtensor(lod, base_shape, place, low, high):
         high(int): the upper bound of the random integers.
 
     Returns:
-        A fluid LoDTensor object with tensor data and lod info.
+        A fluid LoDTensor object with tensor data and recursive_seq_lens info.
     """
     assert isinstance(base_shape, list), "base_shape should be a list"
     # append the total number of basic elements to the front of its shape
-    overall_shape = [sum(lod[-1])] + base_shape
+    overall_shape = [sum(recursive_seq_lens[-1])] + base_shape
     # the range of integer data elements is [low, high]
     data = np.random.random_integers(low, high, overall_shape).astype("int64")
-    return create_lod_tensor(data, lod, place)
+    return create_lod_tensor(data, recursive_seq_lens, place)
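For reference, a short usage sketch of the renamed signatures (assuming paddle.fluid at this commit is importable; the data values are arbitrary):

```python
# Usage sketch: pass the length-based recursive_seq_lens where the old
# `lod` argument used to go. Values are arbitrary.
import numpy as np
import paddle.fluid as fluid

place = fluid.CPUPlace()

# Two sentences of 2 and 3 words -> data holds 5 word ids in total.
data = np.arange(5).reshape(5, 1).astype("int64")
recursive_seq_lens = [[2, 3]]
tensor = fluid.create_lod_tensor(data, recursive_seq_lens, place)

# Offset LoD [[0, 2, 5]] is kept on the C++ side; Python reads lengths back.
print(tensor.recursive_sequence_lengths())  # [[2, 3]]

# Random-integer variant: overall shape is [sum([2, 3])] + [1] = [5, 1].
rand_tensor = fluid.create_random_int_lodtensor(
    recursive_seq_lens, base_shape=[1], place=place, low=0, high=9)
```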

python/paddle/fluid/tests/book/high-level-api/label_semantic_roles/test_label_semantic_roles_newapi.py

Lines changed: 14 additions & 14 deletions
@@ -206,35 +206,35 @@ def infer(use_cuda, inference_program, params_dirname):
     inferencer = fluid.Inferencer(
         inference_program, param_path=params_dirname, place=place)
 
-    # Setup inputs by creating LoDTensors to represent sequences of words.
-    # Here each word is the basic element of these LoDTensors and the shape of
+    # Setup input by creating LoDTensor to represent sequence of words.
+    # Here each word is the basic element of the LoDTensor and the shape of
     # each word (base_shape) should be [1] since it is simply an index to
     # look up for the corresponding word vector.
-    # Suppose the length_based level of detail (lod) info is set to [[3, 4, 2]],
-    # which has only one lod level. Then the created LoDTensors will have only
+    # Suppose the recursive_sequence_lengths info is set to [[3, 4, 2]],
+    # which has only one level of detail. Then the created LoDTensor will have only
     # one higher level structure (sequence of words, or sentence) than the basic
     # element (word). Hence the LoDTensor will hold data for three sentences of
     # length 3, 4 and 2, respectively.
-    # Note that lod info should be a list of lists.
-    lod = [[3, 4, 2]]
+    # Note that recursive_sequence_lengths should be a list of lists.
+    recursive_seq_lens = [[3, 4, 2]]
     base_shape = [1]
     # The range of random integers is [low, high]
     word = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
     ctx_n2 = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
     ctx_n1 = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
     ctx_0 = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
     ctx_p1 = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
     ctx_p2 = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=WORD_DICT_LEN - 1)
     pred = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=PRED_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=PRED_DICT_LEN - 1)
     mark = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=MARK_DICT_LEN - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=MARK_DICT_LEN - 1)
 
     results = inferencer.infer(
         {
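As a sanity check on the comments above, a small sketch of the shape implied by `recursive_seq_lens = [[3, 4, 2]]` with `base_shape = [1]` (plain numpy, not part of the test):

```python
# Plain-numpy sketch (illustrative only): recursive_seq_lens = [[3, 4, 2]]
# describes three sentences of 3, 4 and 2 words, so a LoDTensor built with
# base_shape = [1] holds 3 + 4 + 2 = 9 word ids and has overall shape [9, 1].
import numpy as np

recursive_seq_lens = [[3, 4, 2]]
base_shape = [1]
# Mirrors the overall-shape computation in lod_tensor.py above.
overall_shape = [sum(recursive_seq_lens[-1])] + base_shape
assert overall_shape == [9, 1]

data = np.random.randint(0, 10, overall_shape).astype("int64")
print(data.shape)  # (9, 1)
```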

python/paddle/fluid/tests/book/high-level-api/machine_translation/test_machine_translation.py

Lines changed: 7 additions & 5 deletions
@@ -215,11 +215,13 @@ def decode_main(use_cuda, is_sparse):
         [1. for _ in range(batch_size)], dtype='float32')
     init_ids_data = init_ids_data.reshape((batch_size, 1))
     init_scores_data = init_scores_data.reshape((batch_size, 1))
-    init_lod = [1] * batch_size
-    init_lod = [init_lod, init_lod]
+    init_recursive_seq_lens = [1] * batch_size
+    init_recursive_seq_lens = [init_recursive_seq_lens, init_recursive_seq_lens]
 
-    init_ids = fluid.create_lod_tensor(init_ids_data, init_lod, place)
-    init_scores = fluid.create_lod_tensor(init_scores_data, init_lod, place)
+    init_ids = fluid.create_lod_tensor(init_ids_data, init_recursive_seq_lens,
+                                       place)
+    init_scores = fluid.create_lod_tensor(init_scores_data,
+                                          init_recursive_seq_lens, place)
 
     train_data = paddle.batch(
         paddle.reader.shuffle(
@@ -243,7 +245,7 @@ def decode_main(use_cuda, is_sparse):
             feed=feed_dict,
             fetch_list=[translation_ids, translation_scores],
             return_numpy=False)
-        print result_ids.lod()
+        print result_ids.recursive_sequence_lengths()
         break
 
 
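For a concrete reading of the renamed `init_recursive_seq_lens`, a small sketch assuming `batch_size = 3` (illustrative only, not part of the test):

```python
# Illustrative only: with an assumed batch_size of 3, the two-level
# init_recursive_seq_lens built above is [[1, 1, 1], [1, 1, 1]], i.e.
# three top-level sequences, each containing exactly one length-1
# sub-sequence (one start token per beam-search instance).
batch_size = 3  # assumed value for illustration

init_recursive_seq_lens = [1] * batch_size
init_recursive_seq_lens = [init_recursive_seq_lens, init_recursive_seq_lens]
assert init_recursive_seq_lens == [[1, 1, 1], [1, 1, 1]]

# The offset form stored in C++ would be [[0, 1, 2, 3], [0, 1, 2, 3]].
```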

python/paddle/fluid/tests/book/high-level-api/recommender_system/test_recommender_system_newapi.py

Lines changed: 7 additions & 5 deletions
@@ -209,13 +209,15 @@ def infer(use_cuda, inference_program, params_dirname):
         inference_program, param_path=params_dirname, place=place)
 
     # Use the first data from paddle.dataset.movielens.test() as input.
-    # Use create_lod_tensor(data, lod, place) API to generate LoD Tensor,
-    # where `data` is a list of sequences of index numbers, `lod` is
-    # the level of detail (lod) info associated with `data`.
+    # Use create_lod_tensor(data, recursive_sequence_lengths, place) API
+    # to generate LoD Tensor where `data` is a list of sequences of index
+    # numbers, `recursive_sequence_lengths` is the length-based level of detail
+    # (lod) info associated with `data`.
     # For example, data = [[10, 2, 3], [2, 3]] means that it contains
     # two sequences of indexes, of length 3 and 2, respectively.
-    # Correspondingly, lod = [[3, 2]] contains one level of detail info,
-    # indicating that `data` consists of two sequences of length 3 and 2.
+    # Correspondingly, recursive_sequence_lengths = [[3, 2]] contains one
+    # level of detail info, indicating that `data` consists of two sequences
+    # of length 3 and 2, respectively.
     user_id = fluid.create_lod_tensor([[1]], [[1]], place)
     gender_id = fluid.create_lod_tensor([[1]], [[1]], place)
     age_id = fluid.create_lod_tensor([[0]], [[1]], place)
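A brief sketch of what the `[[1]]` arguments in these calls mean (assuming paddle.fluid at this commit is importable; the ids are arbitrary):

```python
# Sketch: each categorical input here is a single sequence holding one
# index, so data is [[1]] (one sequence with one id) and
# recursive_sequence_lengths is [[1]] (one sequence of length 1).
# Assumes paddle.fluid is importable; the id values are arbitrary.
import paddle.fluid as fluid

place = fluid.CPUPlace()
user_id = fluid.create_lod_tensor([[1]], [[1]], place)

# The list input is flattened to an int64 tensor of shape [1, 1], and the
# length-based [[1]] becomes offset LoD [[0, 1]] internally.
print(user_id.recursive_sequence_lengths())  # [[1]]
```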

python/paddle/fluid/tests/book/high-level-api/understand_sentiment/test_understand_sentiment_conv.py

Lines changed: 5 additions & 5 deletions
@@ -128,17 +128,17 @@ def infer(use_cuda, inference_program, params_dirname=None):
     # Here each word is the basic element of the LoDTensor and the shape of
     # each word (base_shape) should be [1] since it is simply an index to
     # look up for the corresponding word vector.
-    # Suppose the length_based level of detail (lod) info is set to [[3, 4, 2]],
-    # which has only one lod level. Then the created LoDTensor will have only
+    # Suppose the recursive_sequence_lengths info is set to [[3, 4, 2]],
+    # which has only one level of detail. Then the created LoDTensor will have only
     # one higher level structure (sequence of words, or sentence) than the basic
     # element (word). Hence the LoDTensor will hold data for three sentences of
     # length 3, 4 and 2, respectively.
-    # Note that lod info should be a list of lists.
-    lod = [[3, 4, 2]]
+    # Note that recursive_sequence_lengths should be a list of lists.
+    recursive_seq_lens = [[3, 4, 2]]
     base_shape = [1]
     # The range of random integers is [low, high]
     tensor_words = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=len(word_dict) - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=len(word_dict) - 1)
     results = inferencer.infer({'words': tensor_words})
     print("infer results: ", results)
 

python/paddle/fluid/tests/book/high-level-api/understand_sentiment/test_understand_sentiment_dynamic_rnn.py

Lines changed: 5 additions & 5 deletions
@@ -143,17 +143,17 @@ def infer(use_cuda, inference_program, params_dirname=None):
     # Here each word is the basic element of the LoDTensor and the shape of
     # each word (base_shape) should be [1] since it is simply an index to
     # look up for the corresponding word vector.
-    # Suppose the length_based level of detail (lod) info is set to [[3, 4, 2]],
-    # which has only one lod level. Then the created LoDTensor will have only
+    # Suppose the recursive_sequence_lengths info is set to [[3, 4, 2]],
+    # which has only one level of detail. Then the created LoDTensor will have only
     # one higher level structure (sequence of words, or sentence) than the basic
     # element (word). Hence the LoDTensor will hold data for three sentences of
     # length 3, 4 and 2, respectively.
-    # Note that lod info should be a list of lists.
-    lod = [[3, 4, 2]]
+    # Note that recursive_sequence_lengths should be a list of lists.
+    recursive_seq_lens = [[3, 4, 2]]
     base_shape = [1]
     # The range of random integers is [low, high]
     tensor_words = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=len(word_dict) - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=len(word_dict) - 1)
     results = inferencer.infer({'words': tensor_words})
     print("infer results: ", results)
 

python/paddle/fluid/tests/book/high-level-api/understand_sentiment/test_understand_sentiment_stacked_lstm.py

Lines changed: 5 additions & 5 deletions
@@ -138,17 +138,17 @@ def infer(use_cuda, inference_program, params_dirname=None):
     # Here each word is the basic element of the LoDTensor and the shape of
     # each word (base_shape) should be [1] since it is simply an index to
     # look up for the corresponding word vector.
-    # Suppose the length_based level of detail (lod) info is set to [[3, 4, 2]],
-    # which has only one lod level. Then the created LoDTensor will have only
+    # Suppose the recursive_sequence_lengths info is set to [[3, 4, 2]],
+    # which has only one level of detail. Then the created LoDTensor will have only
     # one higher level structure (sequence of words, or sentence) than the basic
     # element (word). Hence the LoDTensor will hold data for three sentences of
     # length 3, 4 and 2, respectively.
-    # Note that lod info should be a list of lists.
-    lod = [[3, 4, 2]]
+    # Note that recursive_sequence_lengths should be a list of lists.
+    recursive_seq_lens = [[3, 4, 2]]
     base_shape = [1]
     # The range of random integers is [low, high]
     tensor_words = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=len(word_dict) - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=len(word_dict) - 1)
     results = inferencer.infer({'words': tensor_words})
     print("infer results: ", results)
 

python/paddle/fluid/tests/book/high-level-api/word2vec/test_word2vec_new_api.py

Lines changed: 10 additions & 9 deletions
@@ -124,21 +124,22 @@ def infer(use_cuda, inference_program, params_dirname=None):
 
     # Setup inputs by creating 4 LoDTensors representing 4 words. Here each word
     # is simply an index to look up for the corresponding word vector and hence
-    # the shape of word (base_shape) should be [1]. The length-based level of
-    # detail (lod) info of each LoDtensor should be [[1]] meaning there is only
-    # one lod_level and there is only one sequence of one word on this level.
-    # Note that lod info should be a list of lists.
-    lod = [[1]]
+    # the shape of word (base_shape) should be [1]. The recursive_sequence_lengths,
+    # which is length-based level of detail (lod) of each LoDTensor, should be [[1]]
+    # meaning there is only one level of detail and there is only one sequence of
+    # one word on this level.
+    # Note that recursive_sequence_lengths should be a list of lists.
+    recursive_seq_lens = [[1]]
     base_shape = [1]
     # The range of random integers is [low, high]
     first_word = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=dict_size - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=dict_size - 1)
     second_word = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=dict_size - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=dict_size - 1)
     third_word = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=dict_size - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=dict_size - 1)
     fourth_word = fluid.create_random_int_lodtensor(
-        lod, base_shape, place, low=0, high=dict_size - 1)
+        recursive_seq_lens, base_shape, place, low=0, high=dict_size - 1)
 
     result = inferencer.infer(
         {
