Commit 9b3e0df

Merge pull request #13819 from panyx0718/doc
Explain LoD and a few other concepts
2 parents 44f37d0 + 63b2e98 commit 9b3e0df

3 files changed: +50 -3 lines changed
paddle/fluid/pybind/pybind.cc

Lines changed: 44 additions & 1 deletion
Original file line number / Diff line number / Diff line change
@@ -157,7 +157,50 @@ PYBIND11_PLUGIN(core) {
157 157
.def("_get_double_element", TensorGetElement<double>)
158 158
.def("_dtype", [](Tensor &self) { return ToDataType(self.type()); });
159 159

160-
py::class_<LoDTensor, Tensor>(m, "LoDTensor")
160+
py::class_<LoDTensor, Tensor>(m, "LoDTensor", R"DOC(
161+
LoDTensor is a Tensor with optional LoD information.
162+
163+
np.array(lod_tensor) can convert LoDTensor to numpy array.
164+
lod_tensor.lod() can retrieve the LoD information.
165+
166+
LoD is short for Level of Details and is usually used for varied sequence
167+
length. You can skip the following comment if you don't need optional LoD.
168+
169+
For example:
170+
A LoDTensor X can look like the example below. It contains 2 sequences.
171+
The first has length 2 and the second has length 3, as described by x.lod.
172+
173+
The first tensor dimension 5=2+3 is calculated from LoD if it's available.
174+
It means the total number of sequence elements. In X, each element has 2
175+
columns, hence [5, 2].
176+
177+
x.lod = [[2, 3]]
178+
x.data = [[1, 2], [3, 4],
179+
[5, 6], [7, 8], [9, 10]]
180+
x.shape = [5, 2]
181+
182+
LoD can have multiple levels (for example, a paragraph can have multiple
183+
sentences and a sentence can have multiple words). In the following
184+
LoDTensor Y, the lod_level is 2. It means there are 2 sequences, the
185+
first sequence length is 2 (has 2 sub-sequences), the second one's
186+
length is 1. The first sequence's 2 sub-sequences have length 2 and 2,
187+
respectively. And the second sequence's 1 sub-sequence has length 3.
188+
189+
y.lod = [[2, 1], [2, 2, 3]]
190+
y.shape = [2+2+3, ...]
191+
192+
Note:
193+
In the above description, LoD is length-based. In Paddle's internal
194+
implementation, lod is offset-based. Hence, internally,
195+
y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (length-based
196+
equivalent would be [[2-0, 3-2], [2-0, 4-2, 7-4]]).
197+
198+
Sometimes LoD is called recursive_sequence_length to be more
199+
self-explanatory. In this case, it must be length-based. For historical
200+
reasons, when LoD is called lod in public API, it might be offset-based.
201+
Users should be careful about it.
202+
203+
)DOC")
161 204
.def_buffer(
162 205
[](Tensor &self) -> py::buffer_info { return CastToPyBuffer(self); })
163 206
.def("__init__",
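
The LoDTensor docstring in this diff distinguishes a length-based LoD from the offset-based form used internally. As a minimal sketch of that relationship, assuming plain Python lists rather than Paddle's own types, the conversion can be written as below. The helper names `lengths_to_offsets`, `offsets_to_lengths`, and `split_rows` are hypothetical and are not part of Paddle's API.

```python
from itertools import accumulate


def lengths_to_offsets(length_lod):
    """Convert a length-based LoD to offset-based form, one level at a time."""
    # Each level starts at offset 0, followed by the running sum of lengths.
    return [[0] + list(accumulate(level)) for level in length_lod]


def offsets_to_lengths(offset_lod):
    """Convert an offset-based LoD back to length-based form."""
    # A length is the difference between two consecutive offsets.
    return [
        [end - start for start, end in zip(level, level[1:])]
        for level in offset_lod
    ]


def split_rows(rows, lengths):
    """Split `rows` into sequences according to one level of lengths."""
    if sum(lengths) != len(rows):
        raise ValueError("lengths must sum to the first tensor dimension")
    offsets = [0] + list(accumulate(lengths))
    return [rows[start:end] for start, end in zip(offsets, offsets[1:])]


# The two-level LoD of Y from the docstring, in length-based form.
y_lengths = [[2, 1], [2, 2, 3]]
y_offsets = lengths_to_offsets(y_lengths)
print(y_offsets)  # [[0, 2, 3], [0, 2, 4, 7]]
print(offsets_to_lengths(y_offsets) == y_lengths)  # True

# Illustrative rows (not taken from Paddle): 5 rows split as 2 + 3.
rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(split_rows(rows, [2, 3]))
```

The check inside `split_rows` mirrors the docstring's statement that the first tensor dimension equals the sum of the sequence lengths.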

python/paddle/fluid/layers/io.py

Lines changed: 5 additions & 1 deletion
Original file line number / Diff line number / Diff line change
@@ -55,7 +55,11 @@ def data(name,
55 55
Args:
56 56
name(str): The name/alias of the function
57 57
shape(list): Tuple declaring the shape.
58-
append_batch_size(bool): Whether or not to append the data as a batch.
58+
append_batch_size(bool):
59+
1. If true, it prepends -1 to the shape.
60+
For example if shape=[1], the resulting shape is [-1, 1].
61+
2. If shape contains -1, such as shape=[1, -1],
62+
append_batch_size will be forced to be False (ineffective).
59 63
dtype(int|float): The type of data : float32, float_16, int etc
60 64
type(VarType): The output type. By default it is LOD_TENSOR.
61 65
lod_level(int): The LoD Level. 0 means the input data is not a sequence.
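
The two rules documented for append_batch_size can be sketched in plain Python. This is only an illustration of the behaviour the docstring describes, not Paddle's implementation, and the helper name `resolve_data_shape` is hypothetical.

```python
def resolve_data_shape(shape, append_batch_size=True):
    """Return the effective shape under the documented append_batch_size rules."""
    shape = list(shape)
    # Rule 2: a -1 already present in the shape disables append_batch_size.
    if any(dim == -1 for dim in shape):
        append_batch_size = False
    # Rule 1: otherwise, a batch dimension of -1 is prepended.
    if append_batch_size:
        shape = [-1] + shape
    return shape


print(resolve_data_shape([1]))  # [-1, 1]
print(resolve_data_shape([1, -1]))  # [1, -1]
print(resolve_data_shape([3, 4], append_batch_size=False))  # [3, 4]
```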

python/paddle/fluid/layers/tensor.py

Lines changed: 1 addition & 1 deletion
Original file line number / Diff line number / Diff line change
@@ -100,7 +100,7 @@ def create_global_var(shape,
100 100
force_cpu=False,
101 101
name=None):
102 102
"""
103-
Create a new variable in the global block(block 0).
103+
Create a new tensor variable with value in the global block(block 0).
104 104
105 105
Args:
106 106
shape(list[int]): shape of the variable