@@ -156,7 +156,50 @@ PYBIND11_PLUGIN(core) {
       .def("_get_double_element", TensorGetElement<double>)
       .def("_dtype", [](Tensor &self) { return ToDataType(self.type()); });

-  py::class_<LoDTensor, Tensor>(m, "LoDTensor")
+  py::class_<LoDTensor, Tensor>(m, "LoDTensor", R"DOC(
+    LoDTensor is a Tensor with optional LoD information.
+
+    np.array(lod_tensor) can convert a LoDTensor to a numpy array.
+    lod_tensor.lod() can retrieve the LoD information.
+
+    LoD is short for Level of Details and is usually used for variable-length
+    sequences. You can skip the following explanation if you don't need
+    optional LoD.
+
+    For example, the LoDTensor X below contains 2 sequences. The first has
+    length 2 and the second has length 3, as described by x.lod.
+
+    The first tensor dimension 5=2+3 is calculated from the LoD if it's
+    available. It is the total number of sequence elements. In X, each
+    element has 2 columns, hence x.shape is [5, 2].
+
+      x.lod  = [[2, 3]]
+      x.data = [[1, 2], [3, 4],           // seq 1
+                [5, 6], [7, 8], [9, 10]]  // seq 2
+      x.shape = [5, 2]
+
+    LoD can have multiple levels (for example, a paragraph can have multiple
+    sentences and a sentence can have multiple words). In the following
+    LoDTensor Y, the lod_level is 2, meaning there are 2 sequences: the
+    first sequence has length 2 (it has 2 sub-sequences) and the second has
+    length 1. The first sequence's 2 sub-sequences have lengths 2 and 2,
+    respectively, and the second sequence's single sub-sequence has length 3.
+
+      y.lod = [[2, 1], [2, 2, 3]]
+      y.shape = [2+2+3, ...]
+
+    Note:
+      In the description above, LoD is length-based. In the Paddle internal
+      implementation, LoD is offset-based. Hence, internally,
+      y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (the length-based
+      equivalent would be [[2-0, 3-2], [2-0, 4-2, 7-4]]).
+
+      Sometimes LoD is called recursive_sequence_length to be more
+      self-explanatory. In that case, it must be length-based. For historical
+      reasons, when LoD is called lod in public APIs, it might be offset-based.
+      Users should be careful about this.
+
+    )DOC")
       .def_buffer(
           [](Tensor &self) -> py::buffer_info { return CastToPyBuffer(self); })
       .def("__init__",