@@ -157,7 +157,50 @@ PYBIND11_PLUGIN(core) {
       .def("_get_double_element", TensorGetElement<double>)
       .def("_dtype", [](Tensor &self) { return ToDataType(self.type()); });
 
-  py::class_<LoDTensor, Tensor>(m, "LoDTensor")
+  py::class_<LoDTensor, Tensor>(m, "LoDTensor", R"DOC(
+    LoDTensor is a Tensor with optional LoD information.
+
+    np.array(lod_tensor) can convert a LoDTensor to a numpy array.
+    lod_tensor.lod() can retrieve the LoD information.
+
+    LoD is short for Level of Details and is usually used for varied sequence
+    lengths. You can skip the following comment if you don't need optional LoD.
+
+    For example:
+      A LoDTensor X can look like the example below. It contains 2 sequences.
+      The first has length 2 and the second has length 3, as described by x.lod.
+
+      The first tensor dimension 6=2+3 is calculated from the LoD if it is
+      available. It is the total number of sequence elements. In X, each
+      element has 2 columns, hence [6, 2].
+
+      x.lod  = [[2, 3]]
+      x.data = [[1, 2], [3, 4],
+                [5, 6], [7, 8], [9, 10], [11, 12]]
+      x.shape = [6, 2]
+
+      LoD can have multiple levels (for example, a paragraph can have multiple
+      sentences and a sentence can have multiple words). In the following
+      LoDTensor Y, the lod_level is 2, meaning there are 2 sequences: the
+      first sequence has length 2 (it has 2 sub-sequences) and the second has
+      length 1. The first sequence's 2 sub-sequences have lengths 2 and 2,
+      respectively, and the second sequence's 1 sub-sequence has length 3.
+
+      y.lod = [[2, 1], [2, 2, 3]]
+      y.shape = [2+2+3, ...]
+
+    Note:
+      In the description above, LoD is length-based. In Paddle's internal
+      implementation, lod is offset-based. Hence, internally,
+      y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (the length-based
+      equivalent would be [[2-0, 3-2], [2-0, 4-2, 7-4]]).
+
+      Sometimes LoD is called recursive_sequence_length to be more
+      self-explanatory. In that case, it must be length-based. For historical
+      reasons, when LoD is called lod in the public API, it might be
+      offset-based. Users should be careful about this.
+
+  )DOC")
       .def_buffer(
          [](Tensor &self) -> py::buffer_info { return CastToPyBuffer(self); })
       .def("__init__",
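The docstring's Note describes the relationship between the length-based LoD shown in the examples and the offset-based form Paddle uses internally. As a minimal sketch (plain Python, not Paddle's API; the helper name `lengths_to_offsets` is made up for illustration), the conversion is a prefix sum per level:

```python
from itertools import accumulate

def lengths_to_offsets(lod):
    # Hypothetical helper, not part of Paddle: each length-based level,
    # e.g. [2, 2, 3], becomes its running prefix sum [0, 2, 4, 7].
    return [[0] + list(accumulate(level)) for level in lod]

# y.lod from the docstring example above (length-based).
y_lod = [[2, 1], [2, 2, 3]]
print(lengths_to_offsets(y_lod))  # [[0, 2, 3], [0, 2, 4, 7]], as in the Note
```

Going the other way is just adjacent differences, which is what the Note's `[[2-0, 3-2], [2-0, 4-2, 7-4]]` expression spells out.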