Commit 324e73f

fix adaptive pool doc.test=develop
1 parent 4b3f9e5 commit 324e73f

4 files changed

+167
-87
lines changed

paddle/fluid/operators/detection/yolov3_loss_op.cc

Lines changed: 23 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -144,34 +144,40 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
144144
"The ignore threshold to ignore confidence loss.")
145145
.SetDefault(0.7);
146146
AddComment(R"DOC(
147-
This operator generate yolov3 loss by given predict result and ground
147+
This operator generates yolov3 loss based on given predict result and ground
148148
truth boxes.
149149
150150
The output of previous network is in shape [N, C, H, W], while H and W
151-
should be the same, specify the grid size, each grid point predict given
152-
number boxes, this given number is specified by anchors, it should be
153-
half anchors length, which following will be represented as S. In the
154-
second dimention(the channel dimention), C should be S * (class_num + 5),
155-
class_num is the box categoriy number of source dataset(such as coco),
156-
so in the second dimention, stores 4 box location coordinates x, y, w, h
157-
and confidence score of the box and class one-hot key of each anchor box.
151+
should be the same, H and W specify the grid size, each grid point predict
152+
given number boxes, this given number, which following will be represented as S,
153+
is specified by the number of anchors, In the second dimension(the channel
154+
dimension), C should be equal to S * (class_num + 5), class_num is the object
155+
category number of source dataset(such as 80 in coco dataset), so in the
156+
second(channel) dimension, apart from 4 box location coordinates x, y, w, h,
157+
also includes confidence score of the box and class one-hot key of each anchor box.
158158
159-
While the 4 location coordinates if $$tx, ty, tw, th$$, the box predictions
160-
correspnd to:
159+
Assume the 4 location coordinates are :math:`t_x, t_y, t_w, t_h`, the box predictions
160+
should be as follows:
161161
162162
$$
163-
b_x = \sigma(t_x) + c_x
164-
b_y = \sigma(t_y) + c_y
163+
b_x = \\sigma(t_x) + c_x
164+
$$
165+
$$
166+
b_y = \\sigma(t_y) + c_y
167+
$$
168+
$$
165169
b_w = p_w e^{t_w}
170+
$$
171+
$$
166172
b_h = p_h e^{t_h}
167173
$$
168174
169-
While $$c_x, c_y$$ is the left top corner of current grid and $$p_w, p_h$$
170-
is specified by anchors.
175+
In the equation above, :math:`c_x, c_y` is the left top corner of current grid
176+
and :math:`p_w, p_h` is specified by anchors.
171177
172178
As for confidence score, it is the logistic regression value of IoU between
173179
anchor boxes and ground truth boxes, the score of the anchor box which has
174-
the max IoU should be 1, and if the anchor box has IoU bigger then ignore
180+
the max IoU should be 1, and if the anchor box has IoU bigger than ignore
175181
thresh, the confidence score loss of this anchor box will be ignored.
176182
177183
Therefore, the yolov3 loss consist of three major parts, box location loss,
@@ -186,13 +192,13 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
186192
187193
In order to trade off box coordinate losses between big boxes and small
188194
boxes, box coordinate losses will be mutiplied by scale weight, which is
189-
calculated as follow.
195+
calculated as follows.
190196
191197
$$
192198
weight_{box} = 2.0 - t_w * t_h
193199
$$
194200
195-
Final loss will be represented as follow.
201+
Final loss will be represented as follows.
196202
197203
$$
198204
loss = (loss_{xy} + loss_{wh}) * weight_{box}
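
Review note: the box-decoding equations and the scale weight in the yolov3_loss doc above can be sanity-checked with a few lines of plain Python. This is only a sketch of the formulas as written in the docstring, not the operator's kernel; the function names `decode_box` and `box_weight` are hypothetical and do not exist in Paddle.

```python
import math


def sigmoid(v):
    """Logistic function, written out so the sketch needs only the stdlib."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Apply the four equations from the docstring.

    (c_x, c_y) is the top-left corner of the current grid cell and
    (p_w, p_h) is the anchor size. Hypothetical helper, for illustration.
    """
    b_x = sigmoid(t_x) + c_x
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def box_weight(t_w, t_h):
    """Scale weight from the docstring: weight_box = 2.0 - t_w * t_h."""
    return 2.0 - t_w * t_h


# With all raw predictions at zero, the centre sits in the middle of the
# cell and the box has exactly the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0))
print(box_weight(0.5, 0.5))
```

Smaller boxes get a weight closer to 2.0, which is the trade-off between big and small boxes that the docstring describes.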

paddle/fluid/operators/pool_op.cc

Lines changed: 79 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -259,31 +259,40 @@ The input(X) size and output(Out) size may be different.
259259
W_{out} = \\frac{(W_{in} - ksize[1] + 2 * paddings[1] + strides[1] - 1)}{strides[1]} + 1
260260
$$
261261
262-
For exclusive = true:
262+
For exclusive = false:
263263
$$
264264
hstart = i * strides[0] - paddings[0]
265+
$$
266+
$$
265267
hend = hstart + ksize[0]
268+
$$
269+
$$
266270
wstart = j * strides[1] - paddings[1]
271+
$$
272+
$$
267273
wend = wstart + ksize[1]
274+
$$
275+
$$
268276
Output(i ,j) = \\frac{sum(Input[hstart:hend, wstart:wend])}{ksize[0] * ksize[1]}
269277
$$
270-
For exclusive = false:
278+
279+
For exclusive = true:
271280
$$
272281
hstart = max(0, i * strides[0] - paddings[0])
282+
$$
283+
$$
273284
hend = min(H, hstart + ksize[0])
285+
$$
286+
$$
274287
wstart = max(0, j * strides[1] - paddings[1])
288+
$$
289+
$$
275290
wend = min(W, wstart + ksize[1])
291+
$$
292+
$$
276293
Output(i ,j) = \\frac{sum(Input[hstart:hend, wstart:wend])}{(hend - hstart) * (wend - wstart)}
277294
$$
278295
279-
For adaptive = true:
280-
$$
281-
hstart = floor(i * H_{in} / H_{out})
282-
hend = ceil((i + 1) * H_{in} / H_{out})
283-
wstart = floor(j * W_{in} / W_{out})
284-
wend = ceil((j + 1) * W_{in} / W_{out})
285-
Output(i ,j) = \\frac{sum(Input[hstart:hend, wstart:wend])}{(hend - hstart) * (wend - wstart)}
286-
$$
287296
)DOC");
288297
}
289298

@@ -392,48 +401,68 @@ width, respectively. The input(X) size and output(Out) size may be different.
392401
Output:
393402
Out shape: $(N, C, D_{out}, H_{out}, W_{out})$
394403
For ceil_mode = false:
395-
$$
396-
D_{out} = \frac{(D_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1 \\
397-
H_{out} = \frac{(H_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1 \\
398-
W_{out} = \frac{(W_{in} - ksize[2] + 2 * paddings[2])}{strides[2]} + 1
399-
$$
404+
$$
405+
D_{out} = \\frac{(D_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1
406+
$$
407+
$$
408+
H_{out} = \\frac{(H_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1
409+
$$
410+
$$
411+
W_{out} = \\frac{(W_{in} - ksize[2] + 2 * paddings[2])}{strides[2]} + 1
412+
$$
400413
For ceil_mode = true:
401-
$$
402-
D_{out} = \frac{(D_{in} - ksize[0] + 2 * paddings[0] + strides[0] -1)}{strides[0]} + 1 \\
403-
H_{out} = \frac{(H_{in} - ksize[1] + 2 * paddings[1] + strides[1] -1)}{strides[1]} + 1 \\
404-
W_{out} = \frac{(W_{in} - ksize[2] + 2 * paddings[2] + strides[2] -1)}{strides[2]} + 1
405-
$$
406-
For exclusive = true:
407-
$$
408-
dstart = i * strides[0] - paddings[0]
409-
dend = dstart + ksize[0]
410-
hstart = j * strides[1] - paddings[1]
411-
hend = hstart + ksize[1]
412-
wstart = k * strides[2] - paddings[2]
413-
wend = wstart + ksize[2]
414-
Output(i ,j, k) = \\frac{sum(Input[dstart:dend, hstart:hend, wstart:wend])}{ksize[0] * ksize[1] * ksize[2]}
415-
$$
414+
$$
415+
D_{out} = \\frac{(D_{in} - ksize[0] + 2 * paddings[0] + strides[0] -1)}{strides[0]} + 1
416+
$$
417+
$$
418+
H_{out} = \\frac{(H_{in} - ksize[1] + 2 * paddings[1] + strides[1] -1)}{strides[1]} + 1
419+
$$
420+
$$
421+
W_{out} = \\frac{(W_{in} - ksize[2] + 2 * paddings[2] + strides[2] -1)}{strides[2]} + 1
422+
$$
423+
416424
For exclusive = false:
417-
$$
418-
dstart = max(0, i * strides[0] - paddings[0])
419-
dend = min(D, dstart + ksize[0])
420-
hstart = max(0, j * strides[1] - paddings[1])
421-
hend = min(H, hstart + ksize[1])
422-
wstart = max(0, k * strides[2] - paddings[2])
423-
wend = min(W, wstart + ksize[2])
424-
Output(i ,j, k) = \\frac{sum(Input[dstart:dend, hstart:hend, wstart:wend])}{(dend - dstart) * (hend - hstart) * (wend - wstart)}
425-
$$
426-
427-
For adaptive = true:
428-
$$
429-
dstart = floor(i * D_{in} / D_{out})
430-
dend = ceil((i + 1) * D_{in} / D_{out})
431-
hstart = floor(j * H_{in} / H_{out})
432-
hend = ceil((j + 1) * H_{in} / H_{out})
433-
wstart = floor(k * W_{in} / W_{out})
434-
wend = ceil((k + 1) * W_{in} / W_{out})
435-
Output(i ,j, k) = \\frac{sum(Input[dstart:dend, hstart:hend, wstart:wend])}{(dend - dstart) * (hend - hstart) * (wend - wstart)}
436-
$$
425+
$$
426+
dstart = i * strides[0] - paddings[0]
427+
$$
428+
$$
429+
dend = dstart + ksize[0]
430+
$$
431+
$$
432+
hstart = j * strides[1] - paddings[1]
433+
$$
434+
$$
435+
hend = hstart + ksize[1]
436+
$$
437+
$$
438+
wstart = k * strides[2] - paddings[2]
439+
$$
440+
$$
441+
wend = wstart + ksize[2]
442+
$$
443+
$$
444+
Output(i ,j, k) = \\frac{sum(Input[dstart:dend, hstart:hend, wstart:wend])}{ksize[0] * ksize[1] * ksize[2]}
445+
$$
446+
447+
For exclusive = true:
448+
$$
449+
dstart = max(0, i * strides[0] - paddings[0])
450+
$$
451+
$$
452+
dend = min(D, dstart + ksize[0])
453+
$$
454+
$$
455+
hend = min(H, hstart + ksize[1])
456+
$$
457+
$$
458+
wstart = max(0, k * strides[2] - paddings[2])
459+
$$
460+
$$
461+
wend = min(W, wstart + ksize[2])
462+
$$
463+
$$
464+
Output(i ,j, k) = \\frac{sum(Input[dstart:dend, hstart:hend, wstart:wend])}{(dend - dstart) * (hend - hstart) * (wend - wstart)}
465+
$$
437466
438467
)DOC");
439468
}
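
Review note: the difference between the two `exclusive` settings in the pool_op.cc doc above is only the divisor. A minimal pure-Python sketch for the 2D, `ceil_mode = false` case follows. The function `avg_pool2d` is hypothetical, and it assumes the window end is computed from the unclamped start before clamping, which is one reading of the docstring rather than a confirmed description of the kernel.

```python
def avg_pool2d(x, ksize, strides, paddings, exclusive=True):
    """Average-pool a single H x W plane given as a list of rows.

    exclusive=True  divides by the number of valid (non-padded) cells.
    exclusive=False divides by ksize[0] * ksize[1], so padded cells
    count as zeros. Hypothetical helper, for illustration only.
    """
    height, width = len(x), len(x[0])
    # Output size for ceil_mode = false, using integer division.
    h_out = (height - ksize[0] + 2 * paddings[0]) // strides[0] + 1
    w_out = (width - ksize[1] + 2 * paddings[1]) // strides[1] + 1

    out = []
    for i in range(h_out):
        row = []
        for j in range(w_out):
            hstart = i * strides[0] - paddings[0]
            wstart = j * strides[1] - paddings[1]
            hend = hstart + ksize[0]
            wend = wstart + ksize[1]
            # Clamp to the real input; padded cells add nothing to the sum.
            hs, he = max(0, hstart), min(height, hend)
            ws, we = max(0, wstart), min(width, wend)
            total = sum(x[r][c] for r in range(hs, he) for c in range(ws, we))
            if exclusive:
                count = (he - hs) * (we - ws)
            else:
                count = ksize[0] * ksize[1]
            row.append(total / count)
        out.append(row)
    return out


plane = [[1, 2], [3, 4]]
print(avg_pool2d(plane, [2, 2], [1, 1], [1, 1], exclusive=True))
print(avg_pool2d(plane, [2, 2], [1, 1], [1, 1], exclusive=False))
```

The two results differ only at windows that overlap the padding; interior windows give the same value in both modes.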

python/paddle/fluid/layers/detection.py

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -545,15 +545,16 @@ def yolov3_loss(x,
545545
TypeError: Attr ignore_thresh of yolov3_loss must be a float number
546546
547547
Examples:
548-
.. code-block:: python
549-
550-
x = fluid.layers.data(name='x', shape=[255, 13, 13], dtype='float32')
551-
gtbox = fluid.layers.data(name='gtbox', shape=[6, 5], dtype='float32')
552-
gtlabel = fluid.layers.data(name='gtlabel', shape=[6, 1], dtype='int32')
553-
anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
554-
anchors = [0, 1, 2]
555-
loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, class_num=80, anchors=anchors,
556-
ignore_thresh=0.5, downsample_ratio=32)
548+
.. code-block:: python
549+
550+
x = fluid.layers.data(name='x', shape=[255, 13, 13], dtype='float32')
551+
gtbox = fluid.layers.data(name='gtbox', shape=[6, 5], dtype='float32')
552+
gtlabel = fluid.layers.data(name='gtlabel', shape=[6, 1], dtype='int32')
553+
anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
554+
anchor_mask = [0, 1, 2]
555+
loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, gtlabel=gtlabel, anchors=anchors,
556+
anchor_mask=anchor_mask, class_num=80,
557+
ignore_thresh=0.7, downsample_ratio=32)
557558
"""
558559
helper = LayerHelper('yolov3_loss', **locals())
559560
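
Review note: the shapes in the updated example can be checked against the rule C = S * (class_num + 5) from the operator doc. The sketch below assumes that S is the number of entries in `anchor_mask` (the anchors actually used at this scale); that is inferred from the example's input shape, not stated by the diff. The helper `expected_channels` is hypothetical.

```python
def expected_channels(anchor_mask, class_num):
    """Channel count implied by C = S * (class_num + 5).

    Assumption: S equals len(anchor_mask). Each box stores x, y, w, h,
    a confidence score, and class_num class scores.
    """
    return len(anchor_mask) * (class_num + 5)


anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45,
           59, 119, 116, 90, 156, 198, 373, 326]
anchor_mask = [0, 1, 2]

# The flat list holds (width, height) pairs, so 18 numbers are 9 anchors.
print(len(anchors) // 2)
# Matches the first dimension of x in the example, shape=[255, 13, 13].
print(expected_channels(anchor_mask, class_num=80))
```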

python/paddle/fluid/layers/nn.py

Lines changed: 55 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2569,7 +2569,27 @@ def adaptive_pool2d(input,
25692569
require_index=False,
25702570
name=None):
25712571
"""
2572-
${comment}
2572+
**Adaptive Pool2d Operator**
2573+
The adaptive_pool2d operation calculates the output based on the input, pool_size,
2574+
pool_type parameters. Input(X) and output(Out) are in NCHW format, where N is batch
2575+
size, C is the number of channels, H is the height of the feature, and W is
2576+
the width of the feature. Parameters(pool_size) should contain two elements which
2577+
represent height and width, respectively. Also the H and W dimensions of output(Out)
2578+
are the same as Parameter(pool_size).
2579+
2580+
For average adaptive pool2d:
2581+
2582+
.. math::
2583+
2584+
hstart &= floor(i * H_{in} / H_{out})
2585+
2586+
hend &= ceil((i + 1) * H_{in} / H_{out})
2587+
2588+
wstart &= floor(j * W_{in} / W_{out})
2589+
2590+
wend &= ceil((j + 1) * W_{in} / W_{out})
2591+
2592+
Output(i ,j) &= \\frac{sum(Input[hstart:hend, wstart:wend])}{(hend - hstart) * (wend - wstart)}
25732593
25742594
Args:
25752595
input (Variable): The input tensor of pooling operator. The format of
@@ -2579,8 +2599,8 @@ def adaptive_pool2d(input,
25792599
pool_size (int|list|tuple): The pool kernel size. If pool kernel size is a tuple or list,
25802600
it must contain two integers, (pool_size_Height, pool_size_Width).
25812601
pool_type: ${pooling_type_comment}
2582-
require_index (bool): If true, the index of max pooling point along with outputs.
2583-
it cannot be set in average pooling type.
2602+
require_index (bool): If true, the index of max pooling point will be returned along
2603+
with outputs. It cannot be set in average pooling type.
25842604
name (str|None): A name for this layer(optional). If set None, the
25852605
layer will be named automatically.
25862606
@@ -2661,18 +2681,42 @@ def adaptive_pool3d(input,
26612681
require_index=False,
26622682
name=None):
26632683
"""
2664-
${comment}
2684+
**Adaptive Pool3d Operator**
2685+
The adaptive_pool3d operation calculates the output based on the input, pool_size,
2686+
pool_type parameters. Input(X) and output(Out) are in NCDHW format, where N is batch
2687+
size, C is the number of channels, D is the depth of the feature, H is the height of
2688+
the feature, and W is the width of the feature. Parameters(pool_size) should contain
2689+
three elements which represent depth, height and width, respectively. Also the D, H and W
2690+
dimensions of output(Out) are the same as Parameter(pool_size).
2691+
2692+
For average adaptive pool3d:
2693+
2694+
.. math::
2695+
2696+
dstart &= floor(i * D_{in} / D_{out})
2697+
2698+
dend &= ceil((i + 1) * D_{in} / D_{out})
2699+
2700+
hstart &= floor(j * H_{in} / H_{out})
2701+
2702+
hend &= ceil((j + 1) * H_{in} / H_{out})
2703+
2704+
wstart &= floor(k * W_{in} / W_{out})
2705+
2706+
wend &= ceil((k + 1) * W_{in} / W_{out})
2707+
2708+
Output(i ,j, k) &= \\frac{sum(Input[dstart:dend, hstart:hend, wstart:wend])}{(dend - dstart) * (hend - hstart) * (wend - wstart)}
26652709
26662710
Args:
26672711
input (Variable): The input tensor of pooling operator. The format of
2668-
input tensor is NCHW, where N is batch size, C is
2669-
the number of channels, H is the height of the
2670-
feature, and W is the width of the feature.
2712+
input tensor is NCDHW, where N is batch size, C is
2713+
the number of channels, D is the depth of the feature,
2714+
H is the height of the feature, and W is the width of the feature.
26712715
pool_size (int|list|tuple): The pool kernel size. If pool kernel size is a tuple or list,
2672-
it must contain two integers, (Depth, Height, Width).
2716+
it must contain three integers, (Depth, Height, Width).
26732717
pool_type: ${pooling_type_comment}
2674-
require_index (bool): If true, the index of max pooling point along with outputs.
2675-
it cannot be set in average pooling type.
2718+
require_index (bool): If true, the index of max pooling point will be returned along
2719+
with outputs. It cannot be set in average pooling type.
26762720
name (str|None): A name for this layer(optional). If set None, the
26772721
layer will be named automatically.
26782722
@@ -2709,7 +2753,7 @@ def adaptive_pool3d(input,
27092753
name='data', shape=[3, 32, 32], dtype='float32')
27102754
pool_out, mask = fluid.layers.adaptive_pool3d(
27112755
input=data,
2712-
pool_size=[3, 3],
2756+
pool_size=[3, 3, 3],
27132757
pool_type='avg')
27142758
"""
27152759
if pool_type not in ["max", "avg"]:
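
Review note: the adaptive window formulas added to the nn.py docstrings can be illustrated without Paddle. The sketch below computes the `[start, end)` ranges with integer arithmetic and applies them to average pooling on one 2D plane. `adaptive_windows` and `adaptive_avg_pool2d` are hypothetical names, and this follows the docstring's formulas only; it makes no claim about the actual kernel.

```python
def adaptive_windows(in_size, out_size):
    """Return (start, end) per output index along one axis.

    start = floor(i * in_size / out_size)
    end   = ceil((i + 1) * in_size / out_size)
    Integer arithmetic avoids floating-point rounding.
    """
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = ((i + 1) * in_size + out_size - 1) // out_size
        windows.append((start, end))
    return windows


def adaptive_avg_pool2d(x, pool_size):
    """Average-pool one H x W plane (list of rows) to pool_size = [H_out, W_out]."""
    h_windows = adaptive_windows(len(x), pool_size[0])
    w_windows = adaptive_windows(len(x[0]), pool_size[1])
    out = []
    for hstart, hend in h_windows:
        row = []
        for wstart, wend in w_windows:
            total = sum(x[r][c]
                        for r in range(hstart, hend)
                        for c in range(wstart, wend))
            row.append(total / ((hend - hstart) * (wend - wstart)))
        out.append(row)
    return out


# Windows may overlap when in_size is not a multiple of out_size.
print(adaptive_windows(5, 3))

plane = [[r * 4 + c for c in range(4)] for r in range(4)]
print(adaptive_avg_pool2d(plane, [2, 2]))
```

The 3D case adds a third pair of bounds (dstart, dend) computed the same way along the depth axis.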
