Commit ee21f2f

Change default value of drop_rate in img_conv_group to 0
1 parent: 37a9437

File tree: 1 file changed, +15 −15 lines


python/paddle/v2/fluid/nets.py

Lines changed: 15 additions & 15 deletions
@@ -52,7 +52,7 @@ def img_conv_group(input,
                    conv_act=None,
                    param_attr=None,
                    conv_with_batchnorm=False,
-                   conv_batchnorm_drop_rate=None,
+                   conv_batchnorm_drop_rate=0,
                    pool_stride=1,
                    pool_type=None):
     """
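The new default means "no dropout" is expressed as a rate of 0 rather than None, so the value can flow through the same arithmetic path whether the caller passes a scalar or a per-layer list. A minimal sketch in plain Python of why this is convenient (the `expand_drop_rates` helper below is hypothetical, not part of the Paddle source):

```python
def expand_drop_rates(drop_rate, num_layers):
    """Broadcast a scalar drop rate to one rate per conv layer.

    With a numeric default of 0, "no dropout" needs no special-casing:
    it is just a rate of 0. A None default would have to be checked
    before every arithmetic use.
    """
    if isinstance(drop_rate, (list, tuple)):
        assert len(drop_rate) == num_layers
        return list(drop_rate)
    return [drop_rate] * num_layers

print(expand_drop_rates(0, 3))                # [0, 0, 0]
print(expand_drop_rates([0.1, 0.0, 0.5], 3))  # [0.1, 0.0, 0.5]
```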
(The paired − / + lines below render identically; they differ only in trailing whitespace, which this commit strips from the docstring.)

@@ -120,21 +120,21 @@ def sequence_conv_pool(input,
 
 def glu(input, dim=-1):
     """
-    The gated linear unit composed by split, sigmoid activation and elementwise
-    multiplication. Specifically, Split the input into two equal sized parts
-    :math:`a` and :math:`b` along the given dimension and then compute as
+    The gated linear unit composed by split, sigmoid activation and elementwise
+    multiplication. Specifically, Split the input into two equal sized parts
+    :math:`a` and :math:`b` along the given dimension and then compute as
     following:
 
     .. math::
 
         {GLU}(a, b)= a \otimes \sigma(b)
 
-    Refer to `Language Modeling with Gated Convolutional Networks
+    Refer to `Language Modeling with Gated Convolutional Networks
     <https://arxiv.org/pdf/1612.08083.pdf>`_.
-
+
     Args:
         input (Variable): The input variable which is a Tensor or LoDTensor.
-        dim (int): The dimension along which to split. If :math:`dim < 0`, the
+        dim (int): The dimension along which to split. If :math:`dim < 0`, the
             dimension to split along is :math:`rank(input) + dim`.
 
     Returns:
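The GLU formula in the docstring above can be checked numerically. A minimal NumPy sketch of :math:`{GLU}(a, b) = a \otimes \sigma(b)` (illustrative only; the actual layer composes Fluid's split, sigmoid, and elementwise-multiply ops on Variables):

```python
import numpy as np

def glu_np(x, dim=-1):
    """GLU(a, b) = a * sigmoid(b), where a, b are equal halves of x along dim."""
    a, b = np.split(x, 2, axis=dim)
    return a * (1.0 / (1.0 + np.exp(-b)))

x = np.array([[1.0, 2.0, 0.0, 0.0]])  # a = [1, 2], b = [0, 0]
print(glu_np(x))                      # sigmoid(0) = 0.5, so [[0.5, 1.0]]
```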
@@ -157,24 +157,24 @@ def dot_product_attention(querys, keys, values):
     """
     The dot-product attention.
 
-    Attention mechanism can be seen as mapping a query and a set of key-value
-    pairs to an output. The output is computed as a weighted sum of the values,
-    where the weight assigned to each value is computed by a compatibility
+    Attention mechanism can be seen as mapping a query and a set of key-value
+    pairs to an output. The output is computed as a weighted sum of the values,
+    where the weight assigned to each value is computed by a compatibility
     function (dot-product here) of the query with the corresponding key.
-
-    The dot-product attention can be implemented through (batch) matrix
+
+    The dot-product attention can be implemented through (batch) matrix
     multipication as follows:
 
     .. math::
 
         Attention(Q, K, V)= softmax(QK^\mathrm{T})V
 
-    Refer to `Attention Is All You Need
+    Refer to `Attention Is All You Need
     <https://arxiv.org/pdf/1706.03762.pdf>`_.
 
-    Note that batch data containing sequences with different lengths is not
+    Note that batch data containing sequences with different lengths is not
     supported by this because of the (batch) matrix multipication.
-
+
     Args:
         query (Variable): The input variable which is a Tensor or LoDTensor.
         key (Variable): The input variable which is a Tensor or LoDTensor.
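The formula :math:`Attention(Q, K, V) = softmax(QK^\mathrm{T})V` maps directly onto batch matrix multiplication, as the docstring notes. A NumPy sketch of the math (an illustration, not the Fluid operator itself):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention_np(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T) V, batched over axis 0."""
    scores = np.matmul(q, k.transpose(0, 2, 1))  # [batch, q_len, k_len]
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return np.matmul(weights, v)                 # [batch, q_len, v_dim]

q = np.random.rand(2, 3, 4)  # batch of 2, 3 queries of width 4
k = np.random.rand(2, 5, 4)  # 5 keys of width 4
v = np.random.rand(2, 5, 6)  # 5 values of width 6
out = dot_product_attention_np(q, k, v)
print(out.shape)             # (2, 3, 6)
```

Because every sequence in the batch must share the same `q_len`/`k_len` for the `matmul`, this formulation cannot mix sequences of different lengths in one batch, which is the limitation the docstring calls out.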
