@@ -1579,7 +1579,7 @@ def layer_norm(input,
     """
     **Layer Normalization**

-    Assume feature vectors exist on dimensions
+    Assume feature vectors exist on dimensions
     :attr:`begin_norm_axis ... rank(input)` and calculate the moment statistics
     along these dimensions for each feature vector :math:`a` with size
     :math:`H`, then normalize each feature vector using the corresponding
@@ -1600,13 +1600,13 @@ def layer_norm(input,

     Args:
         input(Variable): The input tensor variable.
-        scale(bool): Whether to learn the adaptive gain :math:`g` after
+        scale(bool): Whether to learn the adaptive gain :math:`g` after
             normalization.
-        shift(bool): Whether to learn the adaptive bias :math:`b` after
+        shift(bool): Whether to learn the adaptive bias :math:`b` after
             normalization.
-        begin_norm_axis(bool): The normalization will be performed along
+        begin_norm_axis(int): The normalization will be performed along
             dimensions from :attr:`begin_norm_axis` to :attr:`rank(input)`.
-        epsilon(float): The small value added to the variance to prevent
+        epsilon(float): The small value added to the variance to prevent
             division by zero.
         param_attr(ParamAttr|None): The parameter attribute for the learnable
             gain :math:`g`.
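
A minimal usage sketch of this layer, assuming the `fluid.layers` API shown in this diff (the variable name and shapes are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    # [batch, 32, 32] activations; with begin_norm_axis=1 the moment
    # statistics are computed over dims 1..2, so each sample's
    # 32 * 32 = 1024 features form one vector a with H = 1024.
    data = fluid.layers.data(name='data', shape=[32, 32], dtype='float32')
    out = fluid.layers.layer_norm(input=data,
                                  scale=True,        # learn the gain g
                                  shift=True,        # learn the bias b
                                  begin_norm_axis=1,
                                  epsilon=1e-5)
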
@@ -2070,7 +2070,7 @@ def reduce_sum(input, dim=None, keep_dim=False, name=None):
             Tensor variable with a single element, otherwise must be in the
             range :math:`[-rank(input), rank(input))`. If :math:`dim < 0`,
             the dimension to reduce is :math:`rank + dim`.
-        keep_dim (bool): Whether to reserve the reduced dimension in the
+        keep_dim (bool|False): Whether to reserve the reduced dimension in the
             output Tensor. The result tensor will have one fewer dimension
             than the :attr:`input` unless :attr:`keep_dim` is true.
         name(str|None): A name for this layer(optional). If set None, the layer
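
A short sketch of the :attr:`keep_dim` behavior, using the signature from the hunk header above (the data layer and shapes are illustrative; `fluid.layers.data` prepends a batch dimension of -1):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[3, 4], dtype='float32')
    # dim=-1 resolves to rank + dim, i.e. the last dimension.
    s0 = fluid.layers.reduce_sum(x, dim=-1)                 # shape [-1, 3]
    s1 = fluid.layers.reduce_sum(x, dim=-1, keep_dim=True)  # shape [-1, 3, 1]
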
@@ -3098,33 +3098,33 @@ def multiplex(inputs, index):
 def softmax_with_cross_entropy(logits, label, soft_label=False):
     """
     **Softmax With Cross Entropy Operator.**
-
+
     Cross entropy loss with softmax is used as the output layer extensively. This
     operator computes the softmax normalized values for each row of the input
     tensor, after which cross-entropy loss is computed. This provides a more
     numerically stable gradient.
-
+
     Because this operator performs a softmax on logits internally, it expects
     unscaled logits. This operator should not be used with the output of the
     softmax operator, since that would produce incorrect results.
-
+
     When the attribute soft_label is set false, this operator expects mutually
     exclusive hard labels: each sample in a batch is in exactly one class with a
     probability of 1.0. Each sample in the batch will have a single label.
-
+
     The equation is as follows:
-
+
     1) Hard label (one-hot label, so every sample has exactly one class)
-
+
     .. math::

         loss_j = -\\text{logit}_{label_j} +
         \\log\\left(\\sum_{i=0}^{K}\\exp(\\text{logit}_i)\\right), j = 1,..., K
-
+
     2) Soft label (each sample can have a distribution over all classes)

     .. math::
-
+
         loss_j = -\\sum_{i=0}^{K}\\text{label}_i
         \\left(\\text{logit}_i - \\log\\left(\\sum_{i=0}^{K}
         \\exp(\\text{logit}_i)\\right)\\right), j = 1,...,K
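
A NumPy illustration of the hard-label equation above (not the operator itself), using the standard max-shift so the log-sum-exp stays numerically stable:

.. code-block:: python

    import numpy as np

    def hard_label_loss(logits, label):
        # logits: [K] unscaled scores for one sample; label: int class index
        m = logits.max()                          # shift for stability
        log_sum_exp = m + np.log(np.exp(logits - m).sum())
        return -logits[label] + log_sum_exp      # = -log(softmax(logits)[label])

    logits = np.array([2.0, 0.5, -1.0])
    print(hard_label_loss(logits, label=0))
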
@@ -3169,7 +3169,7 @@ def smooth_l1(x, y, inside_weight=None, outside_weight=None, sigma=None):
     The operator takes the first dimension of X and Y as the batch size.
     For each instance, it computes the smooth L1 loss element by element first
     and then sums all the losses, so the shape of Out is [batch_size, 1].
-
+
     Args:
         x (Variable): A tensor with rank at least 2. The input value of smooth
             l1 loss op with shape [batch_size, dim1, ..., dimN].
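
The excerpt states the reduction (element-wise smooth L1, summed per instance, output shape [batch_size, 1]) but not the branch formula itself; this NumPy sketch assumes the common sigma = 1 form, f(d) = 0.5 * d**2 for |d| < 1 and |d| - 0.5 otherwise:

.. code-block:: python

    import numpy as np

    def smooth_l1_np(x, y):
        d = np.abs(x - y)                        # element-wise difference
        per_elem = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)
        # sum everything except the batch dimension -> [batch_size, 1]
        return per_elem.reshape(x.shape[0], -1).sum(axis=1, keepdims=True)

    x = np.random.rand(4, 3).astype('float32')
    y = np.random.rand(4, 3).astype('float32')
    print(smooth_l1_np(x, y).shape)              # (4, 1)
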