    'merge_selected_rows',
    'get_tensor_from_selected_rows',
    'lstm',
+   'huber_loss',
]

@@ -491,7 +492,7 @@ def lstm(input,
If Device is GPU, This op will use cudnn LSTM implementation

A four-gate Long Short-Term Memory network with no peephole connections.
- In the forward pass the output ht and cell output ct for a given iteration can be computed from the recurrent input ht-1,
+ In the forward pass the output ht and cell output ct for a given iteration can be computed from the recurrent input ht-1,
the cell input ct-1 and the previous layer input xt given matrices W, R and biases bW, bR from the following equations:

$$ i_t = \\sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + bx_i + bh_i) $$
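The remaining gate equations of this docstring fall outside the hunk. For orientation only, below is a minimal NumPy sketch of one time step of the standard four-gate, no-peephole LSTM the text describes; the stacked W/R layout and the [i, f, c~, o] gate ordering are assumptions of the sketch, not a description of the cuDNN kernel this layer actually dispatches to.

.. code-block:: python

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, R, bW, bR):
        # W projects the layer input x_t, R projects the recurrent input h_prev;
        # both produce the stacked [i, f, c~, o] pre-activations.
        gates = x_t @ W.T + h_prev @ R.T + bW + bR
        i, f, c_hat, o = np.split(gates, 4, axis=-1)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c_hat = np.tanh(c_hat)            # candidate cell state
        c_t = f * c_prev + i * c_hat      # point-wise (*) combinations
        h_t = o * np.tanh(c_t)
        return h_t, c_t

    # toy shapes: batch_size=2, input_size=3, hidden_size=4
    rng = np.random.default_rng(0)
    x_t = rng.standard_normal((2, 3))
    h0 = np.zeros((2, 4)); c0 = np.zeros((2, 4))
    W = rng.standard_normal((16, 3)); R = rng.standard_normal((16, 4))
    bW = np.zeros(16); bR = np.zeros(16)
    h1, c1 = lstm_step(x_t, h0, c0, W, R, bW, bR)
    print(h1.shape, c1.shape)  # (2, 4) (2, 4)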
@@ -518,19 +519,19 @@ def lstm(input,
  - $\tilde{c_t}$ is also called candidate hidden state,
    which is computed based on the current input and the previous hidden state.

- Where sigmoid is the sigmoid operator: sigmoid(x) = 1 / (1 + e^-x), * represents a point-wise multiplication,
+ Where sigmoid is the sigmoid operator: sigmoid(x) = 1 / (1 + e^-x), * represents a point-wise multiplication,
X represensts a matrix multiplication


Args:
    input (Variable): LSTM input tensor, shape MUST be ( seq_len x batch_size x input_size )
-   init_h(Variable): The initial hidden state of the LSTM
+   init_h(Variable): The initial hidden state of the LSTM
        This is a tensor with shape ( num_layers x batch_size x hidden_size)
        if is_bidirec = True, shape should be ( num_layers*2 x batch_size x hidden_size)
    init_c(Variable): The initial cell state of the LSTM.
        This is a tensor with shape ( num_layers x batch_size x hidden_size )
        if is_bidirec = True, shape should be ( num_layers*2 x batch_size x hidden_size)
-   max_len (int): max length of LSTM. the first dim of input tensor CAN NOT greater than max_len
+   max_len (int): max length of LSTM. the first dim of input tensor CAN NOT greater than max_len
    hidden_size (int): hidden size of the LSTM
    num_layers (int): total layers number of the LSTM
    dropout_prob(float|0.0): dropout prob, dropout ONLY work between rnn layers, NOT between time steps
@@ -549,10 +550,10 @@ def lstm(input,
        if is_bidirec set to True, shape will be ( seq_len x batch_sze x hidden_size*2)
    last_h(Tensor): the hidden state of the last step of LSTM
        shape is ( num_layers x batch_size x hidden_size )
-       if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)
+       if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)
    last_c(Tensor): the cell state of the last step of LSTM
        shape is ( num_layers x batch_size x hidden_size )
-       if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)
+       if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)


Examples:
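(The docstring's own example continues outside this hunk.) For orientation, here is a minimal usage sketch built only from the Args/Returns documented above; the tensor names, sizes, and the use of fill_constant for the initial states are illustrative assumptions, not part of the patch.

.. code-block:: python

    import paddle.fluid as fluid

    # hypothetical sizes for illustration only
    batch_size, max_len, input_size = 20, 100, 128
    hidden_size, num_layers, dropout_prob = 150, 1, 0.2

    # lstm expects (seq_len x batch_size x input_size); assume `emb`
    # already has that layout.
    emb = fluid.layers.data(name='emb',
                            shape=[max_len, batch_size, input_size],
                            dtype='float32',
                            append_batch_size=False)
    init_h = fluid.layers.fill_constant(
        [num_layers, batch_size, hidden_size], 'float32', 0.0)
    init_c = fluid.layers.fill_constant(
        [num_layers, batch_size, hidden_size], 'float32', 0.0)

    rnn_out, last_h, last_c = fluid.layers.lstm(
        emb, init_h, init_c, max_len, hidden_size, num_layers,
        dropout_prob=dropout_prob)
    # rnn_out: (max_len x batch_size x hidden_size); last_h / last_c:
    # (num_layers x batch_size x hidden_size), per the Returns section above.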
@@ -4390,7 +4391,7 @@ def ctc_greedy_decoder(input, blank, name=None):
              [0.5, 0.1, 0.3, 0.1]]

input.lod = [[4, 4]]
-
+
Computation:

step1: Apply argmax to first input sequence which is input.data[0:4]. Then we get:
@@ -4423,7 +4424,7 @@ def ctc_greedy_decoder(input, blank, name=None):
Returns:
    Variable: CTC greedy decode result which is a 2-D tensor with shape [Lp, 1].
    'Lp' is the sum if all output sequences' length. If all the sequences
-   in result were empty, the result LoDTensor will be [-1] with
+   in result were empty, the result LoDTensor will be [-1] with
    LoD [[]] and dims [1, 1].

Examples:
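(The docstring's own example lies outside this hunk.) The greedy decoding procedure walked through in the Computation section above — per-step argmax, merge consecutive repeats, drop the blank — can be sketched in plain NumPy; the 4x4 probability matrix below is hypothetical, not the docstring's data.

.. code-block:: python

    import numpy as np

    def ctc_greedy_decode(probs, blank):
        """Greedy CTC decoding: argmax per step, merge repeats, drop blank."""
        ids = probs.argmax(axis=1)                                            # step 1
        merged = [t for i, t in enumerate(ids) if i == 0 or t != ids[i - 1]]  # step 2
        return [t for t in merged if t != blank]                              # step 3

    # hypothetical 4-step, 4-class sequence (class 0 is the blank)
    probs = np.array([[0.1, 0.6, 0.2, 0.1],
                      [0.2, 0.5, 0.2, 0.1],
                      [0.7, 0.1, 0.1, 0.1],
                      [0.1, 0.1, 0.7, 0.1]])
    print(ctc_greedy_decode(probs, blank=0))  # [1, 2]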
@@ -4777,7 +4778,7 @@ def hsigmoid(input,
"""
The hierarchical sigmoid operator is used to accelerate the training
process of language model. This operator organizes the classes into a
- complete binary tree, or you can use is_custom to pass your own tree to
+ complete binary tree, or you can use is_custom to pass your own tree to
implement hierarchical. Each leaf node represents a class(a word) and each
internal node acts as a binary classifier. For each word there's a unique
path from root to it's leaf node, hsigmoid calculate the cost for each
@@ -4793,7 +4794,7 @@ def hsigmoid(input,
2. build a dict to store word_id -> word's leaf to root path, we call it path_table.
3. build a dict to store word_id -> code of word's leaf to root path, we call it path_code. Code
   means label of each binary classification, using 1 indicate true, 0 indicate false.
- 4. now, each word should has its path and code along the path, you can pass a batch of path and code
+ 4. now, each word should has its path and code along the path, you can pass a batch of path and code
   related to the same batch of inputs.

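A small sketch of steps 2-4 above: given a toy custom binary tree over four words, build each word's leaf-to-root list of internal-node indices (a path_table row) and the matching 0/1 branch labels (a path_code row). The toy tree, word ids, and the -1 padding value are assumptions made for illustration, not anything mandated by the operator.

.. code-block:: python

    import numpy as np

    # parent[node] = (parent_internal_node, code); internal nodes are
    # numbered 0..num_internal-1 and index rows of the weight matrix W.
    parent = {
        'w0': (1, 0), 'w1': (1, 1),   # internal node 1 separates w0 / w1
        'w2': (2, 0), 'w3': (2, 1),   # internal node 2 separates w2 / w3
        1: (0, 0), 2: (0, 1),         # root (node 0) separates {w0,w1} / {w2,w3}
    }

    def leaf_to_root(word):
        nodes, codes = [], []
        cur = word
        while cur in parent:          # climb until the root is reached
            p, code = parent[cur]
            nodes.append(p)
            codes.append(code)
            cur = p
        return nodes, codes

    batch_words = ['w0', 'w3']
    max_depth = 2
    path_table = np.full((len(batch_words), max_depth), -1)  # -1 = padding
    path_code = np.full((len(batch_words), max_depth), -1)
    for i, w in enumerate(batch_words):
        nodes, codes = leaf_to_root(w)
        path_table[i, :len(nodes)] = nodes
        path_code[i, :len(codes)] = codes
    print(path_table)  # [[1 0], [2 0]]
    print(path_code)   # [[0 0], [1 1]]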
@@ -4803,8 +4804,8 @@ def hsigmoid(input,
and :math:`D` is the feature size.
label (Variable): The tensor variable contains labels of training data.
    It's a tensor with shape is :math:`[N \\times 1]`.
- num_classes: (int), The number of classes, must not be less than 2. with default tree this has to be set,
- it should never be None under is_custom=False, but while is_custom is true, it should be non leaf num
+ num_classes: (int), The number of classes, must not be less than 2. with default tree this has to be set,
+ it should never be None under is_custom=False, but while is_custom is true, it should be non leaf num
    which indicates the num of classes using by binary classify.
param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
    of hsigmoid. If it is set to None or one attribute of ParamAttr, hsigmoid
@@ -4817,15 +4818,15 @@ def hsigmoid(input,
    is not set, the bias is initialized zero. Default: None.
name (str|None): A name for this layer(optional). If set None, the layer
    will be named automatically. Default: None.
- path_table: (Variable|None) this variable can store each batch of samples' path to root,
+ path_table: (Variable|None) this variable can store each batch of samples' path to root,
    it should be in leaf -> root order
- path_table should have the same shape with path_code, and for each sample i path_table[i] indicates a np.array like
- structure and each element in this array is indexes in parent nodes' Weight Matrix.
- path_code: (Variable|None) this variable can store each batch of samples' code,
+ path_table should have the same shape with path_code, and for each sample i path_table[i] indicates a np.array like
+ structure and each element in this array is indexes in parent nodes' Weight Matrix.
+ path_code: (Variable|None) this variable can store each batch of samples' code,
    each code consist with every code of parent nodes. it should be in leaf -> root order
- is_custom: (bool|False)using user defined binary tree instead of default complete binary tree, if costum is
+ is_custom: (bool|False)using user defined binary tree instead of default complete binary tree, if costum is
    set you need to set path_table/path_code/num_classes, otherwise num_classes should be set
- is_sparse: (bool|False)using sparse update instead of dense update, if set, the gradient
+ is_sparse: (bool|False)using sparse update instead of dense update, if set, the gradient
    of W and input will be sparse.

Returns:
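(The Returns and Examples sections continue outside this hunk.) As a quick orientation, a minimal usage sketch with the default complete binary tree, i.e. is_custom left at its default; the feature size, label shape, and num_classes=6 are illustrative assumptions.

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[2], dtype='float32')   # [N, D] features
    y = fluid.layers.data(name='y', shape=[1], dtype='int64')     # [N, 1] labels
    cost = fluid.layers.hsigmoid(input=x, label=y, num_classes=6)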
@@ -9049,3 +9050,42 @@ def get_tensor_from_selected_rows(x, name=None):
        outputs={'Out': out},
        attrs={})
    return out
+
+
+def huber_loss(input, label, delta):
+    """
+    Huber loss is a loss function used in robust regression.
+    Huber loss can evaluate the fitness of input to label.
+    Different from MSE loss, Huber loss is more robust for outliers.
+    When the difference between input and label is larger than delta:
+    .. math::
+        huber\_loss = delta * |label - input| - 0.5 * delta * delta
+    When the difference between input and label is less than or equal to delta:
+    .. math::
+        huber\_loss = 0.5 * (label - input) * (label - input)
+    Args:
+        input (Variable): This input is a probability computed by the previous operator.
+            The first dimension is batch size, and the last dimension is 1.
+        label (Variable): The ground truth whose first dimension is batch size
+            and last dimension is 1.
+        delta (float): The parameter of huber loss, which controls
+            the range of outliers.
+    Returns:
+        huber\_loss (Variable): The huber loss with shape [batch_size, 1].
+    Examples:
+        .. code-block:: python
+
+            predictions = fluid.layers.softmax(x)
+            loss = fluid.layers.huber_loss(input=predictions, label=label, delta=1.0)
+    """
+    helper = LayerHelper('huber_loss', **locals())
+    residual = helper.create_variable_for_type_inference(
+        dtype=helper.input_dtype())
+    out = helper.create_variable_for_type_inference(dtype=helper.input_dtype())
+    helper.append_op(
+        type='huber_loss',
+        inputs={'X': input,
+                'Y': label},
+        outputs={'Out': out,
+                 'Residual': residual},
+        attrs={'delta': delta})
+    return out
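As a sanity check on the piecewise formula in the new docstring (not a call into the Paddle op itself), here is a small NumPy sketch of the element-wise Huber loss; the input values are arbitrary.

.. code-block:: python

    import numpy as np

    def huber_loss_np(input, label, delta):
        # element-wise Huber loss following the piecewise definition above
        diff = label - input
        quadratic = 0.5 * diff * diff                          # |diff| <= delta
        linear = delta * np.abs(diff) - 0.5 * delta * delta    # |diff| >  delta
        return np.where(np.abs(diff) <= delta, quadratic, linear)

    x = np.array([[0.2], [1.5], [3.0]])
    y = np.array([[0.0], [0.0], [0.0]])
    print(huber_loss_np(x, y, delta=1.0))
    # [[0.02], [1.0], [2.5]]  -- quadratic inside delta, linear outside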