
Commit 19534da

Merge pull request #15215 from velconia/local_release_1_2_x_add_huber_regression_loss_op
Add python interface for huber loss
2 parents a607b6c + eaaf382

File tree

3 files changed: +63 -21 lines


paddle/fluid/API.spec

Lines changed: 1 addition & 0 deletions
@@ -197,6 +197,7 @@ paddle.fluid.layers.bilinear_tensor_product ArgSpec(args=['x', 'y', 'size', 'act
 paddle.fluid.layers.merge_selected_rows ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.get_tensor_from_selected_rows ArgSpec(args=['x', 'name'], varargs=None, keywords=None, defaults=(None,))
 paddle.fluid.layers.lstm ArgSpec(args=['input', 'init_h', 'init_c', 'max_len', 'hidden_size', 'num_layers', 'dropout_prob', 'is_bidirec', 'is_test', 'name', 'default_initializer', 'seed'], varargs=None, keywords=None, defaults=(0.0, False, False, None, None, -1))
+paddle.fluid.layers.huber_loss ArgSpec(args=['input', 'label', 'delta'], varargs=None, keywords=None, defaults=None)
 paddle.fluid.layers.data ArgSpec(args=['name', 'shape', 'append_batch_size', 'dtype', 'lod_level', 'type', 'stop_gradient'], varargs=None, keywords=None, defaults=(True, 'float32', 0, VarType.LOD_TENSOR, True))
 paddle.fluid.layers.open_files ArgSpec(args=['filenames', 'shapes', 'lod_levels', 'dtypes', 'thread_num', 'buffer_size', 'pass_num', 'is_test'], varargs=None, keywords=None, defaults=(None, None, 1, None))
 paddle.fluid.layers.read_file ArgSpec(args=['reader'], varargs=None, keywords=None, defaults=None)
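
The new ArgSpec entry records the public signature huber_loss(input, label, delta) with no defaults, so delta always has to be passed. A minimal call consistent with that spec might look like the following sketch; the variables x and y are illustrative placeholders, not part of the commit:

import paddle.fluid as fluid

# Two variables whose last dimension is 1, per the docstring added in nn.py below.
x = fluid.layers.data(name='x', shape=[1], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')

# delta has no default in the ArgSpec, so it is passed explicitly.
loss = fluid.layers.huber_loss(input=x, label=y, delta=1.0)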

paddle/fluid/operators/huber_loss_op.cc

Lines changed: 4 additions & 3 deletions
@@ -124,8 +124,9 @@ REGISTER_OPERATOR(huber_loss, ops::HuberLossOp, ops::HuberLossOpMaker<float>,
                   paddle::framework::DefaultGradOpDescMaker<true>);
 REGISTER_OPERATOR(huber_loss_grad, ops::HuberLossGradOp);
 REGISTER_OP_CPU_KERNEL(
-    huber_loss,
-    ops::HuberLossKernel<paddle::platform::CPUDeviceContext, float>);
+    huber_loss, ops::HuberLossKernel<paddle::platform::CPUDeviceContext, float>,
+    ops::HuberLossKernel<paddle::platform::CPUDeviceContext, double>);
 REGISTER_OP_CPU_KERNEL(
     huber_loss_grad,
-    ops::HuberLossGradKernel<paddle::platform::CPUDeviceContext, float>);
+    ops::HuberLossGradKernel<paddle::platform::CPUDeviceContext, float>,
+    ops::HuberLossGradKernel<paddle::platform::CPUDeviceContext, double>);
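
This hunk registers a double-precision kernel next to the existing float kernel for both the forward op and its gradient, so the CPU operator can now run on float64 tensors as well. As a reference for what both kernels compute, here is a NumPy sketch of the standard per-element Huber loss (using the absolute residual); it only mirrors the formula and is not the operator's actual C++/Eigen implementation:

import numpy as np

def huber_loss_reference(input, label, delta):
    # Standard Huber loss: quadratic for small residuals, linear for large ones.
    residual = label - input
    small = 0.5 * residual * residual                        # |residual| <= delta
    large = delta * np.abs(residual) - 0.5 * delta * delta   # |residual| >  delta
    return np.where(np.abs(residual) <= delta, small, large)

# float64 arrays now have a matching CPU kernel in the real operator.
x = np.array([[0.0], [2.5]], dtype=np.float64)
y = np.array([[0.5], [0.0]], dtype=np.float64)
print(huber_loss_reference(x, y, delta=1.0))   # [[0.125], [2.0]]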

python/paddle/fluid/layers/nn.py

Lines changed: 58 additions & 18 deletions
@@ -172,6 +172,7 @@
     'merge_selected_rows',
     'get_tensor_from_selected_rows',
     'lstm',
+    'huber_loss',
 ]
@@ -491,7 +492,7 @@ def lstm(input,
     If Device is GPU, This op will use cudnn LSTM implementation

     A four-gate Long Short-Term Memory network with no peephole connections.
-    In the forward pass the output ht and cell output ct for a given iteration can be computed from the recurrent input ht-1,
+    In the forward pass the output ht and cell output ct for a given iteration can be computed from the recurrent input ht-1,
     the cell input ct-1 and the previous layer input xt given matrices W, R and biases bW, bR from the following equations:

     $$ i_t = \\sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + bx_i + bh_i) $$
@@ -518,19 +519,19 @@ def lstm(input,
     - $\tilde{c_t}$ is also called candidate hidden state,
         which is computed based on the current input and the previous hidden state.

-    Where sigmoid is the sigmoid operator: sigmoid(x) = 1 / (1 + e^-x), * represents a point-wise multiplication,
+    Where sigmoid is the sigmoid operator: sigmoid(x) = 1 / (1 + e^-x), * represents a point-wise multiplication,
     X represensts a matrix multiplication


     Args:
         input (Variable): LSTM input tensor, shape MUST be ( seq_len x batch_size x input_size )
-        init_h(Variable): The initial hidden state of the LSTM
+        init_h(Variable): The initial hidden state of the LSTM
                        This is a tensor with shape ( num_layers x batch_size x hidden_size)
                        if is_bidirec = True, shape should be ( num_layers*2 x batch_size x hidden_size)
         init_c(Variable): The initial cell state of the LSTM.
                        This is a tensor with shape ( num_layers x batch_size x hidden_size )
                        if is_bidirec = True, shape should be ( num_layers*2 x batch_size x hidden_size)
-        max_len (int): max length of LSTM. the first dim of input tensor CAN NOT greater than max_len
+        max_len (int): max length of LSTM. the first dim of input tensor CAN NOT greater than max_len
         hidden_size (int): hidden size of the LSTM
         num_layers (int): total layers number of the LSTM
         dropout_prob(float|0.0): dropout prob, dropout ONLY work between rnn layers, NOT between time steps
@@ -549,10 +550,10 @@ def lstm(input,
                         if is_bidirec set to True, shape will be ( seq_len x batch_sze x hidden_size*2)
         last_h(Tensor): the hidden state of the last step of LSTM
                         shape is ( num_layers x batch_size x hidden_size )
-                        if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)
+                        if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)
         last_c(Tensor): the cell state of the last step of LSTM
                         shape is ( num_layers x batch_size x hidden_size )
-                        if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)
+                        if is_bidirec set to True, shape will be ( num_layers*2 x batch_size x hidden_size)


     Examples:
@@ -4390,7 +4391,7 @@ def ctc_greedy_decoder(input, blank, name=None):
                       [0.5, 0.1, 0.3, 0.1]]

         input.lod = [[4, 4]]
-
+
         Computation:

         step1: Apply argmax to first input sequence which is input.data[0:4]. Then we get:
@@ -4423,7 +4424,7 @@ def ctc_greedy_decoder(input, blank, name=None):
     Returns:
         Variable: CTC greedy decode result which is a 2-D tensor with shape [Lp, 1].
                   'Lp' is the sum if all output sequences' length. If all the sequences
-                  in result were empty, the result LoDTensor will be [-1] with
+                  in result were empty, the result LoDTensor will be [-1] with
                   LoD [[]] and dims [1, 1].

     Examples:
@@ -4777,7 +4778,7 @@ def hsigmoid(input,
     """
    The hierarchical sigmoid operator is used to accelerate the training
    process of language model. This operator organizes the classes into a
-   complete binary tree, or you can use is_custom to pass your own tree to
+   complete binary tree, or you can use is_custom to pass your own tree to
    implement hierarchical. Each leaf node represents a class(a word) and each
    internal node acts as a binary classifier. For each word there's a unique
    path from root to it's leaf node, hsigmoid calculate the cost for each
@@ -4793,7 +4794,7 @@ def hsigmoid(input,
    2. build a dict to store word_id -> word's leaf to root path, we call it path_table.
    3. build a dict to store word_id -> code of word's leaf to root path, we call it path_code. Code
    means label of each binary classification, using 1 indicate true, 0 indicate false.
-   4. now, each word should has its path and code along the path, you can pass a batch of path and code
+   4. now, each word should has its path and code along the path, you can pass a batch of path and code
    related to the same batch of inputs.

@@ -4803,8 +4804,8 @@ def hsigmoid(input,
             and :math:`D` is the feature size.
         label (Variable): The tensor variable contains labels of training data.
             It's a tensor with shape is :math:`[N \\times 1]`.
-        num_classes: (int), The number of classes, must not be less than 2. with default tree this has to be set,
-            it should never be None under is_custom=False, but while is_custom is true, it should be non leaf num
+        num_classes: (int), The number of classes, must not be less than 2. with default tree this has to be set,
+            it should never be None under is_custom=False, but while is_custom is true, it should be non leaf num
             which indicates the num of classes using by binary classify.
         param_attr (ParamAttr|None): The parameter attribute for learnable parameters/weights
             of hsigmoid. If it is set to None or one attribute of ParamAttr, hsigmoid
@@ -4817,15 +4818,15 @@ def hsigmoid(input,
             is not set, the bias is initialized zero. Default: None.
         name (str|None): A name for this layer(optional). If set None, the layer
             will be named automatically. Default: None.
-        path_table: (Variable|None) this variable can store each batch of samples' path to root,
+        path_table: (Variable|None) this variable can store each batch of samples' path to root,
             it should be in leaf -> root order
-            path_table should have the same shape with path_code, and for each sample i path_table[i] indicates a np.array like
-            structure and each element in this array is indexes in parent nodes' Weight Matrix.
-        path_code: (Variable|None) this variable can store each batch of samples' code,
+            path_table should have the same shape with path_code, and for each sample i path_table[i] indicates a np.array like
+            structure and each element in this array is indexes in parent nodes' Weight Matrix.
+        path_code: (Variable|None) this variable can store each batch of samples' code,
             each code consist with every code of parent nodes. it should be in leaf -> root order
-        is_custom: (bool|False)using user defined binary tree instead of default complete binary tree, if costum is
+        is_custom: (bool|False)using user defined binary tree instead of default complete binary tree, if costum is
             set you need to set path_table/path_code/num_classes, otherwise num_classes should be set
-        is_sparse: (bool|False)using sparse update instead of dense update, if set, the gradient
+        is_sparse: (bool|False)using sparse update instead of dense update, if set, the gradient
             of W and input will be sparse.

     Returns:
@@ -9049,3 +9050,42 @@ def get_tensor_from_selected_rows(x, name=None):
         outputs={'Out': out},
         attrs={})
     return out
+
+
+def huber_loss(input, label, delta):
+    """
+    Huber loss is a loss function used in robust regression.
+    Huber loss can evaluate the fitness of input to label.
+    Different from MSE loss, Huber loss is more robust for outliers.
+    When the difference between input and label is larger than delta
+    .. math::
+        huber\_loss = delta * (label - input) - 0.5 * delta * delta
+    When the difference between input and label is less than delta
+    .. math::
+        huber\_loss = 0.5 * (label - input) * (label - input)
+    Args:
+        input (Variable): This input is a probability computed by the previous operator.
+            The first dimension is batch size, and the last dimension is 1.
+        label (Variable): The ground truth whose first dimension is batch size
+            and last dimension is 1.
+        delta (float): The parameter of huber loss, which controls
+            the range of outliers
+    Returns:
+        huber\_loss (Variable): The huber loss with shape [batch_size, 1].
+    Examples:
+        .. code-block:: python
+            predictions = fluid.layers.softmax(x)
+            loss = fluid.layers.huber_loss(input=predictions, label=label, delta=1.0)
+    """
+    helper = LayerHelper('huber_loss', **locals())
+    residual = helper.create_variable_for_type_inference(
+        dtype=helper.input_dtype())
+    out = helper.create_variable_for_type_inference(dtype=helper.input_dtype())
+    helper.append_op(
+        type='huber_loss',
+        inputs={'X': input,
+                'Y': label},
+        outputs={'Out': out,
+                 'Residual': residual},
+        attrs={'delta': delta})
+    return out
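
The new layer appends a huber_loss op with inputs X and Y and returns the Out variable; Residual is an auxiliary output kept for the backward pass. A minimal end-to-end sketch of how the layer could be wired into a fluid program follows; the data names, shapes, and the fc/mean layers are illustrative assumptions, not part of the commit:

import numpy as np
import paddle.fluid as fluid

# Illustrative program: regress a scalar target with a robust loss.
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.huber_loss(input=pred, label=y, delta=1.0)   # shape [batch_size, 1]
avg_loss = fluid.layers.mean(loss)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
out, = exe.run(feed={'x': np.random.rand(8, 13).astype('float32'),
                     'y': np.random.rand(8, 1).astype('float32')},
               fetch_list=[avg_loss])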
