
Commit 7b54b30

Author: yi.wu
Message: follow comments
1 parent efd1d20 commit 7b54b30

File tree

3 files changed: +30 -79 lines changed

paddle/fluid/operators/linear_chain_crf_op.cc

Lines changed: 2 additions & 0 deletions
@@ -84,6 +84,7 @@ CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and
 http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details.
 
 Equation:
+
 1. Denote Input(Emission) to this operator as $x$ here.
 2. The first D values of Input(Transition) to this operator are for starting
 weights, denoted as $a$ here.
@@ -106,6 +107,7 @@ Finally, the linear chain CRF operator outputs the logarithm of the conditional
 likelihood of each training sample in a mini-batch.
 
 NOTE:
+
 1. The feature function for a CRF is made up of the emission features and the
 transition features. The emission feature weights are NOT computed in
 this operator. They MUST be computed first before this operator is called.
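The operator's output, the log conditional likelihood of each tag sequence, can be illustrated with a small NumPy sketch. This is not the operator's implementation; the layout of the boundary weights (separate start/stop vectors rather than the first and last rows of Input(Transition)) is an assumption made here for readability.

```python
import numpy as np

def crf_log_likelihood(emission, transition, start, stop, tags):
    """Log conditional likelihood of one tag path under a linear-chain CRF.

    emission:   (T, D) unnormalized emission scores, $x$ in the doc above.
    transition: (D, D) tag-to-tag transition weights.
    start/stop: (D,)   boundary weights (hypothetical layout), $a$-like terms.
    tags:       length-T list of gold tag indices.
    """
    T, D = emission.shape
    # Score of the given tag path.
    score = start[tags[0]] + emission[0, tags[0]]
    for t in range(1, T):
        score += transition[tags[t - 1], tags[t]] + emission[t, tags[t]]
    score += stop[tags[-1]]
    # Log partition function via the forward algorithm.
    alpha = start + emission[0]
    for t in range(1, T):
        # alpha_t[j] = em[t, j] + logsumexp_i(alpha_{t-1}[i] + tr[i, j])
        alpha = emission[t] + np.logaddexp.reduce(alpha[:, None] + transition, axis=0)
    log_z = np.logaddexp.reduce(alpha + stop)
    return score - log_z
```

Exponentiating the returned value over every possible tag path sums to 1, which is a handy sanity check on the forward recursion.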

paddle/fluid/operators/lstm_op.cc

Lines changed: 14 additions & 14 deletions
@@ -198,20 +198,20 @@ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c_t} \\
 h_t = o_t \odot act_h(c_t)
 $$
 
-where the W terms denote weight matrices (e.g. $W_{xi}$ is the matrix
-of weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$
-are diagonal weight matrices for peephole connections. In our implementation,
-we use vectors to reprenset these diagonal weight matrices. The b terms
-denote bias vectors ($b_i$ is the input gate bias vector), $\sigma$
-is the non-line activations, such as logistic sigmoid function, and
-$i, f, o$ and $c$ are the input gate, forget gate, output gate,
-and cell activation vectors, respectively, all of which have the same size as
-the cell output activation vector $h$.
-
-The $\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$
-are the cell input and cell output activation functions and `tanh` is usually
-used for them. $\tilde{c_t}$ is also called candidate hidden state,
-which is computed based on the current input and the previous hidden state.
+- The W terms denote weight matrices (e.g. $W_{xi}$ is the matrix
+of weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$
+are diagonal weight matrices for peephole connections. In our implementation,
+we use vectors to represent these diagonal weight matrices.
+- The b terms denote bias vectors ($b_i$ is the input gate bias vector).
+- $\sigma$ is the non-linear activation, such as the logistic sigmoid function.
+- $i, f, o$ and $c$ are the input gate, forget gate, output gate,
+and cell activation vectors, respectively, all of which have the same size as
+the cell output activation vector $h$.
+- The $\odot$ is the element-wise product of the vectors.
+- $act_g$ and $act_h$ are the cell input and cell output activation functions
+and `tanh` is usually used for them.
+- $\tilde{c_t}$ is also called candidate hidden state,
+which is computed based on the current input and the previous hidden state.
 
 Set `use_peepholes` False to disable peephole connection. The formula
 is omitted here, please refer to the paper
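The equations in the docstring above can be sketched as a single LSTM step in NumPy. This is an illustrative sketch, not the operator's C++/CUDA implementation; the argument grouping (tuples of per-gate weights) is a convenience invented here. Note the peephole weights are plain vectors, matching the "diagonal weight matrices as vectors" remark, and that the output gate peeks at the *current* cell state $c_t$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_x, W_h, w_peep, b):
    """One peephole-LSTM step following the docstring equations (a sketch)."""
    W_xi, W_xf, W_xc, W_xo = W_x      # input-to-gate weight matrices
    W_hi, W_hf, W_hc, W_ho = W_h      # hidden-to-gate weight matrices
    w_ic, w_fc, w_oc = w_peep         # diagonal peephole weights, kept as vectors
    b_i, b_f, b_c, b_o = b
    i = sigmoid(W_xi @ x + W_hi @ h_prev + w_ic * c_prev + b_i)  # input gate
    f = sigmoid(W_xf @ x + W_hf @ h_prev + w_fc * c_prev + b_f)  # forget gate
    c_tilde = np.tanh(W_xc @ x + W_hc @ h_prev + b_c)            # candidate state, act_g = tanh
    c = f * c_prev + i * c_tilde                                 # new cell state
    o = sigmoid(W_xo @ x + W_ho @ h_prev + w_oc * c + b_o)       # output gate uses current c
    h = o * np.tanh(c)                                           # act_h = tanh
    return h, c
```

Since $o_t \in (0, 1)$ and $|\tanh(c_t)| < 1$, every component of $h_t$ stays strictly inside $(-1, 1)$ when `tanh` is used for $act_h$.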

python/paddle/fluid/layers/nn.py

Lines changed: 14 additions & 65 deletions
@@ -262,6 +262,7 @@ def embedding(input,
 
 
 # TODO(qijun): expose H0 and C0
+@templatedoc(op_type="lstm")
 def dynamic_lstm(input,
                  size,
                  param_attr=None,
@@ -274,64 +275,19 @@ def dynamic_lstm(input,
                  dtype='float32',
                  name=None):
     """
-    **Dynamic LSTM Layer**
-
-    The defalut implementation is diagonal/peephole connection
-    (https://arxiv.org/pdf/1402.1128.pdf), the formula is as follows:
-
-    .. math::
-
-        i_t & = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i)
-
-        f_t & = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f)
-
-        \\tilde{c_t} & = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c)
-
-        o_t & = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o)
-
-        c_t & = f_t \odot c_{t-1} + i_t \odot \\tilde{c_t}
-
-        h_t & = o_t \odot act_h(c_t)
-
-    where the :math:`W` terms denote weight matrices (e.g. :math:`W_{xi}` is
-    the matrix of weights from the input gate to the input), :math:`W_{ic}, \
-    W_{fc}, W_{oc}` are diagonal weight matrices for peephole connections. In
-    our implementation, we use vectors to reprenset these diagonal weight
-    matrices. The :math:`b` terms denote bias vectors (:math:`b_i` is the input
-    gate bias vector), :math:`\sigma` is the non-linear activations, such as
-    logistic sigmoid function, and :math:`i, f, o` and :math:`c` are the input
-    gate, forget gate, output gate, and cell activation vectors, respectively,
-    all of which have the same size as the cell output activation vector :math:`h`.
-
-    The :math:`\odot` is the element-wise product of the vectors. :math:`act_g`
-    and :math:`act_h` are the cell input and cell output activation functions
-    and `tanh` is usually used for them. :math:`\\tilde{c_t}` is also called
-    candidate hidden state, which is computed based on the current input and
-    the previous hidden state.
-
-    Set `use_peepholes` to `False` to disable peephole connection. The formula
-    is omitted here, please refer to the paper
-    http://www.bioinf.jku.at/publications/older/2604.pdf for details.
-
-    Note that these :math:`W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}`
-    operations on the input :math:`x_{t}` are NOT included in this operator.
-    Users can choose to use fully-connect layer before LSTM layer.
+    ${comment}
 
     Args:
-        input(Variable): The input of dynamic_lstm layer, which supports
-                         variable-time length input sequence. The underlying
-                         tensor in this Variable is a matrix with shape
-                         (T X 4D), where T is the total time steps in this
-                         mini-batch, D is the hidden size.
-        size(int): 4 * hidden size.
-        param_attr(ParamAttr|None): The parameter attribute for the learnable
+        input (Variable): ${input_comment}
+        size (int): 4 * hidden size.
+        param_attr (ParamAttr|None): The parameter attribute for the learnable
                                hidden-hidden weights.

                               - Weights = {:math:`W_{ch}, W_{ih}, \
                               W_{fh}, W_{oh}`}
                               - The shape is (D x 4D), where D is the hidden
                               size.
-        bias_attr(ParamAttr|None): The bias attribute for the learnable bias
+        bias_attr (ParamAttr|None): The bias attribute for the learnable bias
                               weights, which contains two parts, input-hidden
                               bias weights and peephole connections weights if
                               setting `use_peepholes` to `True`.
@@ -343,21 +299,14 @@ def dynamic_lstm(input,
                               - Biases = { :math:`b_c, b_i, b_f, b_o, W_{ic}, \
                               W_{fc}, W_{oc}`}.
                               - The shape is (1 x 7D).
-        use_peepholes(bool): Whether to enable diagonal/peephole connections,
-                             default `True`.
-        is_reverse(bool): Whether to compute reversed LSTM, default `False`.
-        gate_activation(str): The activation for input gate, forget gate and
-                              output gate. Choices = ["sigmoid", "tanh", "relu",
-                              "identity"], default "sigmoid".
-        cell_activation(str): The activation for cell output. Choices = ["sigmoid",
-                              "tanh", "relu", "identity"], default "tanh".
-        candidate_activation(str): The activation for candidate hidden state.
-                                   Choices = ["sigmoid", "tanh",
-                                   "relu", "identity"],
-                                   default "tanh".
-        dtype(str): Data type. Choices = ["float32", "float64"], default "float32".
-        name(str|None): A name for this layer(optional). If set None, the layer
-                        will be named automatically.
+        use_peepholes (bool): ${use_peepholes_comment}
+        is_reverse (bool): ${is_reverse_comment}
+        gate_activation (str): ${gate_activation_comment}
+        cell_activation (str): ${cell_activation_comment}
+        candidate_activation (str): ${candidate_activation_comment}
+        dtype (str): Data type. Choices = ["float32", "float64"], default "float32".
+        name (str|None): A name for this layer (optional). If set None, the layer
+                         will be named automatically.
 
     Returns:
         tuple: The hidden state, and cell state of LSTM. The shape of both \
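The docstring being replaced explains why `size` is 4 * hidden size: the per-gate input projections $W_{xi}x_t, W_{xf}x_t, W_{xc}x_t, W_{xo}x_t$ are NOT computed by the operator, so users apply a fully-connected layer first and feed the layer a (T x 4D) tensor. A tiny NumPy sketch of that layout (all sizes here are hypothetical, chosen only for illustration):

```python
import numpy as np

T, D, X = 5, 3, 7                    # time steps, hidden size, raw feature size
rng = np.random.default_rng(0)
x = rng.normal(size=(T, X))          # raw input sequence

# A preceding fully-connected layer computes the four gate projections at
# once, which is why the tensor handed to dynamic_lstm is (T x 4D) and the
# `size` argument must equal 4 * hidden_size.
W = rng.normal(size=(X, 4 * D))      # fc weights (hypothetical)
lstm_input = x @ W                   # shape (T, 4*D), what dynamic_lstm consumes
gates = lstm_input.reshape(T, 4, D)  # one (T, D) slab of pre-activations per gate
```

Inside the operator, each (T, D) slab then feeds one of the i, f, c-tilde, o computations together with the learnable hidden-hidden and peephole weights described in the Args section.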
