@@ -233,99 +233,94 @@ def dynamic_lstm(input,
The default implementation is diagonal/peephole connection
(https://arxiv.org/pdf/1402.1128.pdf), the formula is as follows:
- .. math:
-
- i_t = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i) \\
+ .. math::
+
+ i_t & = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i)

- f_t = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f) \\
+ f_t & = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f)

- \tilde{c_t} = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c) \\
+ \\tilde{c_t} & = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c)

- o_t = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o) \\
+ o_t & = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o)

- c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c_t} \\
+ c_t & = f_t \odot c_{t-1} + i_t \odot \\tilde{c_t}

- h_t = o_t \odot act_h(c_t)
+ h_t & = o_t \odot act_h(c_t)

- where the W terms denote weight matrices (e.g. $W_{xi}$ is the matrix
- of weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$
+ where the :math:`W` terms denote weight matrices (e.g. :math:`W_{xi}` is the matrix
+ of weights from the input gate to the input), :math:`W_{ic}, W_{fc}, W_{oc}`
are diagonal weight matrices for peephole connections. In our implementation,
- we use vectors to represent these diagonal weight matrices. The b terms
- denote bias vectors ($b_i$ is the input gate bias vector), $\sigma$
+ we use vectors to represent these diagonal weight matrices. The :math:`b` terms
+ denote bias vectors (:math:`b_i` is the input gate bias vector), :math:`\sigma`
is the non-linear activation, such as the logistic sigmoid function, and
- $i, f, o$ and $c$ are the input gate, forget gate, output gate,
+ :math:`i, f, o` and :math:`c` are the input gate, forget gate, output gate,
and cell activation vectors, respectively, all of which have the same size as
- the cell output activation vector $h$.
+ the cell output activation vector :math:`h`.

- The $\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$
+ The :math:`\odot` is the element-wise product of the vectors. :math:`act_g` and :math:`act_h`
are the cell input and cell output activation functions and `tanh` is usually
- used for them. $\tilde{c_t}$ is also called the candidate hidden state,
+ used for them. :math:`\\tilde{c_t}` is also called the candidate hidden state,
which is computed based on the current input and the previous hidden state.

Set `use_peepholes` to False to disable the peephole connection. The formula
is omitted here; please refer to the paper
http://www.bioinf.jku.at/publications/older/2604.pdf for details.

- Note that these $W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}$
- operations on the input $x_{t}$ are NOT included in this operator.
- Users can choose to use fully-connect operator before LSTM operator.
+ Note that these :math:`W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}`
+ operations on the input :math:`x_{t}` are NOT included in this operator.
+ Users can choose to use a fully-connected layer before the LSTM layer.
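
A minimal NumPy sketch of one timestep of the peephole formulas above may help. The helper names are illustrative, not part of the fluid API; the 4D gate ordering {c, i, f, o} mirrors the bias layout described under `bias_attr`, and `tanh` stands in for both `act_g` and `act_h` (these are assumptions for the sketch, with the input projection `x_proj` computed outside, as the note says):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_lstm_step(x_proj, h_prev, c_prev, W_h, b, w_ic, w_fc, w_oc):
    """One peephole-LSTM timestep (illustrative sketch).

    x_proj : precomputed input projection W_x x_t, shape (4D,),
             i.e. the fully-connected layer applied before this operator
    W_h    : hidden-to-hidden weights, shape (D, 4D)
    b      : gate biases concatenated as [b_c, b_i, b_f, b_o], shape (4D,)
    w_ic, w_fc, w_oc : peephole weight vectors, each shape (D,)
    """
    gates = x_proj + h_prev @ W_h + b          # pre-activations, shape (4D,)
    g_c, g_i, g_f, g_o = np.split(gates, 4)    # assumed {c, i, f, o} order
    i = sigmoid(g_i + w_ic * c_prev)           # input gate peeks at old cell
    f = sigmoid(g_f + w_fc * c_prev)           # forget gate peeks at old cell
    c_tilde = np.tanh(g_c)                     # candidate hidden state
    c = f * c_prev + i * c_tilde
    o = sigmoid(g_o + w_oc * c)                # output gate peeks at new cell
    h = o * np.tanh(c)
    return h, c
```

Passing zero vectors for `w_ic`, `w_fc`, `w_oc` reduces this to the non-peephole variant mentioned below.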

Args:
- def dynamic_lstm(input,
- size,
- param_attr=None,
- bias_attr=None,
- use_peepholes=True,
- is_reverse=False,
- gate_activation='sigmoid',
- cell_activation='tanh',
- candidate_activation='tanh',
- dtype='float32'):
- input(Variable): The input of dynamic_lstm layer, which support
- variable-time length input sequence. The underlying tensor in
- this Variable is a matrix with shape (T X 4D), where T is the
- total time steps in this mini-batch, D is the hidden size.
- size(int): The size of input.
+ input(Variable): The input of dynamic_lstm layer, which supports
+ variable-time length input sequence. The underlying
+ tensor in this Variable is a matrix with shape
+ (T X 4D), where T is the total time steps in this
+ mini-batch, D is the hidden size.
+ size(int): 4 * hidden size.
param_attr(ParamAttr): The parameter attribute for the learnable
- hidden-hidden weights.
- - The shape is (D x 4D), where D is the hidden size.
- - param_attr = {W_ch, W_ih, W_fh, W_oh}
+ hidden-hidden weights.
+
+ - The shape is (D x 4D), where D is the hidden
+ size.
+ - Weights = {:math:`W_{ch}, W_{ih}, \
+ W_{fh}, W_{oh}`}
bias_attr(ParamAttr): The bias attribute for the learnable bias
- weights, which contains two parts: input-hidden bias weight
- and peephole connections weight if setting `use_peepholes` to True.
- 1. `use_peepholes = False`
- - The shape is (1 x 4D).
- - Bias = {b_c, b_i, b_f, b_o}.
- 2. `use_peepholes = True`
- - The shape is (1 x 7D).
- - Bias = {b_c, b_i, b_f, b_o, W_ic, W_fc, W_oc}.
- use_peepholes(bool, default: True): whether to enable diagonal/peephole
- connections.
- is_reverse(bool, default: False): whether to compute reversed LSTM.
- gate_activation(string, choices: "sigmoid", "tanh", "relu", "identity",
- default: "sigmoid"): The activation for input gate, forget gate and
- output gate.
- cell_activation(string, choices: "sigmoid", "tanh", "relu", "identity",
- default: "tanh"): The activation for cell output.
- candidate_activation(string, choices: "sigmoid", "tanh", "relu",
- "identity", default: "tanh"): The activation for candidate hidden
- state.
- dtype(string, )
+ weights, which contains two parts, input-hidden
+ bias weights and peephole connections weights if
+ setting `use_peepholes` to `True`.
+
+ 1. `use_peepholes = False`
+ - The shape is (1 x 4D).
+ - Biases = {:math:`b_c, b_i, b_f, b_o`}.
+ 2. `use_peepholes = True`
+ - The shape is (1 x 7D).
+ - Biases = {:math:`b_c, b_i, b_f, b_o, W_{ic}, \
+ W_{fc}, W_{oc}`}.
+ use_peepholes(bool): Whether to enable diagonal/peephole connections,
+ default `True`.
+ is_reverse(bool): Whether to compute reversed LSTM, default `False`.
+ gate_activation(str): The activation for input gate, forget gate and
+ output gate. Choices = ["sigmoid", "tanh", "relu",
+ "identity"], default "sigmoid".
+ cell_activation(str): The activation for cell output. Choices = ["sigmoid",
+ "tanh", "relu", "identity"], default "tanh".
+ candidate_activation(str): The activation for candidate hidden state.
+ Choices = ["sigmoid", "tanh", "relu", "identity"],
+ default "tanh".
+ dtype(str): Data type. Choices = ["float32", "float64"], default "float32".
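
To make the `(1 x 7D)` bias layout concrete, here is an illustrative slicing helper. The function name and the exact concatenation order are assumptions taken from the `Biases = {b_c, b_i, b_f, b_o, W_ic, W_fc, W_oc}` list above; it is not part of the fluid API:

```python
import numpy as np

def split_lstm_bias(bias, hidden_size, use_peepholes=True):
    """Slice the flat LSTM bias parameter into its named parts.

    bias : flat vector of shape (4D,) or (7D,), assumed to be laid out
           as [b_c, b_i, b_f, b_o] followed, when use_peepholes is True,
           by the peephole vectors [W_ic, W_fc, W_oc].
    """
    D = hidden_size
    parts = {'b_c': bias[0:D], 'b_i': bias[D:2*D],
             'b_f': bias[2*D:3*D], 'b_o': bias[3*D:4*D]}
    if use_peepholes:
        # three extra peephole weight vectors follow the gate biases
        parts.update({'W_ic': bias[4*D:5*D],
                      'W_fc': bias[5*D:6*D],
                      'W_oc': bias[6*D:7*D]})
    return parts
```

Each peephole "matrix" is stored as a length-D vector, matching the diagonal-matrices-as-vectors note in the description above.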

Returns:
- hidden(Variable): the hidden state of LSTM layer. The shape is (T x D),
- and lod is the same with the `input`.
- cell(Variable): the cell state of LSTM layer. The shape is (T x D), and
- lod is the same with the `input`.
+ tuple: The hidden state and cell state of the LSTM. The shape of both \
+ is (T x D), and the lod is the same as that of the `input`.

- Example :
+ Examples:

.. code-block:: python

- hidden_dim = 512
- forward_proj = fluid.layers.fc(input=input_seq, size=hidden_dim * 4,
- act='tanh', bias_attr=True)
- forward, _ = fluid.layers.dynamic_lstm(
- input=forward_proj, size=hidden_dim * 4, use_peepholes=False)
+ hidden_dim = 512
+ forward_proj = fluid.layers.fc(input=input_seq, size=hidden_dim * 4,
+ act='tanh', bias_attr=True)
+ forward, _ = fluid.layers.dynamic_lstm(
+ input=forward_proj, size=hidden_dim * 4, use_peepholes=False)
"""
helper = LayerHelper('lstm', **locals())
size = size / 4