@@ -236,21 +236,47 @@ def gru_unit(input,
             activation='tanh',
             gate_activation='sigmoid'):
    """
-    GRUUnit Operator implements partial calculations of the GRU unit as following:
+    GRU unit layer. The equation of a GRU step is:

-    $$
-    update \ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\
-    reset \ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r) \\
-    output \ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\
-    output: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)
-    $$
+        .. math::
+            u_t & = actGate(xu_{t} + W_u h_{t-1} + b_u)
+
+            r_t & = actGate(xr_{t} + W_r h_{t-1} + b_r)
+
+            ch_t & = actNode(xc_t + W_c dot(r_t, h_{t-1}) + b_c)
+
+            h_t & = dot((1-u_t), h_{t-1}) + dot(u_t, ch_t)

-    which is same as one time step of GRU Operator.
+    The inputs of the GRU unit include :math:`z_t` and :math:`h_{t-1}`. In
+    terms of the equations above, :math:`z_t` is split into three parts:
+    :math:`xu_t`, :math:`xr_t` and :math:`xc_t`. This means that, in order
+    to implement a full GRU unit operator for an input, a fully connected
+    layer has to be applied first, such that :math:`z_t = W_{fc}x_t`.
+
+    This layer has three outputs: :math:`h_t`, :math:`dot(r_t, h_{t-1})`
+    and the concatenation of :math:`u_t`, :math:`r_t` and :math:`ch_t`.
+
+    Args:
+        input (Variable): The fc transformed input value of the current step.
+        hidden (Variable): The hidden value of the GRU unit from the previous step.
+        size (integer): The input dimension value.
+        weight (ParamAttr): The weight parameters for the GRU unit. Default: None
+        bias (ParamAttr): The bias parameters for the GRU unit. Default: None
+        activation (string): The activation type for the cell (actNode). Default: 'tanh'
+        gate_activation (string): The activation type for the gates (actGate). Default: 'sigmoid'
+
+    Returns:
+        tuple: The hidden value, reset-hidden value and gate values.
+
+    Examples:
+
+        .. code-block:: python

-    @note To implement the complete GRU unit, fully-connected operator must be
-    used before to feed xu, xr and xc as the Input of GRUUnit operator.
+            # assuming we have x_t_data and prev_hidden of size=10
+            x_t = fluid.layers.fc(input=x_t_data, size=30)
+            hidden_val, r_h_val, gate_val = fluid.layers.gru_unit(
+                input=x_t, hidden=prev_hidden)

-    TODO(ChunweiYan) add more document here
    """
    activation_dict = dict(
        identity=0,
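To make the step equations in the updated docstring concrete, here is a minimal NumPy sketch of one GRU step. It is illustrative only, not PaddlePaddle code: it assumes a sigmoid gate activation, a tanh cell activation, and an input that has already been fc-transformed to three times the hidden size; all names and shapes are made up for the example.

    # Minimal NumPy sketch of the GRU step equations above (illustrative only,
    # not the fluid implementation). "dot" in the docstring is element-wise.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_step(x_t, h_prev, w_u, w_r, w_c, b_u, b_r, b_c):
        # x_t has shape (3 * hidden_size,): the fc output split into the
        # parts the docstring calls xu_t, xr_t and xc_t.
        xu_t, xr_t, xc_t = np.split(x_t, 3)
        u_t = sigmoid(xu_t + w_u @ h_prev + b_u)             # update gate
        r_t = sigmoid(xr_t + w_r @ h_prev + b_r)             # reset gate
        ch_t = np.tanh(xc_t + w_c @ (r_t * h_prev) + b_c)    # candidate hidden
        h_t = (1.0 - u_t) * h_prev + u_t * ch_t              # new hidden state
        # The three outputs described in the docstring.
        return h_t, r_t * h_prev, np.concatenate([u_t, r_t, ch_t])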