
Commit f3c42f6

Add doc for gru_unit op (in fluid) (#7151)
* Add squared error layers doc
* Add doc for gru_unit
* Remove cdot which isn't supported
* Update layers.rst
* Update layers.rst (minor)
1 parent 564dba1 commit f3c42f6


2 files changed: +43 -11 lines


doc/api/v2/fluid/layers.rst

Lines changed: 6 additions & 0 deletions
@@ -307,6 +307,12 @@ sequence_expand
     :noindex:
 
 
+gru_unit
+--------
+.. autofunction:: paddle.v2.fluid.layers.gru_unit
+    :noindex:
+
+
 lstm_unit
 ---------
 .. autofunction:: paddle.v2.fluid.layers.lstm_unit
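
The new gru_unit entry above follows the same pattern as the existing lstm_unit entry: Sphinx's autodoc extension imports paddle.v2.fluid.layers and renders the docstring added in nn.py below. As a point of reference, a minimal sketch of the conf.py settings such a page relies on might look like this (illustrative assumption, not taken from this commit):

    # Minimal Sphinx configuration sketch (illustrative; not part of this commit).
    # autodoc pulls in the gru_unit docstring from paddle.v2.fluid.layers;
    # mathjax renders the ".. math::" block added in nn.py.
    extensions = [
        'sphinx.ext.autodoc',
        'sphinx.ext.mathjax',
    ]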

python/paddle/v2/fluid/layers/nn.py

Lines changed: 37 additions & 11 deletions
@@ -236,21 +236,47 @@ def gru_unit(input,
              activation='tanh',
              gate_activation='sigmoid'):
     """
-    GRUUnit Operator implements partial calculations of the GRU unit as following:
+    GRU unit layer. The equations of one GRU step are:
 
-    $$
-    update \ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\
-    reset \ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r) \\
-    output \ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\
-    output: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)
-    $$
+    .. math::
+        u_t & = actGate(xu_{t} + W_u h_{t-1} + b_u)
+
+        r_t & = actGate(xr_{t} + W_r h_{t-1} + b_r)
+
+        ch_t & = actNode(xc_t + W_c dot(r_t, h_{t-1}) + b_c)
+
+        h_t & = dot((1 - u_t), h_{t-1}) + dot(u_t, ch_t)
 
-    which is same as one time step of GRU Operator.
+    The inputs of the GRU unit are :math:`z_t` and :math:`h_{t-1}`. In terms
+    of the equations above, :math:`z_t` is split into three parts:
+    :math:`xu_t`, :math:`xr_t` and :math:`xc_t`. This means that, in order to
+    implement a full GRU unit operator for an input, a fully
+    connected layer has to be applied first, such that :math:`z_t = W_{fc}x_t`.
+
+    This layer has three outputs: :math:`h_t`, :math:`dot(r_t, h_{t-1})`
+    and the concatenation of :math:`u_t`, :math:`r_t` and :math:`ch_t`.
+
+    Args:
+        input (Variable): The fc transformed input value of the current step.
+        hidden (Variable): The hidden value of the GRU unit from the previous step.
+        size (integer): The input dimension value.
+        weight (ParamAttr): The weight parameters for the GRU unit. Default: None
+        bias (ParamAttr): The bias parameters for the GRU unit. Default: None
+        activation (string): The activation type for the cell (actNode). Default: 'tanh'
+        gate_activation (string): The activation type for the gates (actGate). Default: 'sigmoid'
+
+    Returns:
+        tuple: The hidden value, reset-hidden value and gate values.
+
+    Examples:
+
+        .. code-block:: python
 
-    @note To implement the complete GRU unit, fully-connected operator must be
-    used before to feed xu, xr and xc as the Input of GRUUnit operator.
+            # assuming we have x_t_data and prev_hidden of size=10
+            x_t = fluid.layers.fc(input=x_t_data, size=30)
+            hidden_val, r_h_val, gate_val = fluid.layers.gru_unit(input=x_t,
+                                                                   hidden=prev_hidden)
 
-    TODO(ChunweiYan) add more document here
     """
     activation_dict = dict(
         identity=0,
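
To make the new docstring's equations easier to verify, here is a minimal NumPy sketch of one GRU step using the same notation. The tensor names, sizes, and random values are illustrative assumptions and are not part of this commit:

    # One GRU step with NumPy, mirroring the equations in the new docstring.
    # All names and sizes below are illustrative assumptions.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    hidden_dim = 10
    rng = np.random.RandomState(0)

    h_prev = rng.randn(hidden_dim)                       # h_{t-1}
    # z_t = W_fc x_t (the fc output), already split into its xu, xr, xc parts
    xu_t, xr_t, xc_t = rng.randn(3, hidden_dim)
    W_u, W_r, W_c = rng.randn(3, hidden_dim, hidden_dim)
    b_u, b_r, b_c = rng.randn(3, hidden_dim)

    u_t = sigmoid(xu_t + W_u.dot(h_prev) + b_u)          # update gate (actGate)
    r_t = sigmoid(xr_t + W_r.dot(h_prev) + b_r)          # reset gate (actGate)
    ch_t = np.tanh(xc_t + W_c.dot(r_t * h_prev) + b_c)   # candidate hidden (actNode)
    h_t = (1.0 - u_t) * h_prev + u_t * ch_t              # new hidden state

    print(h_t.shape)  # (10,)

The three-way split of the fc output here corresponds to the size=30 fc layer in the docstring example: three blocks of the assumed hidden size 10.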
