
Commit 2360dd2

Merge pull request #7438 from wanghaoshuang/ctc_py

Add python API for Warp-CTC op

2 parents: c73f00f + 4de6cbd

File tree

  • python/paddle/v2/fluid/layers/nn.py

1 file changed: 73 additions, 43 deletions

python/paddle/v2/fluid/layers/nn.py
@@ -22,36 +22,13 @@
 from tensor import concat
 
 __all__ = [
-    'fc',
-    'embedding',
-    'dynamic_lstm',
-    'gru_unit',
-    'linear_chain_crf',
-    'crf_decoding',
-    'cos_sim',
-    'cross_entropy',
-    'square_error_cost',
-    'accuracy',
-    'chunk_eval',
-    'sequence_conv',
-    'conv2d',
-    'sequence_pool',
-    'pool2d',
-    'batch_norm',
-    'beam_search_decode',
-    'conv2d_transpose',
-    'sequence_expand',
-    'lstm_unit',
-    'reduce_sum',
-    'reduce_mean',
-    'reduce_max',
-    'reduce_min',
-    'sequence_first_step',
-    'sequence_last_step',
-    'dropout',
-    'split',
-    'l2_normalize',
-    'matmul',
+    'fc', 'embedding', 'dynamic_lstm', 'gru_unit', 'linear_chain_crf',
+    'crf_decoding', 'cos_sim', 'cross_entropy', 'square_error_cost', 'accuracy',
+    'chunk_eval', 'sequence_conv', 'conv2d', 'sequence_pool', 'pool2d',
+    'batch_norm', 'beam_search_decode', 'conv2d_transpose', 'sequence_expand',
+    'lstm_unit', 'reduce_sum', 'reduce_mean', 'reduce_max', 'reduce_min',
+    'sequence_first_step', 'sequence_last_step', 'dropout', 'split',
+    'l2_normalize', 'matmul', 'warpctc'
 ]
 
(The following hunk contains whitespace-only changes; each -/+ pair renders identically.)

@@ -1721,37 +1698,37 @@ def l2_normalize(x, axis, epsilon=1e-12, name=None):
 
 
 def matmul(x, y, transpose_x=False, transpose_y=False, name=None):
     """
-    Applies matrix multipication to two tensors. Currently only rank 1 to rank
+    Applies matrix multipication to two tensors. Currently only rank 1 to rank
     3 input tensors are supported.
 
-    The actual behavior depends on the shapes of :math:`x`, :math:`y` and the
+    The actual behavior depends on the shapes of :math:`x`, :math:`y` and the
     flag values of :attr:`transpose_x`, :attr:`transpose_y`. Specifically:
 
-    - If a transpose flag is specified, the last two dimensions of the tensor
-      are transposed. If the tensor is rank-1 of shape :math:`[D]`, then for
-      :math:`x` it is treated as :math:`[1, D]` in nontransposed form and as
-      :math:`[D, 1]` in transposed form, whereas for :math:`y` it is the
-      opposite: It is treated as :math:`[D, 1]` in nontransposed form and as
+    - If a transpose flag is specified, the last two dimensions of the tensor
+      are transposed. If the tensor is rank-1 of shape :math:`[D]`, then for
+      :math:`x` it is treated as :math:`[1, D]` in nontransposed form and as
+      :math:`[D, 1]` in transposed form, whereas for :math:`y` it is the
+      opposite: It is treated as :math:`[D, 1]` in nontransposed form and as
       :math:`[1, D]` in transposed form.
 
-    - After transpose, the two tensors are 2-D or 3-D and matrix multipication
+    - After transpose, the two tensors are 2-D or 3-D and matrix multipication
       performs in the following way.
 
      - If both are 2-D, they are multiplied like conventional matrices.
-     - If either is 3-D, it is treated as a stack of matrices residing in the
-       last two dimensions and a batched matrix multiply supporting broadcast
+     - If either is 3-D, it is treated as a stack of matrices residing in the
+       last two dimensions and a batched matrix multiply supporting broadcast
       applies on the two tensors.
 
-    Also note that if the raw tensor :math:`x` or :math:`y` is rank-1 and
-    nontransposed, the prepended or appended dimension :math:`1` will be
+    Also note that if the raw tensor :math:`x` or :math:`y` is rank-1 and
+    nontransposed, the prepended or appended dimension :math:`1` will be
     removed after matrix multipication.
 
     Args:
         x (Variable): The input variable which is a Tensor or LoDTensor.
         y (Variable): The input variable which is a Tensor or LoDTensor.
         transpose_x (bool): Whether to transpose :math:`x` before multiplication.
         transpose_y (bool): Whether to transpose :math:`y` before multiplication.
-        name(str|None): A name for this layer(optional). If set None, the layer
+        name(str|None): A name for this layer(optional). If set None, the layer
            will be named automatically.
 
    Returns:
@@ -1788,3 +1765,56 @@ def matmul(x, y, transpose_x=False, transpose_y=False, name=None):
         attrs={'transpose_X': transpose_x,
                'transpose_Y': transpose_y})
     return out
+
+
+def warpctc(input, label, blank=0, norm_by_times=False, **kwargs):
+    """
+    An operator integrating the open-source Warp-CTC library
+    (https://github.com/baidu-research/warp-ctc)
+    to compute Connectionist Temporal Classification (CTC) loss.
+    It can be regarded as softmax with CTC, since a native softmax activation
+    is integrated into the Warp-CTC library to normalize the values in each
+    row of the input tensor.
+
+    Args:
+        input(Variable): (LoDTensor, default: LoDTensor<float>),
+            the unscaled probabilities of variable-length sequences,
+            which is a 2-D Tensor with LoD information.
+            Its shape is [Lp, num_classes + 1], where Lp is the sum of all
+            input sequences' lengths and num_classes is the true number of
+            classes (not including the blank label).
+        label(Variable): (LoDTensor, default: LoDTensor<int>), the ground
+            truth of variable-length sequences, which is a 2-D Tensor with
+            LoD information. It is of the shape [Lg, 1], where Lg is the sum
+            of all labels' lengths.
+        blank: (int, default: 0), the blank label index of Connectionist
+            Temporal Classification (CTC) loss, which is in the
+            half-open interval [0, num_classes + 1).
+        norm_by_times: (bool, default: false), whether to normalize
+            the gradients by the number of time-steps, which is also the
+            sequence's length. There is no need to normalize the gradients
+            if the warpctc layer is followed by a mean_op.
+
+    Returns:
+        Variable: The Connectionist Temporal Classification (CTC) loss,
+            which is a 2-D Tensor of the shape [batch_size, 1].
+
+    Examples:
+        .. code-block:: python
+
+            y = layers.data(name='y', shape=[11, 8], dtype='float32', lod_level=1)
+            y_predict = layers.data(name='y_predict', shape=[11, 1], dtype='float32')
+            cost = layers.warpctc(input=y_predict, label=y)
+
+    """
+    helper = LayerHelper('warpctc', **kwargs)
+    loss_out = helper.create_tmp_variable(dtype=input.dtype)
+    grad_out = helper.create_tmp_variable(dtype=input.dtype)
+    helper.append_op(
+        type='warpctc',
+        inputs={'Logits': [input],
+                'Label': [label]},
+        outputs={'WarpCTCGrad': [grad_out],
+                 'Loss': [loss_out]},
+        attrs={'blank': blank,
+               'norm_by_times': norm_by_times})
+    return loss_out
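The matmul docstring in the hunk above describes the same shape rules that NumPy's np.matmul implements, so the rank-1 and 3-D cases can be sketched with NumPy (for illustration only; Fluid's layers.matmul builds a graph operator rather than computing eagerly):

```python
import numpy as np

# Rank-1 x is treated as [1, D]; rank-1 y is treated as [D, 1]; the
# prepended/appended dimension of size 1 is removed after multiplication.
x = np.ones(3)                 # shape [3] -> treated as [1, 3]
y = np.ones(3)                 # shape [3] -> treated as [3, 1]
print(np.matmul(x, y))         # [1, 3] @ [3, 1] -> [1, 1] -> scalar 3.0

# 3-D inputs behave as stacks of matrices: a batched matrix multiply over
# the last two dimensions.
a = np.ones((4, 2, 3))
b = np.ones((4, 3, 5))
print(np.matmul(a, b).shape)   # (4, 2, 5)
```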
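For readers unfamiliar with what the Loss output of the warpctc layer represents, here is a minimal pure-Python sketch of the CTC forward (alpha) recursion for a single sequence. It is independent of the Warp-CTC library; the function name ctc_loss and the already-softmaxed probs input are illustrative assumptions, not part of this commit:

```python
import math


def ctc_loss(probs, label, blank=0):
    """CTC loss of one sequence via the forward (alpha) recursion.

    probs: per-timestep class probabilities (already softmax-normalized),
           a list of T rows, each of length num_classes + 1.
    label: the ground-truth label sequence, without blanks (assumed non-empty).
    """
    # Extend the label with blanks: [b, l1, b, l2, ..., lN, b]
    ext = [blank]
    for c in label:
        ext.extend([c, blank])
    S, T = len(ext), len(probs)

    # alpha[s] at time t: summed probability of all alignments of the
    # first t+1 frames that end at extended-label position s.
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]   # start with a blank ...
    alpha[1] = probs[0][ext[1]]  # ... or with the first real label
    for t in range(1, T):
        prev = alpha
        alpha = [0.0] * S
        for s in range(S):
            p = prev[s] + (prev[s - 1] if s >= 1 else 0.0)
            # Skipping over a blank is allowed only between distinct labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                p += prev[s - 2]
            alpha[s] = p * probs[t][ext[s]]

    # Valid alignments end at the last label or the trailing blank.
    return -math.log(alpha[S - 1] + alpha[S - 2])
```

For example, with two timesteps of uniform probabilities over {blank, class 1} and label [1], the three valid alignments (blank-1, 1-blank, 1-1) each have probability 0.25, so ctc_loss([[0.5, 0.5], [0.5, 0.5]], [1]) returns -log(0.75).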
