Rec automatic optimization, special behavior of layers

A list of layers with special behavior when inside a recurrent loop vs. outside the recurrent loop (i.e. when optimized out of the loop).
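
For illustration, here is a minimal sketch of a RETURNN net dict (the layer names and dimensions are made up for this example): "ff" only depends on the current input frame, so the automatic optimization can move it out of the loop and compute it once over the whole sequence, while "accum" depends on "prev:accum" and therefore has to stay inside the loop.

```python
network = {
    "output": {"class": "rec", "from": "data", "unit": {
        # Depends only on the current frame -> can be moved out of the loop
        # and then operates on the whole [B,T,...] sequence at once.
        "ff": {"class": "linear", "activation": "relu", "n_out": 128, "from": "data:source"},
        # Depends on its own previous output -> must stay inside the loop.
        "accum": {"class": "combine", "kind": "add", "from": ["prev:accum", "ff"],
                  "initial_output": 0},
        "output": {"class": "copy", "from": "accum"},
    }},
}
```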

  • RecLayer / RnnCellLayer. Inside the loop, they have hidden state and do one step. Outside the loop, they operate on the time sequence. Which case applies is determined by whether the input has a time dim.

  • TwoDLSTMLayer

  • SelfAttentionLayer (deprecated)

  • EditDistanceTableLayer. The outside-loop case is partly not implemented, although some efficient code for it already exists.

  • MaskedComputationLayer

  • UnmaskLayer

  • WindowLayer. Inside the loop, it keeps the previous window_size - 1 frames as hidden state, such that you get an output of shape [B,window_size,...] (assuming window_right=0 and window_left=window_size - 1). Outside the loop, it just adds the window axis (with an efficient implementation).

  • CumsumLayer. For input x: inside the loop, it does output = prev:output + x. Outside the loop, it wraps tf.cumsum. (See the sketch after this list.)
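
Both variants give the same result, which one can check with a small plain-TensorFlow sketch (this is only an illustration of the semantics, not the actual RETURNN implementation):

```python
import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(2, 5, 3), dtype=tf.float32)  # [B,T,D]

# Inside-loop semantics: output = prev:output + x, step by step over time.
prev = tf.zeros_like(x[:, 0])
steps = []
for t in range(x.shape[1]):
    prev = prev + x[:, t]
    steps.append(prev)
inside = tf.stack(steps, axis=1)  # [B,T,D]

# Outside-loop semantics: a single tf.cumsum over the time axis.
outside = tf.cumsum(x, axis=1)

print(np.allclose(inside.numpy(), outside.numpy()))  # True
```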

All other layers do not have special logic, so the implicit assumption is that the behavior stays correct, i.e. that when such a layer is optimized out of the loop, the behavior of the overall model/computation does not change. This is obviously correct for layers such as LinearLayer and most other layers where extra axes do not matter and where the same operation would be calculated in every time frame, i.e. basically all layers with recurrent=False.
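
As a plain-numpy illustration (not RETURNN code) of why this is safe for something like LinearLayer: applying the same weight matrix per frame inside the loop, or once over the whole [B,T,D] sequence outside the loop, gives identical results.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal((2, 5, 3)).astype("float32")  # [B,T,D_in]
w = rng.standard_normal((3, 4)).astype("float32")     # [D_in,D_out]

# Inside-loop semantics: the same matmul applied to each frame separately.
per_frame = np.stack([x[:, t] @ w for t in range(x.shape[1])], axis=1)
# Outside-loop semantics: one matmul over the whole sequence.
whole_seq = x @ w

print(np.allclose(per_frame, whole_seq))  # True
```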

There are some layers which would get confused, for various reasons:

  • KenLmStateLayer. Assumes that it is always inside the loop.

  • DotLayer. The var option needs T?. See #569.

  • All layers operating on the time-dim axis (implicitly or explicitly via some axis=T), e.g. (see the sketch after this list):

    • ConvLayer (implicitly)
    • ReduceLayer with axis=T
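
As a plain-numpy sketch (again only an illustration, not RETURNN code) of why such layers cannot simply be moved in or out of the loop: outside the loop, a reduce over the time axis sees the whole sequence, whereas inside the loop each step only sees the current frame, so there is no time axis left to reduce over and the computation would mean something different.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 3)).astype("float32")  # [B,T,D]

# Outside the loop: reducing over the time axis uses all T frames at once.
outside = x.mean(axis=1)  # [B,D], one value per sequence

# Inside the loop: step t only sees x[:, t] of shape [B,D]; a "reduce over time"
# per step would just return the current frame, which is a different computation.
inside_step = x[:, 3]  # [B,D]

print(np.allclose(outside, inside_step))  # False (in general)
```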