Extend SliceNdLayer to work with start of shape (B,T) and with dynamic sizes #625

I would like to extend `SliceNdLayer` so that it also works when `start` is provided as a layer of shape `(B,T)`. This is necessary when `start` is computed inside a recurrent unit and is optimized out of the loop. Currently, the layer only works if `start` is of shape `(B,)`.

Additionally, I would like to allow `size` to be a layer, so that we can extract slices of dynamic length. For the same reason as above, this should also work if `size` is of shape `(B,T)`. Currently, the layer only supports `size` being an `int`.

Right now, if `start` is of shape `(B,)` and the input is of shape `(B,T,...)`, `SliceNdLayer` returns shape `(B,size,...)`. The question is what we should return if `start` is of shape `(B,T)`. I think `(B,T,size,...)` would be useful, as it would then be easy to select a slice for each time step of each sequence. This is straightforward if `size` is a static integer. It gets more complicated if `size` also becomes a layer of shape `(B,T)`, because the size placeholder of the `size` axis of the output would then depend on both the sequence and the time step, making it 2-dimensional.
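To make the proposed output shape concrete, here is a naive NumPy reference of what I mean for a static `size`. This is only a sketch of the intended semantics, not RETURNN code: the function name `slice_per_step_reference` is made up for illustration, and it assumes every slice lies fully inside the input.

```python
import numpy as np


def slice_per_step_reference(x, start, size):
    """Naive reference: out[b, t] = x[b, start[b, t] : start[b, t] + size].

    x: (B, T_in, ...), start: (B, T) integers, size: static int.
    Returns an array of shape (B, T, size, ...).
    Assumes every slice lies within the input (no padding handled here).
    """
    n_batch, n_steps = start.shape
    out = np.empty((n_batch, n_steps, size) + x.shape[2:], dtype=x.dtype)
    for b in range(n_batch):
        for t in range(n_steps):
            s = start[b, t]
            out[b, t] = x[b, s:s + size]
    return out


x = np.arange(2 * 6 * 3).reshape(2, 6, 3)       # input of shape (B=2, T_in=6, F=3)
start = np.array([[0, 1, 2, 3], [3, 2, 1, 0]])  # start of shape (B=2, T=4)
out = slice_per_step_reference(x, start, size=2)
print(out.shape)  # (2, 4, 2, 3)
```

So for each sequence `b` and each step `t` there is one slice of length `size` along a new axis.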

I have already tried out some code in my local repository. My current solution for allowing `start` to be of shape `(B,T)` is the following. The layer uses the `slice_nd` function from `returnn.tf.util`, which I assume we don't want to change, as other layers might depend on it. The `slice_nd` function expects `start` to be of shape `(B,)` and `size` to be an integer. Therefore, in `SliceNdLayer`, I flatten all axes of `start` into the batch dimension and repeat the entries of the input as many times as needed to match the new shape of `start`. After `slice_nd` has run, the returned slices can be reshaped to `(B,T,size,...)`. This functionality already works.
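The flatten-and-repeat idea can be sketched in NumPy roughly as follows. Note that `slice_nd_standin` is my own stand-in with the `(B,)` / `int` interface described above, not the actual `slice_nd`; in particular, the zero-filling of out-of-range positions is an assumption of this sketch. `slice_with_start_bt` is likewise a hypothetical name.

```python
import numpy as np


def slice_nd_standin(x, start, size):
    """Stand-in for a slice function with start of shape (B,) and int size.

    x: (B, T_in, ...), start: (B,), size: int. Returns (B, size, ...).
    Positions outside the input are zero-filled (assumption of this sketch).
    """
    n_batch, n_time = x.shape[:2]
    pos = start[:, None] + np.arange(size)[None, :]  # (B, size)
    valid = (pos >= 0) & (pos < n_time)
    # Clip so that the gather never fails; invalid entries are masked afterwards.
    gathered = x[np.arange(n_batch)[:, None], np.clip(pos, 0, n_time - 1)]
    mask = valid.reshape(valid.shape + (1,) * (x.ndim - 2))
    return np.where(mask, gathered, 0)


def slice_with_start_bt(x, start, size):
    """x: (B, T_in, ...), start: (B, T), size: int. Returns (B, T, size, ...)."""
    n_batch, n_steps = start.shape
    start_flat = start.reshape(n_batch * n_steps)       # (B*T,), index b*T + t
    x_rep = np.repeat(x, n_steps, axis=0)               # (B*T, T_in, ...), same ordering
    slices = slice_nd_standin(x_rep, start_flat, size)  # (B*T, size, ...)
    return slices.reshape((n_batch, n_steps, size) + x.shape[2:])


x = np.arange(2 * 6 * 3).reshape(2, 6, 3)       # (B=2, T_in=6, F=3)
start = np.array([[0, 1, 2, 3], [3, 2, 1, 0]])  # (B=2, T=4)
out = slice_with_start_bt(x, start, size=2)
print(out.shape)  # (2, 4, 2, 3)
```

The important detail is that the flattening of `start` and the repetition of the input use the same ordering (each sequence repeated `T` times in a row), so that flat entry `b*T + t` pairs `start[b, t]` with `x[b]`. The obvious downside is that the input is materialized `T` times.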

I also tried allowing `size` to be a layer of shape `(B,T)`. Before passing it to the `slice_nd` function, I take the maximum over `size` to get an integer. After `slice_nd` has run, I pad the extracted slices (which all have the same length) to match the sizes from the `size` layer. I think this should work in theory. However, so far I am not sure how to set the dimension tags such that the size placeholder of the `size` axis depends on both the batch and the time dimension. In any case, we have to decide whether we want to do this or whether we want to return a different shape than the one I proposed.
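A rough NumPy sketch of the dynamic-size variant, to show where the 2-dimensional lengths come from. For brevity it gathers directly with an index grid instead of going through `slice_nd`, and the name `slice_dynamic_size` is made up; the principle (slice up to the maximum size, then zero out everything beyond the requested length) is the same as described above.

```python
import numpy as np


def slice_dynamic_size(x, start, size):
    """x: (B, T_in, ...), start and size: (B, T) integers.

    Returns (slices, lengths): slices has shape (B, T, max_size, ...) and is
    zero-padded beyond each requested length; lengths is just size, shape (B, T).
    """
    n_batch, n_time = x.shape[:2]
    max_size = int(size.max())
    offsets = np.arange(max_size)      # (max_size,)
    pos = start[:, :, None] + offsets  # (B, T, max_size)
    # Keep only positions inside the requested length and inside the input.
    keep = (offsets < size[:, :, None]) & (pos >= 0) & (pos < n_time)
    batch_idx = np.arange(n_batch)[:, None, None]
    gathered = x[batch_idx, np.clip(pos, 0, n_time - 1)]  # (B, T, max_size, ...)
    keep = keep.reshape(keep.shape + (1,) * (x.ndim - 2))
    return np.where(keep, gathered, 0), size


x = np.arange(1, 13).reshape(2, 6)     # (B=2, T_in=6): [1..6] and [7..12]
start = np.array([[0, 2], [1, 3]])     # (B=2, T=2)
size = np.array([[1, 3], [2, 2]])      # (B=2, T=2)
slices, lengths = slice_dynamic_size(x, start, size)
print(slices.shape, lengths.shape)  # (2, 2, 3) (2, 2)
```

Here `lengths` has one entry per `(b, t)` pair, which is exactly the information the size placeholder of the new axis would need to carry.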

What do you think, @albertz?
