Extend SliceNdLayer to work with start of shape (B,T) and with dynamic sizes #625

@robin-p-schmitt

I would like to extend SliceNdLayer so that it works when start is a layer of shape (B,T). This is necessary when start is inside a recurrent unit and is optimized out of the loop. Currently, the layer only works if start is of shape (B,).

Additionally, I would like to allow size to be a layer, so that we can extract slices of dynamic length. Again, this should also work if size is of shape (B,T), for the reason mentioned above. Currently, the layer only supports size being an int.

Right now, if start is of shape (B,) and the input is of shape (B,T,...), SliceNdLayer returns shape (B,size,...). The question is what we should return if start is of shape (B,T). I think it would be useful if the output had shape (B,T,size,...), as it would then be easy to select slices for each time step of the sequences. This is straightforward if size is a static integer. It gets more complicated if size also becomes a layer of shape (B,T), because the size placeholder of the size dimension of the output would then depend on both the sequence and the time step, making it 2-dimensional.

I have already tried out some code in my local repository. My current solution for allowing start to be of shape (B,T) is the following: the layer uses the slice_nd function from returnn.tf.util, which I assume we don't want to change, as other layers might depend on it. The slice_nd function expects start to be of shape (B,) and size to be an integer. Therefore, in SliceNdLayer I flatten all axes of start into the batch dimension and repeat the entries of the input as many times as needed to fit the new shape of start. After slice_nd is finished, the returned slices can be reshaped to the shape (B,T,size,...). This functionality already works.
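To make the flatten-and-reshape idea concrete, here is a minimal NumPy sketch. It is not the RETURNN code: `slice_nd_ref` is a hypothetical stand-in that only mimics the contract described above (start of shape (B,), integer size, output (B,size,...)), and the zero-filling of positions past the end of the input is an assumption made for the sketch. `slice_nd_bt` is likewise a hypothetical name for the wrapper.

```python
import numpy as np


def slice_nd_ref(x, start, size):
    """Hypothetical stand-in, NOT returnn's slice_nd.

    x: (B, T, ...), start: (B,), size: int -> (B, size, ...).
    Positions outside [0, T) are zero-filled (assumption for this sketch).
    """
    n_batch, n_time = x.shape[:2]
    idx = start[:, None] + np.arange(size)[None, :]  # (B, size)
    valid = (idx >= 0) & (idx < n_time)
    idx_clipped = np.clip(idx, 0, n_time - 1)
    out = x[np.arange(n_batch)[:, None], idx_clipped]  # (B, size, ...)
    # Add trailing singleton axes so the mask broadcasts over feature dims.
    valid = valid.reshape(valid.shape + (1,) * (x.ndim - 2))
    return np.where(valid, out, 0)


def slice_nd_bt(x, start, size):
    """start: (B, T') -> output (B, T', size, ...), via flattening."""
    n_batch, n_steps = start.shape
    # Fold the time axis of start into the batch axis (row-major order).
    flat_start = start.reshape(n_batch * n_steps)
    # Repeat each input sequence once per start position, in the same order.
    x_rep = np.repeat(x, n_steps, axis=0)  # (B*T', T, ...)
    flat = slice_nd_ref(x_rep, flat_start, size)  # (B*T', size, ...)
    # Unfold the batch axis again.
    return flat.reshape((n_batch, n_steps, size) + x.shape[2:])
```

Note that repeating the input costs a factor of T' in memory; a gather with a broadcast batch index would avoid the copy, at the price of not reusing the existing function.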

I also tried allowing size to be a layer of shape (B,T). Before passing it to the slice_nd function, I take the max size to get an integer. After slice_nd is finished, I pad the extracted slices (which all have the same size) in order to match the sizes from the size layer. I think this should also work in theory. However, so far I am not sure how to set the dimension tags such that the size placeholder of the size axis depends on both the batch and the time dimension. In any case, we have to decide whether we want to do this or whether we want to return a different shape than the one I proposed.
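As an illustration of the dynamic-size variant, the following NumPy sketch slices with the maximum size and then zeroes everything beyond each requested length. The function name `slice_dynamic` is hypothetical, the gather is done directly rather than through slice_nd, and reading "pad" as "mask to zero" is an assumption. The returned boolean mask plays the role of the 2-dimensional length information: summing it over the last axis gives per-(batch, time) lengths.

```python
import numpy as np


def slice_dynamic(x, start, size):
    """Hypothetical sketch: x (B, T, ...), start and size both (B, T').

    Returns (slices, keep): slices has shape (B, T', max_size, ...),
    keep has shape (B, T', max_size) and marks the valid positions.
    """
    n_batch, n_time = x.shape[:2]
    max_size = int(size.max())  # static upper bound for the new axis
    offsets = np.arange(max_size)
    idx = start[..., None] + offsets  # (B, T', max_size)
    # Valid = within the requested length AND inside the input sequence.
    keep = (offsets < size[..., None]) & (idx >= 0) & (idx < n_time)
    idx_clipped = np.clip(idx, 0, n_time - 1)
    batch = np.arange(n_batch).reshape(n_batch, 1, 1)
    out = x[batch, idx_clipped]  # (B, T', max_size, ...)
    # Add trailing singleton axes so the mask broadcasts over feature dims.
    keep_b = keep.reshape(keep.shape + (1,) * (x.ndim - 2))
    return np.where(keep_b, out, 0), keep
```

The open question about dimension tags remains: the mask here is a plain array, whereas the layer would need a size placeholder that carries the same (B,T') dependence.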

What do you think, @albertz?
