Description
I would like to extend SliceNdLayer so that it also works when start is a layer of shape (B,T). This is necessary when start is computed inside a recurrent unit and is optimized out of the loop. Currently, the layer only works if start has shape (B,).
Additionally, I would like to allow size to be a layer, so that we can extract slices of dynamic length. Again, this should also work if size has shape (B,T), for the reason mentioned above. Currently, the layer only supports size being an int.
Right now, if start has shape (B,) and the input has shape (B,T,...), SliceNdLayer returns shape (B,size,...). The question is what we should return if start has shape (B,T). I think shape (B,T,size,...) would be useful, as it would then be easy to select a slice for each time step of each sequence. This is straightforward if size is a static integer. It gets more complicated if size also becomes a layer of shape (B,T), because the size placeholder of the output's size dimension would then depend on both the sequence and the time step, making it 2-dimensional.
I have already tried out some code in my local repository. My current solution for allowing start to be of shape (B,T) is the following. The layer uses the slice_nd function from returnn.tf.util, which I assume we don't want to change, as other layers might depend on it. slice_nd expects start to be of shape (B,) and size to be an integer. Therefore, in SliceNdLayer I flatten all axes of start into the batch dimension and repeat the entries of the input as many times as needed to match the new shape of start. After slice_nd has finished, the returned slices can be reshaped to (B,T,size,...). This functionality already works.
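To make the flatten-and-repeat idea concrete, here is a minimal NumPy sketch. It is not the RETURNN code: `slice_nd_batch` is a hypothetical stand-in for a slicing function that only accepts start of shape (B,) and an integer size, and the zero padding past the end of the time axis is an assumption for illustration. `slice_nd_with_time_start` is likewise a made-up name.

```python
import numpy as np


def slice_nd_batch(x, start, size):
    """Hypothetical stand-in: x (B,T,...), start (B,), int size -> (B,size,...).

    Assumes start >= 0. Positions past the end of the time axis are zero-padded.
    """
    B, T = x.shape[:2]
    idx = start[:, None] + np.arange(size)[None, :]  # (B, size)
    valid = idx < T
    idx_clipped = np.minimum(idx, T - 1)  # keep the gather in range
    out = x[np.arange(B)[:, None], idx_clipped]  # (B, size, ...)
    # Broadcast the validity mask over the remaining feature axes.
    valid = valid.reshape(valid.shape + (1,) * (x.ndim - 2))
    return np.where(valid, out, 0)


def slice_nd_with_time_start(x, start, size):
    """x (B,T,...), start (B,T') -> (B,T',size,...)."""
    B, Tq = start.shape
    start_flat = start.reshape(B * Tq)  # fold T' into the batch axis
    x_rep = np.repeat(x, Tq, axis=0)  # each sequence repeated T' times
    out_flat = slice_nd_batch(x_rep, start_flat, size)  # (B*T', size, ...)
    return out_flat.reshape((B, Tq, size) + x.shape[2:])


x = np.arange(10).reshape(2, 5)
start = np.array([[0, 3], [1, 4]])
out = slice_nd_with_time_start(x, start, size=2)
print(out.shape)  # (2, 2, 2)
```

Note that `np.repeat` along the batch axis yields the order x[0], x[0], ..., x[1], x[1], ..., which matches the row-major order of the flattened start. The cost is a T'-fold copy of the input.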
I also tried to allow size to be a layer of shape (B,T). Before passing it to slice_nd, I take the maximum of size to get an integer. After slice_nd has finished, I pad the extracted slices (which all have the same length) to match the sizes from the size layer. I think this should also work in theory. However, so far I am not sure how to set the dimension tags such that the size placeholder of the size axis depends on both the batch and the time dimension. In any case, we have to decide whether we want to do this, or whether we want to return a different shape than the one I proposed.
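A rough NumPy sketch of the dynamic-size variant, under the same caveats: the function name is hypothetical, zero is assumed as the padding value, and the slicing is written out directly here instead of calling slice_nd. It also shows why the lengths become 2-dimensional: the valid length of the new axis is given per (batch, time) entry.

```python
import numpy as np


def slice_with_dynamic_size(x, start, size):
    """x (B,T,...), start (B,T'), size (B,T') -> (slices, mask).

    slices has shape (B,T',max_size,...); mask has shape (B,T',max_size)
    and marks the valid positions. Assumes start >= 0 and size >= 0.
    """
    B, T = x.shape[:2]
    max_size = int(size.max())  # static upper bound for the slice length
    offsets = np.arange(max_size)  # (max_size,)
    idx = start[..., None] + offsets  # (B, T', max_size)
    # Valid if within the requested per-entry size and within the input.
    mask = (offsets < size[..., None]) & (idx < T)
    idx_clipped = np.minimum(idx, T - 1)
    out = x[np.arange(B)[:, None, None], idx_clipped]  # (B, T', max_size, ...)
    mask_b = mask.reshape(mask.shape + (1,) * (x.ndim - 2))
    return np.where(mask_b, out, 0), mask


x = np.arange(10).reshape(2, 5)
start = np.array([[0, 3], [1, 2]])
size = np.array([[1, 2], [3, 0]])
slices, mask = slice_with_dynamic_size(x, start, size)
print(slices.shape)  # (2, 2, 3)
print(mask.sum(axis=-1))  # per-(batch, time) lengths, shape (2, 2)
```

The lengths recovered from the mask have shape (B,T'), which is exactly the 2-dimensional size information that the dimension tag of the new axis would have to carry.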
What do you think, @albertz?