The logic involving `get_for_batch_ctx` and `get_same_base` is ugly, unintuitive, and a potential source of problems.

One big problem is that we never really defined the logic around `get_for_batch_ctx` well; it was added only at a later point. The same holds for some of the other concepts, e.g. the equality of dim tags (#634) is also not well defined.
`CumConcatLayer` (#589) and generalized self-attention (#391) were among the main reasons this was introduced, but so was the beam search logic. `CumConcatLayer` also introduced the concept of implicit dims.
Defining a dim tag (sequence lengths) inside a loop, and then having it accumulated outside, is still fairly straightforward and unambiguous, so this is not really a problem. Maybe it was a problem to treat the inside and outside dims as the same, though? But I think this is important so that the rec loop optimization (move out of loop) works correctly.
Note also that the whole logic around `get_for_batch_ctx` exists basically just to allow different `dyn_size_ext` (`dyn_size`) for the same dim tag under different conditions, such as inside a loop vs. outside, or with a beam vs. without. Direct assignments to or reads from `dyn_size` or `dyn_size_ext`, but also all the other flags, even `description`, depend on `get_for_batch_ctx` or `get_same_base`.
Some older code ignores `get_for_batch_ctx` or `get_same_base` and directly accesses (reads or writes) the other flags like `dyn_size`. This works when it happens to be the correct instance, but otherwise it can lead to unexpected behavior.
Related is also `declare_same_as`, although this might not be much of a problem, even less so after such a refactoring. However, its logic is currently quite complicated and should be simplified.
I'm also not sure about a good way to solve this. Maybe `dyn_size` and `dyn_size_ext` should be hidden away, and only be accessible through functions `get_...` and `set_...`, which would also require the `batch` and `ctx`.
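A minimal sketch of what such accessors could look like, assuming the per-variant storage is keyed by `(batch, ctx)`; all names here are hypothetical, not existing RETURNN API:

```python
class Dim:
    """Sketch: dyn_size_ext is hidden and only reachable via get/set with batch+ctx."""

    def __init__(self, name):
        self.name = name
        self._dyn_size_ext = {}  # (batch, ctx) -> tensor with the seq lengths

    def get_dyn_size_ext(self, *, batch, ctx):
        # There is exactly one sanctioned way to read the value,
        # so code can never accidentally read the wrong instance.
        key = (batch, ctx)
        if key not in self._dyn_size_ext:
            raise KeyError(f"{self.name}: dyn_size_ext not set for batch={batch}, ctx={ctx}")
        return self._dyn_size_ext[key]

    def set_dyn_size_ext(self, value, *, batch, ctx):
        self._dyn_size_ext[(batch, ctx)] = value
```

This would make the variant selection explicit at every access site, instead of relying on callers to first go through `get_for_batch_ctx` or `get_same_base`.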
Another big problem is the special role of the batch dim (#920), and all the extra logic around `BatchInfo`: when `Data.batch` or `Dim.batch` should be set, and what it actually should be set to. In principle, we should not need any special logic for the batch dim; it should be treated just like any other dim.
Two features which are available for the batch dim (via `BatchInfo`) but not for other dims are the flattening logic and the special beam search logic.
I think the flattening should be done in a way that you could combine multiple dims (any dims) and get a new flattened dim. Nothing would be specific to the batch dim. Meta info attached to the combined dim would allow recovering the individual dims, as in the sketch below.
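A minimal sketch of that idea, with hypothetical `Dim`, `flatten_dims` and `unflatten_dim` names (not existing RETURNN API):

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dim:
    name: str
    size: Optional[int] = None              # None = dynamic size
    flattened_from: Tuple["Dim", ...] = ()  # meta info to undo the merge


def flatten_dims(*dims: Dim) -> Dim:
    """Merge any dims into one flat dim; nothing is specific to the batch dim."""
    sizes = [d.size for d in dims]
    size = None if any(s is None for s in sizes) else math.prod(sizes)
    return Dim(name="*".join(d.name for d in dims), size=size, flattened_from=dims)


def unflatten_dim(dim: Dim) -> Tuple[Dim, ...]:
    """Recover the individual dims from the meta info on the merged dim."""
    assert dim.flattened_from, f"{dim.name} was not created by flatten_dims"
    return dim.flattened_from


batch, time = Dim("batch"), Dim("time")
flat = flatten_dims(batch, time)  # e.g. for flattened (packed) computation
assert unflatten_dim(flat) == (batch, time)
```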
The beam should just be another separate dim, not merged into the batch dim. There could then also be meta information attached to it, basically what we have in `SearchBeam` now.
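Under that assumption, a sketch could look like the following (hypothetical structures, with `BeamInfo` standing in for roughly what `SearchBeam` carries today):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BeamInfo:
    """Meta info about a search beam, attached to the beam dim."""
    beam_size: int
    dependency: Optional["BeamInfo"] = None  # beam this one was expanded from


@dataclass(frozen=True)
class Dim:
    name: str
    size: Optional[int] = None
    beam: Optional[BeamInfo] = None


beam_dim = Dim("beam", size=12, beam=BeamInfo(beam_size=12))
# Tensors during search would then have dims (batch, beam, time, ...),
# with the batch dim left completely untouched.
```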
Related is also the definition of dim tag equality (#634). This is still not well defined in all cases.
A bit related is also dim tag math, and especially questions of equality in those cases. However, I think this has not been too much of a problem so far, except that the equality logic was also broken in its own ways there.
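As a toy illustration of one way equality for dim math could be made well-defined: represent derived dims as normalized expressions and compare them structurally. This is purely a sketch of the idea, not current RETURNN behavior:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dim:
    name: str


@dataclass(frozen=True)
class SumDim:
    # Normal form: operands sorted by name, so a + b and b + a compare equal.
    terms: Tuple[Dim, ...]


def add_dims(*dims: Dim) -> SumDim:
    return SumDim(terms=tuple(sorted(dims, key=lambda d: d.name)))


a, b = Dim("a"), Dim("b")
assert add_dims(a, b) == add_dims(b, a)  # equality is defined, not heuristic
```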
Further, there is also the `derived_from_tag` and `derived_from_op` logic, which is yet another heuristic for certain equality matching. Maybe this is not needed once dim tags are used everywhere consistently.
And there are also `is_dim_known` and `undefined`, which are likewise not so well defined.
Such changes might break some older code. But on the RETURNN side, this can all be fixed, and there should not be much (or any) external code using this yet.
Some examples of issues and resulting PRs caused by the API of `Dim`, where things were not really well defined:
- CombineLayer and others (Data.get_common_data) should use broadcast_matches=False #666
- Dim tags matching should use allow_same_spatial_dim=False #865
- RepeatLayer: set dyn_size_ext for out_dim #1046
- Dim tag derived_from_tag/same_as cycle due to conv+padding #1054 and Fix dim tag derived_from_tag cycle #1055
- Flatten with two losses, missing batch dim #1057 and Fix adding batch dim #1058
- Missing Batch Info during Returnn Network construction via NNetConstructor #1069 and Batch info handling fixes #1068
- Input Data Size placeholder missing when using reduce and repeat #1102 and Dim tag declare_same_as fix #1104
- Layer output, always set batch info #1107
- Dim, set batch if not set #1112
- Dim declare_same_as, prioritize explicit defined #1114
- Dim declare_same_as fix is_dim_known logic #1151
- Dim declare_same_as fix derived_from_op cycle #1152
- Dim get_for_batch_ctx AssertionError: same_base.batch == same_base.dyn_size_ext.batch #1167 and Fix some Dim dyn_size_ext batch error #1168
- require dim tags to be defined #1246
Related issues: