A bit more meta: With all our logic for dim tags, which should actually make it easier to avoid reshape problems or other shaping problems, why do we still frequently run into such issues? The answer is probably too much unnecessary complexity, and thus bugs, in some parts. But which parts exactly? What can we remove? How can we improve this situation?
Related: rwth-i6/returnn#975
To answer this question: The problem here was that we actually did manual dim math. In LearnedRelativePositionalEncoding, we had:
```python
out_spatial_dim = spatial_dim - 1 + spatial_dim
...
remaining_dim = spatial_dim - self.clipping
...
cond.true, out_spatial_dim_ = nn.concat(
    (left, remaining_dim),
    (self.pos_emb, self.clipped_spatial_dim),
    (right, remaining_dim))
out_spatial_dim_.declare_same_as(out_spatial_dim)
```

And:
```python
self.clipped_spatial_dim = nn.SpatialDim(
    f"{nn.NameCtx.current_ctx().get_abs_name()}:learned-rel-pos",
    dimension=2 * clipping + 1)
```

I.e.:
```
out_spatial_dim_
  == (spatial_dim - clipping) + (2 * clipping + 1) + (spatial_dim - clipping)
  == 2 * spatial_dim + 1
  != 2 * spatial_dim - 1
  == out_spatial_dim
```
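Spelled out with plain integers (a minimal standalone sketch with hypothetical example values for `spatial_dim` and `clipping`, not using the actual `Dim` objects):

```python
# Plain-integer sketch of the dim math above, with made-up example values,
# just to make the off-by-two visible.
spatial_dim = 10  # example input length
clipping = 3      # example clipping value

remaining_dim = spatial_dim - clipping                                   # 7
clipped_spatial_dim = 2 * clipping + 1                                   # 7
out_spatial_dim_ = remaining_dim + clipped_spatial_dim + remaining_dim   # 21

out_spatial_dim = spatial_dim - 1 + spatial_dim                          # 19

print(out_spatial_dim_, out_spatial_dim)     # 21 vs 19
assert out_spatial_dim_ != out_spatial_dim   # the two dims do not match
```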
But `declare_same_as` just overwrote this anyway.
Maybe in this particular case we could have detected the mismatch statically. In general, though, there are always cases which cannot be detected at compile time but only at runtime.
At some point, we planned to add such runtime checks to `declare_same_as`. Maybe this issue can serve as a reminder. Such checks would have made it much easier to detect this problem; a rough sketch of the idea follows below.
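A sketch of what such a check inside `declare_same_as` could look like (only an illustration under simplified assumptions; the actual `Dim` class and `declare_same_as` logic in RETURNN / returnn_common are more involved):

```python
# Hypothetical, heavily simplified sketch, not the actual returnn_common implementation.
# Idea: before merging two dim tags, verify that their sizes actually agree,
# instead of silently letting declare_same_as overwrite one with the other.

from typing import Optional


class Dim:
    def __init__(self, dimension: Optional[int] = None, description: str = ""):
        self.dimension = dimension  # static size, or None if only known at runtime
        self.description = description
        self.same_as: Optional["Dim"] = None

    def declare_same_as(self, other: "Dim"):
        # Static check: if both sizes are known at compile time, they must match.
        if self.dimension is not None and other.dimension is not None:
            if self.dimension != other.dimension:
                raise ValueError(
                    f"declare_same_as: dims differ: {self.description}={self.dimension}"
                    f" vs {other.description}={other.dimension}")
        # For dynamic dims, the analogous runtime check would compare the dynamic
        # size tensors (e.g. via an assert op in the computation graph) rather than
        # just overwriting one with the other.
        self.same_as = other
```

With a check like this, the example above would fail right at the `declare_same_as` call (statically here, since both sizes are static), pointing directly at the wrong dim math.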
Originally posted by @albertz in rwth-i6/returnn_common#238 (comment)