The original implementation had two critical issues:
- *Functional*: It did not preserve register order, so it computed a
different reduction.
- *Performance*: When converting back to the original tensor type, it
did `reshape(convert_layout(res))`. The `reshape` operation therefore
served as an anchor, and the suboptimal slice layout was propagated.
These issues were fixed as follows:
- Keep register order.
- Do `convert_layout(reshape(res))` when converting back to the original
type, so that the better layout is propagated instead.
See the implementation for further details.
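
The functional issue can be illustrated with a toy model. This is only a
sketch in plain Python, not code from this change: `reduce_groups` and the
interleaving permutation are hypothetical, chosen to show that regrouping
the same values in a different order yields a different reduction result.

```python
def reduce_groups(registers, group_size):
    """Sum consecutive groups of `group_size` values, mimicking a reduction
    over the inner axis after reshaping to (len // group_size, group_size)."""
    return [
        sum(registers[i:i + group_size])
        for i in range(0, len(registers), group_size)
    ]


# Hypothetical per-thread register contents.
registers = [1, 2, 3, 4, 5, 6, 7, 8]

# Register order preserved: each group holds the values it is meant to hold.
in_order = reduce_groups(registers, 4)

# Register order NOT preserved: an assumed permutation that interleaves the
# two halves, so each group now mixes values from both original groups.
permuted = [registers[i] for i in (0, 4, 1, 5, 2, 6, 3, 7)]
out_of_order = reduce_groups(permuted, 4)

print(in_order)      # [10, 26]
print(out_of_order)  # [14, 22]
```

Both variants sum to the same total, but the per-group results differ, which
is why the reshape has to keep register order.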
Signed-off-by: victor-eds <[email protected]>