[AMD] Use permlanex16 for shuffleXor on rdna (#7269)
On RDNA, permlanex16 works similarly to DPP operations, but has more
flexible lane selection. Each lane in the upper/lower block of 16
contiguous lanes can select an arbitrary lane in the other block to read
from. With 4 bits per lane, we construct the identity mapping
0xfedcba9876543210 so that lane i in the upper 16 lanes reads data from
lane i in the lower 16 lanes and vice versa.
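
The effect of the identity selector can be checked with a small software model. The sketch below is only an illustration of the lane selection described above, not the lowering code from this change: the function names `permlanex16` and `shuffle_xor_16` are hypothetical, a 32-lane wave is assumed, and hardware details such as the exec mask and the instruction's control bits are ignored.

```python
ROW = 16                          # lanes per block
IDENTITY_SEL = 0xFEDCBA9876543210  # nibble i holds the value i


def permlanex16(values, sel=IDENTITY_SEL):
    """Simplified model: each lane reads from the opposite 16-lane block.

    Nibble (lane % 16) of `sel` picks the source lane inside that block.
    """
    assert len(values) == 2 * ROW, "model assumes a 32-lane wave"
    out = []
    for lane in range(len(values)):
        nibble = (sel >> (4 * (lane % ROW))) & 0xF
        # Lanes 0..15 read from block 16..31 and vice versa.
        other_base = 0 if lane >= ROW else ROW
        out.append(values[other_base + nibble])
    return out


def shuffle_xor_16(values):
    """Reference result: lane i receives the value held by lane i ^ 16."""
    return [values[lane ^ 16] for lane in range(len(values))]


if __name__ == "__main__":
    wave = list(range(32))
    print(permlanex16(wave) == shuffle_xor_16(wave))
```

In this model, any other selector gives a different cross-block permutation; for example, 0x0123456789abcdef would make lane 0 read from lane 31.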
This does not require a round trip to LDS, as was necessary with the
previously used ds_swizzle instruction.
Co-authored-by: Paul Trojahn <[email protected]>