Description
The motivation for this issue is to provide better support for RVV unit-strided segment load/store.
The following scenarios need to be supported:
- Interleaved load (vp.load + deinterleave)
- Interleaved load with tail gaps (requires a scalar epilogue to run the last iteration)
- Fully interleaved store (interleave + vp.store)
- Interleaved store with gaps (this cannot emit a unit-strided segment store; we can only emit a wide masked store for it)
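To make the last scenario concrete, here is a minimal scalar sketch of a wide masked store for an interleave group with gaps. It is not LLVM code: `buildGapMask` and `maskedStore` are hypothetical helpers, and the sketch assumes `evl` counts whole groups (tuples) rather than individual wide lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): builds the lane mask for one wide
// store covering `vf` groups of an interleave group. memberPresent[j] says
// whether member j of the group is actually stored; absent members are gaps.
// Assumption: `evl` is the number of active groups, so groups >= evl are off.
std::vector<bool> buildGapMask(std::size_t vf,
                               const std::vector<bool> &memberPresent,
                               std::size_t evl) {
  const std::size_t factor = memberPresent.size();
  std::vector<bool> mask(vf * factor, false);
  for (std::size_t i = 0; i < vf && i < evl; ++i)
    for (std::size_t j = 0; j < factor; ++j)
      mask[i * factor + j] = memberPresent[j];
  return mask;
}

// Scalar model of a wide masked store: only lanes with a set mask bit are
// written, so the gap positions in memory keep their old values.
void maskedStore(std::vector<int> &mem, std::size_t base,
                 const std::vector<int> &wide,
                 const std::vector<bool> &mask) {
  assert(wide.size() == mask.size());
  assert(base + wide.size() <= mem.size());
  for (std::size_t k = 0; k < wide.size(); ++k)
    if (mask[k])
      mem[base + k] = wide[k];
}
```

Because the gap lanes must not be written, the access is no longer a plain segment store of all fields, which is why only the masked wide store is available in this case.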
Due to the high complexity of VPInterleaveRecipe::execute(), creating a new recipe or converting it into VPWidenIntrinsicRecipe does not seem like a wise approach.
A tentative approach I have in mind is to first split VPInterleaveRecipe into VPWidenLoad + VPDeinterleave and VPInterleave + VPWidenStore. During the EVL lowering phase, we would only need to transform VPWidenLoad/VPWidenStore into VPWidenLoadEVL/VPWidenStoreEVL.
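The proposed split can be modelled in scalar code for factor 2. This is only a sketch of the intended semantics, not the VPlan implementation: the function names below are illustrative stand-ins for the recipes, and the sketch assumes the wide access uses an EVL of 2 × the per-member EVL.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// Stand-in for an EVL-predicated wide load: reads only the first `evl` lanes.
Vec widenLoadEVL(const Vec &mem, std::size_t base, std::size_t evl) {
  assert(base + evl <= mem.size());
  auto first = mem.begin() + static_cast<std::ptrdiff_t>(base);
  return Vec(first, first + static_cast<std::ptrdiff_t>(evl));
}

// Stand-in for deinterleave2: even lanes go to the first member,
// odd lanes go to the second member.
std::pair<Vec, Vec> deinterleave2(const Vec &wide) {
  assert(wide.size() % 2 == 0);
  Vec even, odd;
  for (std::size_t i = 0; i < wide.size(); i += 2) {
    even.push_back(wide[i]);
    odd.push_back(wide[i + 1]);
  }
  return {even, odd};
}

// Stand-in for interleave2: the inverse of deinterleave2.
Vec interleave2(const Vec &a, const Vec &b) {
  assert(a.size() == b.size());
  Vec wide;
  for (std::size_t i = 0; i < a.size(); ++i) {
    wide.push_back(a[i]);
    wide.push_back(b[i]);
  }
  return wide;
}

// Stand-in for an EVL-predicated wide store: writes only the first `evl` lanes.
void widenStoreEVL(Vec &mem, std::size_t base, const Vec &wide,
                   std::size_t evl) {
  assert(evl <= wide.size() && base + evl <= mem.size());
  for (std::size_t i = 0; i < evl; ++i)
    mem[base + i] = wide[i];
}
```

In this model the EVL only affects the memory operations, while the interleave and deinterleave steps are unchanged. That matches the idea that EVL lowering only needs to rewrite the wide load/store.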
For now, the focus will be on supporting factor 2 (interleave2/deinterleave2) as the initial target, with support for factors 3 to 8 planned after test results are stable.
Related InterleavedAccessPass (IAP) support: #120490.