Long-vector machines, e.g. with a VLEN of 4K to 16K bits or more, almost always implement a Cray-like architecture.
However, in such architectures:
- they always use chaining;
- each load/store is split into multiple uops, which send multiple transactions into the memory subsystem;
Thus it is almost impossible (or the overhead is too high) for them to implement precise exceptions, or to handle page-fault (PF) and access-fault (AF) exceptions.
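To give a rough sense of scale, here is a minimal Python sketch that counts how many memory transactions and pages a single unit-stride access can touch. The 64-byte transaction width, the 4 KiB page size, and the helper names `split_unit_stride` and `pages_touched` are assumptions made for illustration; they are not taken from t1 or from any spec.

```python
PAGE_BYTES = 4096  # assumed 4 KiB base pages
BEAT_BYTES = 64    # assumed width of one memory transaction


def split_unit_stride(base, vlen_bits, lmul=1):
    """Split one unit-stride access into (address, size) transactions.

    Hypothetical model: a transaction never crosses a BEAT_BYTES boundary,
    so it never crosses a page boundary either.
    """
    end = base + vlen_bits // 8 * lmul
    txns = []
    addr = base
    while addr < end:
        # Cut at the next beat boundary, or at the end of the access.
        nxt = min((addr // BEAT_BYTES + 1) * BEAT_BYTES, end)
        txns.append((addr, nxt - addr))
        addr = nxt
    return txns


def pages_touched(txns):
    """Return the sorted page numbers covered by the transactions."""
    return sorted({addr // PAGE_BYTES for addr, _ in txns})


# VLEN = 16K bits, LMUL = 8, base misaligned by 32 bytes.
txns = split_unit_stride(0x10000020, 16384, lmul=8)
print(len(txns), len(pages_touched(txns)))
```

Under these assumptions, one instruction issues hundreds of transactions across several pages, and each of them can fault after older elements have already been written back. An indexed access is worse, since every element may land on a different page.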
For example, in the uarch of chipsalliance/t1, our MMU might work as follows:
- add an AGU to each lane, each containing an L1 TLB, and provide an L2 TLB in the Sequencer;
- trigger the PF as early as possible, then stall the vector pipeline (a flush is also possible);
- expect the scalar core to handle the PF; after the PF is cleared, the vector unit continues executing the same vector instruction.
This flow is non-standard; however, I still think this issue needs to be raised so that the RISC-V community can think about how to handle such cases.
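The stall-and-resume flow described above can be sketched as a small behavioural model in Python. This is only an illustration under stated assumptions, not t1's implementation: `translate`, `vector_load`, and the `on_fault` callback are hypothetical names, translation is an identity mapping over a set of mapped pages, elements are assumed to be naturally aligned so none straddles a page, and the scalar handler is assumed to always resolve the fault.

```python
PAGE_BYTES = 4096  # assumed 4 KiB pages


class PageFault(Exception):
    def __init__(self, vaddr):
        super().__init__(hex(vaddr))
        self.vaddr = vaddr


def translate(mapped_pages, vaddr):
    """Stand-in for the per-lane L1 TLB / Sequencer L2 TLB lookup."""
    if vaddr // PAGE_BYTES not in mapped_pages:
        raise PageFault(vaddr)
    return vaddr  # identity mapping keeps the model small


def vector_load(memory, mapped_pages, base, n_elems, elem_bytes, on_fault):
    """Unit-stride load that stalls on a PF and resumes at the same element.

    on_fault models the scalar core's handler and is assumed to map the
    faulting page; otherwise this loop would retry forever.
    """
    dest = [None] * n_elems
    faults = 0
    i = 0
    while i < n_elems:
        vaddr = base + i * elem_bytes
        try:
            paddr = translate(mapped_pages, vaddr)
        except PageFault as pf:
            # Stall: elements below i stay committed, nothing is flushed.
            on_fault(pf.vaddr)
            faults += 1
            continue  # retry the faulting element after the PF is cleared
        dest[i] = memory.get(paddr, 0)
        i += 1
    return dest, faults


# Example: 8 four-byte elements straddling a page boundary, page 1 unmapped.
mapped = {0}
memory = {4080 + 4 * k: k for k in range(8)}
result, n_faults = vector_load(
    memory, mapped, base=4080, n_elems=8, elem_bytes=4,
    on_fault=lambda va: mapped.add(va // PAGE_BYTES),
)
print(result, n_faults)
```

In this model the instruction is never restarted from element 0; it continues from the element that faulted, which is where the flow departs from a conventional precise-trap-and-re-execute scheme.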