FIRST BIG QUESTION: Address Range vs per-Cache-Line CMO instructions #9
Description
In my opinion, this is the first big question that the CMO group needs to answer. It is top priority because (a) it is a big source of disagreement, (b) the J-extension I/D proposal by Derek Williams wants to follow the CMO group, and (c) the decision has big implications for code portability, for legacy compatibility with RISC-V systems already built, and for building systems where the CPU IP is developed independently of the bus IP and external cache and memory IP - i.e. for system "mash-ups".
Should we provide traditional RISC cache-line-at-a-time instructions, like POWER DCBF, DCBI, DCBZ, ...? These are not just a RISC thing; CISCs have them too, like x86's CLFLUSH.
Basically, of the form CMO <memory address>. However, probably not of the form CMO rs1, Mem[rs2+imm12], because such 2reg+imm formats are quite expensive. If we were to do per-cache-line operations, they would probably be of the form CMO rs1:cacheline_address.
Or should we provide "address range" CMO operations?
The draft proposal (by me, Andy Glew - TBD link here) contains a proposal for address range CMOs. Actually, it is a proposal for an instruction that can be implemented in several different ways, as described below. This CMO.AR.* instruction (AR = address range) is intended to be used in a loop that looks like:
x1 := start_address_of_range
x2 := end_address_of_range
loop:
x1 := CMO.AR x1, x2
BNE x1, x2, loop
(This is just an example, although IMHO the best. Other issues will discuss details like [start,end] vs [start,end) vs [start,start+length) vs ... But many if not most of the address range proposals have a loop like the above, varying in minor details like BNE vs BLT vs ...)
It can be implemented in different ways:
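To make the loop contract concrete, here is a minimal Python sketch of what software sees. It is only a model under assumptions that the proposal leaves open: a 64-byte cache line, a half-open range [x1, x2), and a result that is clamped to x2 so that a BNE can close the loop. The names `make_cmo_ar`, `run_loop` and `lines_per_step` are hypothetical and exist only for illustration.

```python
LINE = 64  # assumed cache-line size in bytes; the proposal does not fix one


def make_cmo_ar(lines_per_step):
    """Return a hypothetical CMO.AR model that handles at most
    `lines_per_step` cache lines per executed instruction."""
    def cmo_ar(x1, x2):
        # Assumed convention: half-open range [x1, x2).
        if x1 >= x2:
            return x2
        line_base = x1 - (x1 % LINE)
        next_addr = line_base + lines_per_step * LINE
        # Clamp so the result equals x2 exactly when the range is done,
        # which is what lets the loop close with BNE.
        return min(next_addr, x2)
    return cmo_ar


def run_loop(start, end, cmo_ar):
    """Software view: x1 := CMO.AR x1, x2 ; BNE x1, x2, loop."""
    x1, x2 = start, end
    iterations = 0
    while True:
        x1 = cmo_ar(x1, x2)
        iterations += 1
        if x1 == x2:  # BNE x1, x2, loop falls through
            return iterations
```

With this model, `run_loop(0x1010, 0x1100, make_cmo_ar(1))` goes around the loop once per cache line touched, while a model that covers the whole range per instruction finishes in a single pass. The software loop is identical in both cases, which is the point of the address range form.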
(1) per-cache-line implementations, i.e. the traditional RISC way,
rs1 contains an address in the cache line to be invalidated. An address in the next cache line is returned in rd. (My proposal requires rs1=rd, in order to be restartable after exceptions like page-faults without requiring OS modifications, but that can be tweaked.)
(2) trap to M-mode, so that the operation can be emulated on systems where idiosyncratic MMIOs and CSRs invalidate caches that the CPU IP is not aware of;
KEY: the M-mode software can perform the entire address range invalidation, and thus incurs less overhead than if it had to trap on every cache line or DCBF block
(3) using state machines and block invalidations, i.e. using microarchitecture techniques that may be more efficient than a cache line at a time.
These can apply the CMO to the entire address region; but if they encounter something like a page-fault, they stop so the OS can handle it, i.e. they are restartable.
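The three implementation styles above can be compared with a toy Python model. This is a sketch, not the proposal's definition: the 64-byte line, the 4 KiB page, the half-open range, and the names `Hart`, `cmo_per_line`, `cmo_trap`, `cmo_block` and `run` are all assumptions made for illustration. The one property taken from the text is that progress is kept in x1 (rs1=rd), so that after a page fault the same instruction can simply be re-executed.

```python
LINE = 64    # assumed cache-line size in bytes
PAGE = 4096  # assumed page size in bytes


class PageFault(Exception):
    pass


class Hart:
    """Toy model: register x1, a set of cached line addresses, unmapped pages."""

    def __init__(self, cached_lines, unmapped_pages=()):
        self.x1 = 0
        self.cache = set(cached_lines)
        self.unmapped = set(unmapped_pages)
        self.traps = 0  # M-mode emulation traps taken

    def _do_line(self, addr):
        if addr // PAGE in self.unmapped:
            raise PageFault(addr)
        self.cache.discard(addr - addr % LINE)

    def _advance(self, x2):
        # Next line base, clamped to x2 so the BNE exit test works.
        self.x1 = min(self.x1 - self.x1 % LINE + LINE, x2)

    def cmo_per_line(self, x2):
        """(1) One cache line per executed instruction."""
        if self.x1 < x2:
            self._do_line(self.x1)
            self._advance(x2)

    def cmo_block(self, x2):
        """(3) Walk the whole range; x1 records progress, so a fault
        leaves the hart in a restartable state."""
        while self.x1 < x2:
            self._do_line(self.x1)  # may raise with x1 at the faulting line
            self._advance(x2)

    def cmo_trap(self, x2):
        """(2) One trap to M-mode; the handler covers the range."""
        self.traps += 1
        self.cmo_block(x2)


def run(hart, start, end, step):
    """x1 := CMO.AR x1, x2 ; BNE x1, x2, loop, plus a toy OS fault handler.
    Returns (instructions executed, page faults taken)."""
    hart.x1 = start
    executed = faults = 0
    while True:
        executed += 1
        try:
            step(end)
        except PageFault as fault:
            # The OS maps the page; the same instruction then re-executes
            # with x1 exactly as the hardware left it.
            hart.unmapped.discard(fault.args[0] // PAGE)
            faults += 1
            continue
        if hart.x1 == end:
            return executed, faults
```

For example, with two pages of cached lines and the second page initially unmapped, `run(h, 0, 2 * PAGE, h.cmo_per_line)` executes the instruction once per line plus one retry, whereas `h.cmo_block` needs only two executions: one that stops at the fault and one that finishes. In the fault-free case `h.cmo_trap` takes a single trap for the whole range, where trap-and-emulate of a per-cache-line instruction would take one trap per line.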
It is not the purpose of this issue to discuss all of the details about which register operand encodes which values, or whether the loop closing test should be a BNE or a BLT, or whether the end address should be inclusive or exclusive. Those will undoubtedly be subsequent issues.
This issue is mainly for the overall question: should RISC-V CMOs be traditional per-cache-line operations, or should they be address ranges using the approach above, which also allows per-cache-line implementations?