SECOND QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range" #10
Description
Just like an earlier issue discusses address range CMOs vs per-cache-line CMOs... but this time for operations that are typically used for things like "flush the entire I$ or D$".
Such "cache microarchitecture dependent CMOs" have been done in some earlier processors a cache line at a time --- but this is less well established than for per-cache-line-address-at-a-time. Quite a few RISC processors have "full cache flushes", etc.
First, if operating a cache line at a time, there must be a way of indicating which cache line is involved. Typically this is (set,way), but not all caches have sets and ways - indeed, it is not really clear what the set and ways are for something like a skewed associative cache.
But that's okay, we can abstract that as a "cache entry index number", which might be Set*Nways+Way for a traditional set associative cache, or whatever is appropriate.
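To make the abstraction concrete, here is a minimal sketch of the index mapping for a traditional set associative cache. The function names and the `n_ways` parameter are mine, for illustration only; a skewed associative or other organization would substitute whatever mapping is appropriate.

```python
def entry_index(set_idx, way, n_ways):
    """Flatten (set, way) into a single cache entry index: Set*Nways+Way."""
    return set_idx * n_ways + way


def set_way(index, n_ways):
    """Recover (set, way) from a flat cache entry index."""
    return divmod(index, n_ways)
```

With 4 ways, (set 3, way 1) maps to entry 13, and walking the flat indices 0..Nsets*Nways-1 visits every (set,way) pair exactly once.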
Then, a per-cache-index loop typically looks like
FOR i from 0 to #cache_entries-1 DO
    CMO.cache_index i
or
FOR s from 0 to Nsets-1 DO
    FOR w from 0 to Nways-1 DO
        CMO.by_set_way s,w
That's the traditional approach.
The draft proposal (by me, Andy Glew, TBD link here3) defines "microarchitecture range CMOs" that look like
x1 := 0
loop:
x1 := CMO.UR x1
BNEZ x1, loop
which looks remarkably like the per-cache-index loop
except that, like in the CMO.AR proposal, the next cache index is returned by the CMO.UR instruction.
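A toy software model of that loop may help. It is only a sketch under assumed semantics: I assume CMO.UR operates on the entry at the index it is given and returns the next index to process, with 0 meaning "done". The `ToyCache` class and its `cmo_ur` method are hypothetical names, not part of any proposal text.

```python
class ToyCache:
    """Toy model: one valid bit per cache entry."""

    def __init__(self, n_entries):
        self.valid = [True] * n_entries

    def cmo_ur(self, index):
        # Assumed semantics: invalidate entry `index`, then return the
        # next index to process, or 0 once the last entry has been handled.
        self.valid[index] = False
        nxt = index + 1
        return nxt if nxt < len(self.valid) else 0


def flush_all(cache):
    """Software loop: x1 := 0; loop: x1 := CMO.UR x1; BNEZ x1, loop."""
    x1 = 0
    invocations = 0
    while True:
        x1 = cache.cmo_ur(x1)
        invocations += 1
        if x1 == 0:
            return invocations
```

Software never computes the next index itself; it only feeds back whatever the instruction returned, so the loop does not need to know the number of entries.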
This allows several implementations
(1) per (set,way) cache line at a time - traditional
(2) trap to M-mode efficiently, less overhead
(3) state machines that iterate over the entire cache, e.g. for EVICT, to write out dirty data
also (3.1) non-state machine implementations, as in bulk invalidations that set all valid bits to 0 as a single operation.
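The point of these options is that one software loop can sit on top of very different hardware. The sketch below models three of them as interchangeable step functions: one entry per invocation, a state machine that handles several entries per invocation, and a bulk invalidate. All names here are hypothetical, the chunk size of 4 is arbitrary, and the "return next index, 0 when done" convention is an assumption rather than a statement of the actual CMO.UR definition.

```python
def per_entry(valid, index):
    # (1) Traditional: handle exactly one entry per invocation.
    valid[index] = False
    nxt = index + 1
    return nxt if nxt < len(valid) else 0


def chunked(valid, index, chunk=4):
    # (3) State machine: handle up to `chunk` entries per invocation
    # (e.g. before yielding to an interrupt), then report where to resume.
    end = min(index + chunk, len(valid))
    for i in range(index, end):
        valid[i] = False
    return end if end < len(valid) else 0


def bulk(valid, index):
    # (3.1) Bulk invalidate: clear every valid bit at once, report done.
    for i in range(len(valid)):
        valid[i] = False
    return 0


def run_loop(step, valid):
    """The same software loop for every implementation; returns invocation count."""
    x1 = 0
    invocations = 0
    while True:
        x1 = step(valid, x1)
        invocations += 1
        if x1 == 0:
            return invocations
```

On a 16-entry toy cache, the loop invokes `per_entry` 16 times, `chunked` 4 times, and `bulk` once, and in every case all entries end up invalid.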
I mark this as a SECONDARY QUESTION:
in the title, because I want it to be blaringly obvious
also because I am in a hurry, and will apply this issue tracker's priority scheme later
but mainly because I think there will be less discussion about this CMO.UR cache index range than there will be for the CMO.AR address range instruction, since there are already quite a few implementations that are "full cache invalidations", and we want RISC-V to support such hardware when it is available.
--
Again, this issue is not for the details of the CMO.UR. It is mostly for the idea of a microarchitecture or cache index range.