SECOND QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range"

This is the same question an earlier issue raises for address range CMOs vs per-cache-line CMOs, but this time for operations that are typically used for things like "flush the entire I$ or D$".

Such "cache microarchitecture dependent CMOs" have been done a cache line at a time in some earlier processors, but this is less well established than per-cache-line-address-at-a-time. Quite a few RISC processors have "full cache flushes", etc.

First, if operating a cache line at a time, there must be a way of indicating which cache line is involved. Typically this is (set,way), but not all caches have sets and ways. Indeed, it is not really clear what the sets and ways are for something like a skewed associative cache.

But that's okay, we can abstract that as a "cache entry index number", which might be Set*Nways+Way for a traditional set associative cache, or whatever is appropriate.
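A minimal sketch of that abstraction for a traditional set associative cache. The helper names and the cache geometry are made up for illustration; only the Set*Nways+Way formula comes from the text above.

```python
# Hypothetical helpers: names and geometry are illustrative, not part of any proposal.

def entry_index(set_idx: int, way: int, n_ways: int) -> int:
    """Flatten (set, way) into a single cache entry index: Set*Nways + Way."""
    return set_idx * n_ways + way

def set_way(index: int, n_ways: int) -> tuple[int, int]:
    """Recover (set, way) from a flat cache entry index."""
    return divmod(index, n_ways)

N_SETS, N_WAYS = 64, 4  # assumed geometry, for illustration only

assert entry_index(3, 2, N_WAYS) == 14
assert set_way(14, N_WAYS) == (3, 2)
```

With this mapping every (set,way) pair gets exactly one index in 0 .. Nsets*Nways-1, so software only needs to know the number of entries, not the geometry. A skewed associative cache would need some other mapping.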

Then, a per-cache-index loop typically looks like

    FOR i from 0 to  #cache_entries-1 DO
         CMO.cache_index  i

or

    FOR s from 0 to Nsets-1 DO
        FOR w from 0 to Nways-1 DO
            CMO.by_set_way  s,w

That's the traditional approach.


The draft proposal (by me, Andy Glew, TBD link here) defines "microarchitecture range CMOs" that look like

            x1 := 0
    loop:
            x1 := CMO.UR x1
            BNEZ x1, loop

which looks remarkably like the per-cache-index loop, except that, as in the CMO.AR proposal, the next cache index is returned by the CMO.UR instruction.
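A toy model of that loop may make the control flow concrete. Everything here is an assumption for illustration: `ToyCache`, the method name `cmo_ur`, the choice of invalidation as the operation, and in particular the convention that a returned 0 means "done", which is only inferred from the BNEZ in the loop above.

```python
# Toy model only: ToyCache, cmo_ur, and "return 0 when done" are assumptions
# inferred from the BNEZ loop, not taken from the CMO.UR proposal.

class ToyCache:
    def __init__(self, n_entries: int) -> None:
        self.valid = [True] * n_entries  # one valid bit per cache entry
        self.calls = 0                   # number of CMO.UR executions

    def cmo_ur(self, index: int) -> int:
        """Invalidate entry `index`; return the next index, or 0 when done."""
        self.calls += 1
        self.valid[index] = False
        nxt = index + 1
        return nxt if nxt < len(self.valid) else 0

def flush_all(cache: ToyCache) -> None:
    x1 = 0                      #         x1 := 0
    while True:                 # loop:
        x1 = cache.cmo_ur(x1)   #         x1 := CMO.UR x1
        if x1 == 0:             #         BNEZ x1, loop
            break

cache = ToyCache(8)
flush_all(cache)
assert not any(cache.valid)     # every entry was visited
assert cache.calls == 8         # one CMO.UR per entry in this model
```

Note that the software never computes the next index itself. It just feeds back whatever the instruction returned.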




This allows several implementations:

(1) per (set,way) cache line at a time - traditional

(2) trap to M-mode efficiently, with less overhead than one trap per cache line

(3) state machines that iterate over the entire cache, e.g. for EVICT, to write out dirty data

(3.1) also non-state-machine implementations, such as bulk invalidations that set all valid bits to 0 as a single operation.
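To illustrate how one software loop can sit on top of all of these, here is a rough sketch in which the only thing that differs between implementations is how many entries a single CMO.UR execution handles. The names `make_cmo_ur` and `run_loop`, the batch size of 32 for the M-mode case, and the "return 0 when done" convention are assumptions made for this example, not details of the proposal.

```python
from typing import Callable, List

def make_cmo_ur(valid: List[bool], entries_per_call: int) -> Callable[[int], int]:
    """Build a hypothetical CMO.UR handling `entries_per_call` entries per execution."""
    def cmo_ur(index: int) -> int:
        end = min(index + entries_per_call, len(valid))
        for i in range(index, end):
            valid[i] = False                    # invalidate this entry
        return end if end < len(valid) else 0   # assumed: 0 means "done"
    return cmo_ur

def run_loop(cmo_ur: Callable[[int], int]) -> int:
    """The software loop from the text; returns how many CMO.UR executions it took."""
    x1, executed = 0, 0
    while True:
        x1 = cmo_ur(x1)
        executed += 1
        if x1 == 0:
            return executed

N = 256  # assumed number of cache entries
results = {}
for name, step in [("per-entry", 1), ("m-mode-batch", 32), ("whole-cache", N)]:
    valid = [True] * N
    results[name] = run_loop(make_cmo_ur(valid, step))
    assert not any(valid)  # the same loop empties the cache in every case

print(results)  # {'per-entry': 256, 'm-mode-batch': 8, 'whole-cache': 1}
```

The per-entry case corresponds to (1), the batched case to an M-mode handler that processes several entries per trap as in (2), and the whole-cache case to (3) or (3.1), where the first execution does all the work and the loop exits immediately.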



---

I mark this as "SECOND QUESTION:" in the title

because I want it to be blaringly obvious

also because I am in a hurry, and will apply this issue tracker's priority scheme later

but mainly because I think there will be less discussion about this CMO.UR cache index range than there will be for the CMO.AR address range instruction, since there are already quite a few implementations that are "full cache invalidations", and we want RISC-V to support such hardware when it is available.

--

Again, this issue is not for the details of CMO.UR. It is mostly for the idea of a microarchitecture or cache index range.






