Description
Hi, I am Yifan Zhang from the ChatDV group at Southeast University. I am currently working on integrating a custom CIM accelerator with Vanadis using the RoCC interface for our research.
During performance modeling, I identified two critical limitations in the current VanadisRoCCInterface integration that prevent true Out-of-Order (OoO) execution, effectively forcing RoCC instructions to execute sequentially or stall the pipeline unnecessarily.
Current Limitations
- Early Source Operand Evaluation (RAW Dependency Issue):
  Currently, `assignRegistersToInstruction` reads the values of source registers (`rs1`, `rs2`) immediately during the Issue/Rename stage and pushes them to the RoCC interface.
  - Problem: In a realistic OoO core, the values might not be ready yet (e.g., pending a previous ALU or Load instruction). Reading them at the Issue stage forces the pipeline to wait until the producer instruction retires or forces the RoCC instruction to execute with stale data if not handled correctly. It bypasses the standard reservation station/waiting logic.
Here is the problematic logic in assignRegistersToInstruction. It reads register values immediately, bypassing the reservation station/dependency check mechanism:
    // In VANADIS_COMPONENT::assignRegistersToInstruction
    if (ins->getInstFuncType() >= INST_ROCC0 && ins->getInstFuncType() <= INST_ROCC3) {
        // ...
        if (!roccs_[rocc_index]->RoCCFull()) {
            VanadisRegisterFile* regFile = register_files[ins->getHWThread()];
            // PROBLEM: Values are read immediately at the Issue/Rename stage!
            // These physical registers might be waiting for a producer (RAW dependency).
            uint64_t rs1_val = regFile->getIntReg<int64_t>(ins->getPhysIntRegIn(0));
            uint64_t rs2_val = regFile->getIntReg<int64_t>(ins->getPhysIntRegIn(1));
            // ...
            // Instruction is pushed to accelerator before operands are actually ready
            roccs_[rocc_index]->push(new RoCCCommand(rocc_inst, rs1_val, rs2_val));
        }
    }

- Lack of Destination Register Renaming (WAW/WAR Dependency Issue):
  RoCC instructions currently write back results directly to the architectural register (`rd`) index upon completion, without allocating a new physical register from the free list during the Rename stage.
  - Problem: This re-introduces Write-After-Write (WAW) and Write-After-Read (WAR) hazards, causing unnecessary serialization and preventing the CPU from overlapping independent instructions with long-latency accelerator tasks.
And here is the write-back logic in performExecute. It writes directly to resp->rd (architectural index) instead of a mapped physical register:
    // In VANADIS_COMPONENT::performExecute
    // Tick the RoCC Interfaces
    for (int i = 0; i < roccs_.size(); i++) {
        RoCCResponse* resp;
        if (!(roccs_[i]->isBusy()) && (resp = roccs_[i]->respond())) {
            VanadisInstruction* ins = rocc_queues_[i].front();
            // PROBLEM: Writing to architectural register 'rd' directly.
            // This ignores register renaming and causes WAW/WAR hazards.
            register_files[ins->getHWThread()]->setIntReg<uint64_t>(resp->rd, resp->rd_val);
            ins->markExecuted();
            rocc_queues_[i].pop_front();
        }
        // ...
    }

Proposed Solution
I have prototyped and verified a fix locally that aligns RoCC instruction handling with standard Vanadis functional units. The proposed changes are:
- Implement Register Renaming for RoCC:
  Modify `assignRegistersToInstruction` to treat RoCC instructions like standard integer instructions. Allocate a physical register for `rd` from the `int_register_stack` and update the ISA table. This resolves false dependencies.
- Delayed Dispatch Mechanism:
  - Remove the immediate `push` to the RoCC interface from the Issue stage.
  - Introduce a `rocc_wait_queue` (similar to a reservation station) to hold issued RoCC instructions.
- Operand Readiness Check in Execute Stage:
  Modify `performExecute` to check the `rocc_wait_queue`. An instruction should only be dispatched (pushed) to the RoCC interface when:
  - The RoCC command queue is not full.
  - Crucially: All source physical registers are ready (checking `pendingIntWrites`).
  - Once dispatched, the register values are read from the register file at this correct moment.
- Correct Write-back:
  Update the response handling logic to write the result into the allocated physical register instead of the architectural register.
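
To make the four steps above concrete, here is a minimal, self-contained toy model of the intended flow. It is only a sketch under stated assumptions, not the actual patch: `ToyCore`, `ToyRoccIns`, `ToyRoccCmd`, `renameDest`, `tryDispatch`, `writeBack`, and `completeRocc` are hypothetical names that do not exist in Vanadis, `pending_writes` is a stand-in for the `pendingIntWrites` check, and the "accelerator" simply adds its two operands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model only: these types are illustrative stand-ins, not Vanadis classes.
struct ToyRoccIns {
    int phys_rs1;  // physical source registers, resolved at rename
    int phys_rs2;
    int phys_rd;   // physical destination allocated from the free list
};

struct ToyRoccCmd {
    ToyRoccIns ins;
    uint64_t rs1_val;  // operand values captured at dispatch, not at issue
    uint64_t rs2_val;
};

struct ToyCore {
    std::vector<uint64_t> phys_regs;         // physical register file
    std::vector<int> pending_writes;         // outstanding producers per physical register
    std::vector<int> isa_table;              // architectural -> physical mapping
    std::vector<int> free_list;              // unallocated physical registers
    std::deque<ToyRoccIns> rocc_wait_queue;  // issued, waiting for operands
    std::deque<ToyRoccCmd> rocc_cmd_queue;   // dispatched to the toy accelerator
    std::size_t rocc_cmd_capacity = 2;

    ToyCore(int n_arch, int n_phys)
        : phys_regs(n_phys, 0), pending_writes(n_phys, 0), isa_table(n_arch) {
        assert(n_phys > n_arch);
        for (int i = 0; i < n_arch; ++i) isa_table[i] = i;
        for (int p = n_phys - 1; p >= n_arch; --p) free_list.push_back(p);
    }

    // Rename: map an architectural destination to a fresh physical register.
    int renameDest(int arch_rd) {
        assert(!free_list.empty());
        int phys = free_list.back();
        free_list.pop_back();
        isa_table[arch_rd] = phys;
        ++pending_writes[phys];  // the value is not available yet
        return phys;
    }

    // Issue: resolve sources first, then rename rd, then park in the wait queue.
    void issueRocc(int rs1, int rs2, int rd) {
        ToyRoccIns ins{isa_table[rs1], isa_table[rs2], 0};
        ins.phys_rd = renameDest(rd);
        rocc_wait_queue.push_back(ins);
    }

    // Execute: dispatch the oldest waiting instruction only when the command
    // queue has room and neither source register has a pending write.
    bool tryDispatch() {
        if (rocc_wait_queue.empty() || rocc_cmd_queue.size() >= rocc_cmd_capacity) return false;
        const ToyRoccIns ins = rocc_wait_queue.front();
        if (pending_writes[ins.phys_rs1] > 0 || pending_writes[ins.phys_rs2] > 0) return false;
        rocc_cmd_queue.push_back({ins, phys_regs[ins.phys_rs1], phys_regs[ins.phys_rs2]});
        rocc_wait_queue.pop_front();
        return true;
    }

    // Write-back for any producer: store into the physical register, mark it ready.
    void writeBack(int phys, uint64_t value) {
        phys_regs[phys] = value;
        --pending_writes[phys];
    }

    // Response handling: the toy accelerator just adds its two operands and
    // the result goes to the renamed physical register, not the architectural index.
    bool completeRocc() {
        if (rocc_cmd_queue.empty()) return false;
        ToyRoccCmd cmd = rocc_cmd_queue.front();
        rocc_cmd_queue.pop_front();
        writeBack(cmd.ins.phys_rd, cmd.rs1_val + cmd.rs2_val);
        return true;
    }
};

// Scenario: x1 is still being produced when a RoCC instruction reading x1 and x2 issues.
uint64_t demoDelayedDispatch() {
    ToyCore core(4, 8);
    core.phys_regs[core.isa_table[2]] = 5;  // x2 = 5, already available
    int x1_phys = core.renameDest(1);       // producer of x1 is in flight
    core.issueRocc(1, 2, 3);                // rocc: x3 <- x1 + x2 (toy semantics)
    bool early = core.tryDispatch();        // refused: x1 is not ready yet
    core.writeBack(x1_phys, 10);            // producer finishes, x1 = 10
    bool late = core.tryDispatch();         // accepted: operands are read now
    core.completeRocc();
    assert(!early && late);
    return core.phys_regs[core.isa_table[3]];
}
```

In this toy scenario the first dispatch attempt is refused because the producer of `x1` has not written back, so the stale value is never captured. After the producer completes, the command carries the operands 10 and 5 and the result lands in the physical register that the ISA table maps to `x3`. Only the head of the wait queue is considered here to keep the sketch short; a real implementation could scan the queue for any ready instruction.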
Impact
These changes allow the Vanadis CPU to continue issuing and executing independent instructions while a RoCC instruction is waiting for operands or executing a long-latency task. This is essential for accurate performance modeling of heterogeneous systems (e.g., CPU + Accelerator).
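
As a rough back-of-the-envelope illustration of this overlap, the sketch below compares two idealized cycle counts for one RoCC command followed by a run of independent single-cycle instructions. Everything in it is an assumption for illustration: a single-issue core, no other stalls, and the hypothetical helper names `cyclesSerialized` and `cyclesOverlapped`. It is not a measurement of Vanadis.

```cpp
#include <algorithm>
#include <cstdint>

// Idealized model: the RoCC command issues in cycle 0 and completes at cycle
// rocc_latency; each independent instruction occupies one issue cycle.

// The core stalls until the RoCC command completes, then runs the rest.
uint64_t cyclesSerialized(uint64_t rocc_latency, uint64_t n_independent) {
    return rocc_latency + n_independent;
}

// Independent instructions issue in cycles 1..n while the accelerator is busy,
// so the total is bounded by whichever of the two finishes last.
uint64_t cyclesOverlapped(uint64_t rocc_latency, uint64_t n_independent) {
    return std::max(rocc_latency, 1 + n_independent);
}
```

Under these assumptions, a 100-cycle accelerator task followed by 20 independent instructions takes 120 cycles when serialized and 100 cycles when overlapped; real gains depend on the workload and the rest of the pipeline.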
Status
I have implemented these changes in a local fork and verified them with a custom CIM component. I plan to clean up the code (separating it from my specific component logic) and submit a Pull Request in the near future. I wanted to open this issue to track the feature and gather any feedback on the design approach.