Vanadis - Enable true Out-of-Order execution for RoCC instructions by fixing register renaming and dispatch logic #2602

@yuebanabn

Description

Hi, I am Yifan Zhang from the ChatDV group at Southeast University. I am currently working on integrating a custom CIM accelerator with Vanadis using the RoCC interface for our research.

During performance modeling, I identified two critical limitations in the current VanadisRoCCInterface integration that prevent true Out-of-Order (OoO) execution, effectively forcing RoCC instructions to execute sequentially or stall the pipeline unnecessarily.

Current Limitations

  1. Early Source Operand Evaluation (RAW Dependency Issue):
    Currently, assignRegistersToInstruction reads the values of source registers (rs1, rs2) immediately during the Issue/Rename stage and pushes them to the RoCC interface.

    • Problem: In a realistic OoO core, these values may not be ready yet (e.g., a preceding ALU or load instruction is still in flight). Reading them at the Issue stage means the RoCC instruction must either stall until the producer has written its result or, if that case is not handled, execute with stale data. Either way, it bypasses the standard reservation station/waiting logic.

    Here is the problematic logic in assignRegistersToInstruction. It reads register values immediately, bypassing the reservation station/dependency check mechanism:

// In VANADIS_COMPONENT::assignRegistersToInstruction

if (ins->getInstFuncType() >= INST_ROCC0 && ins->getInstFuncType() <= INST_ROCC3) {
    // ...
    if (!roccs_[rocc_index]->RoCCFull()) {
        VanadisRegisterFile* regFile = register_files[ins->getHWThread()];
        
        // PROBLEM: Values are read immediately at the Issue/Rename stage!
        // These physical registers might be waiting for a producer (RAW dependency).
        uint64_t rs1_val = regFile->getIntReg<int64_t>(ins->getPhysIntRegIn(0));
        uint64_t rs2_val = regFile->getIntReg<int64_t>(ins->getPhysIntRegIn(1));

        // ...
        // Instruction is pushed to accelerator before operands are actually ready
        roccs_[rocc_index]->push(new RoCCCommand(rocc_inst, rs1_val, rs2_val));
    }
}
  2. Lack of Destination Register Renaming (WAW/WAR Dependency Issue):
    RoCC instructions currently write back results directly to the architectural register (rd) index upon completion, without allocating a new physical register from the free list during the Rename stage.

    • Problem: This re-introduces Write-After-Write (WAW) and Write-After-Read (WAR) hazards, causing unnecessary serialization and preventing the CPU from overlapping independent instructions with long-latency accelerator tasks.

    And here is the write-back logic in performExecute. It writes directly to resp->rd (architectural index) instead of a mapped physical register:

// In VANADIS_COMPONENT::performExecute

// Tick the RoCC Interfaces
for (int i = 0; i < roccs_.size(); i++) {
    RoCCResponse* resp;
    if (!(roccs_[i]->isBusy()) && (resp = roccs_[i]->respond())) {
        
        VanadisInstruction* ins = rocc_queues_[i].front();
        
        // PROBLEM: Writing to architectural register 'rd' directly.
        // This ignores register renaming and causes WAW/WAR hazards.
        register_files[ins->getHWThread()]->setIntReg<uint64_t>(resp->rd, resp->rd_val);
        
        ins->markExecuted();
        rocc_queues_[i].pop_front();
    } 
    // ...
}
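
To make the WAW hazard concrete, here is a minimal toy model of it. This is a sketch for illustration only: `ToyRename`, its register counts, and both helper functions are hypothetical and are not Vanadis code. An older, slow accelerator operation and a younger, fast ALU operation both target architectural register x1, and the accelerator result arrives last.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (hypothetical, not Vanadis): 4 architectural, 8 physical registers.
struct ToyRename {
    std::vector<uint64_t> phys = std::vector<uint64_t>(8, 0);  // physical register file
    std::vector<int> isa_to_phys = {0, 1, 2, 3};               // rename table
    std::vector<int> free_list = {7, 6, 5, 4};                 // popped from the back

    // Rename stage: map the architectural destination to a fresh physical register.
    int renameDest(int isa_rd) {
        assert(!free_list.empty());
        int p = free_list.back();
        free_list.pop_back();
        isa_to_phys[isa_rd] = p;
        return p;
    }

    // Read an architectural register through the current mapping.
    uint64_t readIsa(int isa_reg) const { return phys[isa_to_phys[isa_reg]]; }
};

// No renaming: both writers share one register, so the late (older)
// accelerator result overwrites the younger ALU result.
uint64_t finalX1WithoutRenaming() {
    ToyRename rf;
    rf.phys[1] = 222;  // younger ALU op completes first
    rf.phys[1] = 111;  // older accelerator op completes late and clobbers it
    return rf.readIsa(1);
}

// With renaming: each writer owns a physical register, and the rename table
// points x1 at the younger writer regardless of completion order.
uint64_t finalX1WithRenaming() {
    ToyRename rf;
    int p_old = rf.renameDest(1);    // older accelerator op
    int p_young = rf.renameDest(1);  // younger ALU op; x1 now maps to p_young
    rf.phys[p_young] = 222;          // ALU op completes first
    rf.phys[p_old] = 111;            // late write lands in its own register
    return rf.readIsa(1);
}
```

In this toy, `finalX1WithoutRenaming()` returns the older value 111, while `finalX1WithRenaming()` returns the program-order-correct value 222. A core without renaming has to serialize the two writers to avoid the first outcome, which is the stall described above.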

Proposed Solution

I have prototyped and verified a fix locally that aligns RoCC instruction handling with standard Vanadis functional units. The proposed changes are:

  1. Implement Register Renaming for RoCC:
    Modify assignRegistersToInstruction to treat RoCC instructions like standard integer instructions. Allocate a physical register for rd from the int_register_stack and update the ISA table. This resolves false dependencies.

  2. Delayed Dispatch Mechanism:

    • Remove the immediate push to the RoCC interface from the Issue stage.
    • Introduce a rocc_wait_queue (similar to a reservation station) to hold issued RoCC instructions.
  3. Operand Readiness Check in Execute Stage:
    Modify performExecute to check the rocc_wait_queue. An instruction should only be dispatched (pushed) to the RoCC interface when:

    • The RoCC command queue is not full.
    • Crucially: All source physical registers are ready (checking pendingIntWrites).
    • Only when both conditions hold are the source values read from the register file, i.e., at dispatch time rather than at issue.
  4. Correct Write-back:
    Update the response handling logic to write the result into the allocated physical register instead of the architectural register.
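
The four steps above can be sketched as a small standalone model. This is not the prototype code and does not use the Vanadis classes: `ToyCore`, `ToyRoccIns`, `ToyCommand`, and all of their members are hypothetical names chosen for illustration, and the per-register pending-write counter only approximates what the pendingIntWrites check would do. For simplicity, the sketch dispatches only from the head of the wait queue, so RoCC instructions stay in order relative to each other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a renamed RoCC instruction (physical indices only).
struct ToyRoccIns {
    int phys_rs1, phys_rs2, phys_rd;
};

// Hypothetical command handed to the accelerator at dispatch time.
struct ToyCommand {
    uint64_t rs1_val, rs2_val;
    int phys_rd;
};

struct ToyCore {
    std::vector<uint64_t> phys_regs;         // physical register file
    std::vector<int> pending_writes;         // > 0: a producer is still outstanding
    std::deque<ToyRoccIns> rocc_wait_queue;  // issued, not yet dispatched
    std::deque<ToyCommand> accel_queue;      // accelerator command queue
    std::size_t accel_capacity;

    ToyCore(std::size_t nregs, std::size_t cap)
        : phys_regs(nregs, 0), pending_writes(nregs, 0), accel_capacity(cap) {}

    // Issue/Rename stage: mark the renamed destination as pending and park
    // the instruction. No operand values are read here.
    void issue(const ToyRoccIns& ins) {
        ++pending_writes[ins.phys_rd];
        rocc_wait_queue.push_back(ins);
    }

    // Execute stage: dispatch the head instruction only if the accelerator
    // queue has room and both source registers have no pending writer.
    bool tryDispatch() {
        if (rocc_wait_queue.empty() || accel_queue.size() >= accel_capacity)
            return false;
        const ToyRoccIns ins = rocc_wait_queue.front();
        if (pending_writes[ins.phys_rs1] > 0 || pending_writes[ins.phys_rs2] > 0)
            return false;
        // Operands are read now, when they are known to be valid.
        accel_queue.push_back({phys_regs[ins.phys_rs1], phys_regs[ins.phys_rs2], ins.phys_rd});
        rocc_wait_queue.pop_front();
        return true;
    }

    // Write-back (from any producer, including the accelerator response):
    // the value goes to the physical register and clears its pending mark.
    void writeBack(int phys_reg, uint64_t value) {
        assert(pending_writes[phys_reg] > 0);
        phys_regs[phys_reg] = value;
        --pending_writes[phys_reg];
    }
};
```

A caller would invoke `issue()` once per RoCC instruction in the Rename stage and `tryDispatch()` each cycle in the Execute stage. Instructions that do not depend on the parked RoCC instruction are unaffected by a `false` return, which is what lets the rest of the pipeline keep moving.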

Impact
These changes allow the Vanadis CPU to continue issuing and executing independent instructions while a RoCC instruction is waiting for operands or executing a long-latency task. This is essential for accurate performance modeling of heterogeneous systems (e.g., CPU + Accelerator).

Status
I have implemented these changes in a local fork and verified them with a custom CIM component. I plan to clean up the code (separating it from my specific component logic) and submit a Pull Request in the near future. I wanted to open this issue to track the feature and gather any feedback on the design approach.
