[Wasm RyuJit] Enable native wasm fast tail calls #129134
Set FEATURE_FASTTAILCALL=1 and FEATURE_TAILCALL_OPT=1. Fast tail calls lower to return_call / return_call_indirect. Tag the SP arg so codegen adds compLclFrameSize to undo the prolog adjustment, so the callee receives the incoming shadow-stack pointer.
@kg PTAL. Passes various Pri-0 tail call tests. We emit ~4K tail calls in SPC. Using an LIR flag may raise some hackles; happy to consider alternatives.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Enables WebAssembly fast tail calls in CoreCLR RyuJIT and wires up shadow-stack/SP handling so wasm return_call / return_call_indirect can be emitted correctly.
Changes:
- Turn on `FEATURE_FASTTAILCALL` and `FEATURE_TAILCALL_OPT` for `TARGET_WASM`.
- Tag the wasm shadow-stack/SP argument for fast tail calls in RA and adjust it in codegen to undo the prolog’s SP delta.
- Relax a fast-tailcall eligibility check that is stack-based and not applicable to wasm’s local-based argument passing.
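
To make the third bullet concrete, here is a minimal sketch of the kind of stack-based eligibility check being relaxed. It assumes the check compares the callee's outgoing stack-argument size against the caller's incoming stack-argument area. The struct, field names, and function are hypothetical and are not taken from morph.cpp.

```cpp
#include <cassert>

// Hypothetical summary of argument areas; these names are illustrative only.
struct ArgAreas
{
    unsigned callerIncomingStackBytes; // stack space the caller received for its own args
    unsigned calleeOutgoingStackBytes; // stack space the tail-called method needs for its args
};

// Returns true when stack-passed arguments do not block a fast tail call.
inline bool StackArgsAllowFastTailCall(const ArgAreas& areas, bool argsLiveInWasmLocals)
{
    if (argsLiveInWasmLocals)
    {
        // Assumption: with local-based argument passing there is no incoming
        // stack area to overwrite, so the size comparison does not apply.
        return true;
    }

    // Stack-based targets: the callee's args must fit in the caller's incoming area.
    return areas.calleeOutgoingStackBytes <= areas.callerIncomingStackBytes;
}
```

With these assumptions, a callee that needs 32 bytes when the caller received 16 is rejected on a stack-based target but accepted when arguments live in wasm locals.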
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/coreclr/jit/targetwasm.h | Enables fast tail calls + opportunistic tail calls for wasm. |
| src/coreclr/jit/regallocwasm.cpp | Tags the well-known wasm shadow-stack pointer arg for fast tail calls. |
| src/coreclr/jit/morph.cpp | Skips an arg-stack-space constraint that doesn’t apply to wasm. |
| src/coreclr/jit/lir.h | Adds a wasm-specific LIR flag to mark the fast-tailcall SP arg. |
| src/coreclr/jit/codegenwasm.cpp | Emits INS_end for tailcall “jmp epilog” blocks and adjusts SP arg / return type handling for tail calls. |

 #define FEATURE_MULTIREG_STRUCT_PROMOTE 1 // True when we want to promote fields of a multireg struct into registers
-#define FEATURE_FASTTAILCALL 0 // Tail calls made as epilog+jmp
-#define FEATURE_TAILCALL_OPT 0 // opportunistic Tail calls (i.e. without ".tail" prefix) made as fast tail calls.
+#define FEATURE_FASTTAILCALL 1 // Tail calls made as epilog+jmp. On wasm the "jmp" is the native return_call / return_call_indirect opcode.

if (callNode->IsFastTailCall())
{
    CallArg* const spArg = callNode->gtArgs.FindWellKnownArg(WellKnownArg::WasmShadowStackPointer);
    if (spArg != nullptr)
    {
        GenTree* const argNode = spArg->GetNode();
        assert(argNode != nullptr);
        assert(argNode->OperIs(GT_PHYSREG));
        assert(argNode->AsPhysReg()->gtSrcReg == m_perFuncletData[m_currentFunclet]->m_spReg);

        argNode->gtLIRFlags |= LIR::Flags::WasmFastTailCallSp;
    }
}

if ((tree->gtLIRFlags & LIR::Flags::WasmFastTailCallSp) != 0)
{
    // Fast tail call SP arg: undo the prolog SP adjustment (asserts funclet tail calls don't happen).
    assert(m_compiler->funCurrentFuncIdx() == ROOT_FUNC_IDX);
    assert(tree->gtSrcReg == GetStackPointerReg(m_compiler->funCurrentFuncIdx()));
    if (m_compiler->compLclFrameSize != 0)
    {
        GetEmitter()->emitIns_I(INS_I_const, EA_PTRSIZE, m_compiler->compLclFrameSize);
        GetEmitter()->emitIns(INS_I_add);
    }
}
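
The arithmetic behind this fix-up can be sketched outside the JIT. The model below assumes the prolog reserves the frame by subtracting the frame size from the incoming shadow-stack pointer, which is what "undo the prolog adjustment" by adding `compLclFrameSize` implies. `ShadowFrame` and the three helpers are hypothetical names for illustration, not RyuJIT code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one function's shadow-stack frame; not JIT code.
struct ShadowFrame
{
    uint32_t incomingSp; // shadow-stack pointer the caller passed in
    uint32_t frameSize;  // stands in for compLclFrameSize
};

// Prolog: reserve the local frame by moving SP down (assumed direction).
inline uint32_t SpAfterProlog(const ShadowFrame& frame)
{
    assert(frame.incomingSp >= frame.frameSize);
    return frame.incomingSp - frame.frameSize;
}

// Normal call: the callee's frame goes below ours, so pass the adjusted SP.
inline uint32_t SpArgForNormalCall(const ShadowFrame& frame)
{
    return SpAfterProlog(frame);
}

// Fast tail call: our frame is dead once we jump, so add the frame size back
// and hand the callee the SP we originally received.
inline uint32_t SpArgForFastTailCall(const ShadowFrame& frame)
{
    uint32_t sp = SpAfterProlog(frame);
    if (frame.frameSize != 0) // mirrors the "skip the add when the frame is empty" check
    {
        sp += frame.frameSize;
    }
    return sp;
}
```

Under this model a function entered with SP 1024 and a 48-byte frame passes 976 to ordinary callees and 1024 to a tail-called callee, so a chain of tail calls does not grow the shadow stack.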

// For a fast tail call wasm requires the callee's result type to match the enclosing
// function's, so derive it from the caller's signature (call->gtType is TYP_VOID).
if (params.isJump)
{
    if (m_compiler->info.compRetBuffArg != BAD_VAR_NUM)
    {
        // The enclosing method returns its struct via a retbuf arg, so the wasm-level
        // return is empty.
        typeStack.Push(CORINFO_WASM_TYPE_VOID);
    }
    else if (m_compiler->info.compRetType == TYP_VOID)
    {
        typeStack.Push(CORINFO_WASM_TYPE_VOID);
    }
    else if (m_compiler->info.compRetType == TYP_STRUCT)
    {
        typeStack.Push(
            m_compiler->info.compCompHnd->getWasmLowering(m_compiler->info.compMethodInfo->args.retTypeClass));
    }
    else
    {
        // Normalize small ints (bool/byte/short/...).
        typeStack.Push((CorInfoWasmType)emitter::GetWasmValueTypeCode(
            ActualTypeToWasmValueType(m_compiler->info.compRetType)));
    }
}
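
The four branches above amount to a small decision table keyed on the caller's signature. A standalone sketch of that table follows. The enums, `CallerSig`, and `TailCallResultType` are hypothetical stand-ins rather than JIT or JIT-EE interface types, and the struct lowering is passed in as data instead of being queried from the runtime.

```cpp
#include <cassert>

// Hypothetical stand-ins for the JIT's types; none of these names exist in RyuJIT.
enum class WasmResult { Void, I32, I64, F32, F64 };
enum class RetKind { Void, Bool, Int16, Int32, Int64, Float, Double, Struct };

struct CallerSig
{
    RetKind    retKind;
    bool       hasRetBuf;      // struct is returned through a hidden buffer argument
    WasmResult structLowering; // assumed lowering of a struct returned by value
};

// Result type to declare on the tail call: always the caller's own result type.
inline WasmResult TailCallResultType(const CallerSig& sig)
{
    if (sig.hasRetBuf)
    {
        return WasmResult::Void; // the value travels through the buffer, not the wasm result
    }

    switch (sig.retKind)
    {
        case RetKind::Void:   return WasmResult::Void;
        case RetKind::Struct: return sig.structLowering;
        case RetKind::Bool:
        case RetKind::Int16:
        case RetKind::Int32:  return WasmResult::I32; // small ints normalize to i32
        case RetKind::Int64:  return WasmResult::I64;
        case RetKind::Float:  return WasmResult::F32;
        case RetKind::Double: return WasmResult::F64;
    }

    assert(!"unhandled return kind");
    return WasmResult::Void;
}
```

Checking the retbuf case first matters in this sketch: a struct-returning caller with a return buffer must report an empty result even though its managed return type is a struct.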
SingleAccretion left a comment
What's the benefit of using guaranteed-tailcall return_call for implicit tailcalls?
I was imagining we could use implicit tailcalls for shadow stack only since it has some benefits w.r.t. zero-sized shadow frames.
Not sure what to make of this ... are you saying the underlying engine can do this instead in most cases? Or that it's not worth doing in general?