[+] Feat: Support dynamic attach #542
Open
Support dynamic attach, using the thread_scheduling example for testing. We can adapt the other examples as well.
Some of the changes are re-formatting only.
The key technical changes are as follows:
1. CUDA late-attach bootstrap (fatbin/PTX recovery after missed registration)
- With late attach, `__cudaRegisterFatBinary`/`__cudaRegisterFunction` may be missed. The runtime then lacks the fatbin/PTX material and the “host stub → kernel” metadata needed to route launches through patched code, resulting in “no data”.
- When `nv_attach_impl` is initialized, a one-time bootstrap scans already-loaded ELF objects:
  - enumerate them with `dl_iterate_phdr`;
  - locate `.nv_fatbin` in memory and walk the fatbin wrapper(s);
  - build a `kernel_name → patched CUfunction` cache for launch-time routing.
2. Launch routing with late-attach fallback
- The `cudaLaunchKernel` interception path no longer strictly depends on the registration-time `func_ptr → symbol_name` mapping.
- When that mapping is unavailable, launches are routed through the late-attach `kernel_name → CUfunction` mapping, preserving a safe fallback to the original runtime launch path if the patched launch fails.
3. Shared-memory session/epoch protocol (control-plane/data-plane consistency)
- Adds `epoch_seq` in `bpftime_maps_shm` with seqlock semantics: `session_id = epoch_seq / 2`.
- At session start, `epoch_seq` is updated and handlers are cleared. The agent observes epoch changes and performs an ordered rebind.
4. Single-agent control plane (avoid multi-copy state splits)
- `refresh/detach/status` go over IPC whenever possible; injection becomes a fallback when IPC is not available.
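
For reviewers unfamiliar with the enumeration step of the late-attach bootstrap (item 1), here is a minimal sketch of listing already-loaded ELF objects with `dl_iterate_phdr` on Linux/glibc. `LoadedObject` and `list_loaded_objects` are hypothetical names for illustration and are not part of this PR; the sketch stops at enumeration and does not show how `.nv_fatbin` is located inside each object.

```cpp
#include <link.h>    // dl_iterate_phdr, struct dl_phdr_info (Linux/glibc)
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one loaded ELF object (not a bpftime type).
struct LoadedObject {
    std::string name;           // empty string for the main executable
    std::uintptr_t load_bias;   // dlpi_addr: offset added to p_vaddr values
    std::size_t load_segments;  // number of PT_LOAD program headers
};

// Callback invoked once per loaded object; returning 0 continues iteration.
static int collect_object(struct dl_phdr_info *info, size_t /*size*/, void *data) {
    auto *out = static_cast<std::vector<LoadedObject> *>(data);
    std::size_t loads = 0;
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        if (info->dlpi_phdr[i].p_type == PT_LOAD) {
            ++loads;
        }
    }
    out->push_back(LoadedObject{
        info->dlpi_name ? info->dlpi_name : "",
        static_cast<std::uintptr_t>(info->dlpi_addr),
        loads});
    return 0;
}

// Enumerate every ELF object currently mapped into this process.
std::vector<LoadedObject> list_loaded_objects() {
    std::vector<LoadedObject> objects;
    dl_iterate_phdr(collect_object, &objects);
    return objects;
}
```

A bootstrap along these lines would then inspect each returned object for fatbin data before building the kernel cache.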
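
The launch-routing order from item 2 can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in this PR: `LaunchRouter`, `PatchedFn` (a stand-in for `CUfunction`), `resolve_name`, `launch_patched`, and `launch_original` are all hypothetical, and no CUDA API is called.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

using PatchedFn = int;  // hypothetical stand-in for a CUfunction handle

// Hypothetical router showing the lookup order only.
struct LaunchRouter {
    // Filled at registration time; empty if registration was missed.
    std::unordered_map<const void *, std::string> registered_names;
    // Filled by the late-attach bootstrap: kernel_name -> patched function.
    std::unordered_map<std::string, PatchedFn> patched;
    // Assumed hooks supplied by the caller.
    std::function<std::optional<std::string>(const void *)> resolve_name;
    std::function<bool(PatchedFn)> launch_patched;
    std::function<void(const void *)> launch_original;

    // Returns true if the patched path handled the launch.
    bool launch(const void *func_ptr) {
        std::optional<std::string> name;
        auto reg = registered_names.find(func_ptr);
        if (reg != registered_names.end()) {
            name = reg->second;              // registration-time mapping
        } else if (resolve_name) {
            name = resolve_name(func_ptr);   // late-attach name lookup
        }
        if (name) {
            auto it = patched.find(*name);
            if (it != patched.end() && launch_patched &&
                launch_patched(it->second)) {
                return true;
            }
        }
        launch_original(func_ptr);           // safe fallback
        return false;
    }
};
```

The point of the ordering is that a missing registration entry or a failed patched launch never drops the kernel launch; it only loses instrumentation for that launch.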
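
To make the epoch protocol in item 3 concrete, the following is a minimal seqlock-style sketch. It assumes that an odd `epoch_seq` marks a session change in progress and an even value marks a stable session, which is consistent with `session_id = epoch_seq / 2` but is not confirmed by the description. `ShmHeader`, `handler_count`, `start_session`, and `try_read` are hypothetical names, and all atomics use the default sequentially consistent ordering for simplicity.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical shared header; the real bpftime_maps_shm layout may differ.
struct ShmHeader {
    std::atomic<std::uint64_t> epoch_seq{0};
    std::atomic<std::uint32_t> handler_count{0};  // assumed protected state
};

struct Snapshot {
    std::uint64_t session_id;
    std::uint32_t handler_count;
};

// Writer side: open a new session and clear handlers.
void start_session(ShmHeader &h) {
    h.epoch_seq.fetch_add(1);   // odd: change in progress (assumption)
    h.handler_count.store(0);   // clear handlers at session start
    h.epoch_seq.fetch_add(1);   // even: new session is visible
}

// Reader side: returns a snapshot only if no change overlapped the read.
std::optional<Snapshot> try_read(const ShmHeader &h) {
    const std::uint64_t before = h.epoch_seq.load();
    if (before & 1) {
        return std::nullopt;    // writer is mid-update; retry later
    }
    const std::uint32_t count = h.handler_count.load();
    if (h.epoch_seq.load() != before) {
        return std::nullopt;    // epoch moved while reading; retry later
    }
    return Snapshot{before / 2, count};
}
```

An agent polling `try_read` would rebind when the returned `session_id` differs from the one it last bound to, and would simply retry when no snapshot is returned.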