Collaborator

@Sy0307 Sy0307 commented Jan 25, 2026

This PR adds support for dynamic attach, using the thread_scheduling example as the test case. Other examples can be adapted later.

Some of the changes are purely reformatting.

The key technical changes are as follows:

1. CUDA late-attach bootstrap (fatbin/PTX recovery after missed registration)

  • Problem: When the agent is injected after CUDA registration has already happened, hooks like __cudaRegisterFatBinary/__cudaRegisterFunction may be missed. The runtime then lacks fatbin/PTX material and “host stub → kernel” metadata to route launches through patched code, resulting in “no data”.
  • Solution: After nv_attach_impl is initialized, a one-time bootstrap scans already-loaded ELF objects:
    • Enumerate loaded modules with dl_iterate_phdr;
    • Locate .nv_fatbin in-memory and walk fatbin wrapper(s);
    • Extract PTX, apply the existing ptxpass patching pipeline, compile, and load patched modules into the driver;
    • Pre-fill a kernel_name → patched CUfunction cache for launch-time routing.
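
The first bootstrap step (enumerating already-loaded modules) can be sketched as below. This is a minimal illustration, not the code in this PR: `LoadedModule`, `collect_module`, and `enumerate_loaded_modules` are hypothetical names, and locating `.nv_fatbin` and walking the fatbin wrappers are deliberately left out. Only `dl_iterate_phdr` and `struct dl_phdr_info` come from glibc's `<link.h>`.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc/Linux)

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one loaded ELF object.
struct LoadedModule {
    std::string path;     // empty for the main executable on glibc
    std::uintptr_t base;  // load bias reported by the dynamic loader
};

// Callback invoked once per loaded object; returning 0 continues iteration.
static int collect_module(struct dl_phdr_info *info, std::size_t, void *data) {
    auto *out = static_cast<std::vector<LoadedModule> *>(data);
    out->push_back({info->dlpi_name ? info->dlpi_name : "",
                    static_cast<std::uintptr_t>(info->dlpi_addr)});
    return 0;
}

// Snapshot of every ELF object currently mapped into this process.
// A late-attach bootstrap would then inspect each entry for fatbin data.
std::vector<LoadedModule> enumerate_loaded_modules() {
    std::vector<LoadedModule> modules;
    dl_iterate_phdr(collect_module, &modules);
    return modules;
}
```

Because the scan runs once after initialization, objects loaded later through `dlopen` would not appear in this snapshot; how the PR handles that case is not covered by this sketch.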

2. Launch routing with late-attach fallback

  • The cudaLaunchKernel interception path no longer strictly depends on the registration-time func_ptr → symbol_name mapping.
  • When the canonical mapping is missing, the hook attempts to resolve the host stub's symbol name (dladdr + ELF symbol cache) and dispatches via the cached kernel_name → CUfunction mapping. If the patched launch fails, the hook falls back to the original runtime launch path.
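
The lookup order described above can be sketched as follows. This is an illustrative model only: `LaunchRouter`, `PatchedFn`, and the injected `NameResolver` are hypothetical, `PatchedFn` stands in for a `CUfunction`, and the resolver stands in for the dladdr + ELF symbol cache step. An empty result means the caller should use the original runtime launch path.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using PatchedFn = void *;  // stand-in for CUfunction (assumption)

class LaunchRouter {
public:
    // Late-attach resolver: maps a host stub address to a kernel name, or nullopt.
    using NameResolver =
        std::function<std::optional<std::string>(const void *)>;

    explicit LaunchRouter(NameResolver resolver)
        : resolve_late_(std::move(resolver)) {}

    // Called when a registration hook was actually observed.
    void on_register(const void *stub, std::string kernel_name) {
        registered_[stub] = std::move(kernel_name);
    }

    // Called by the bootstrap to pre-fill kernel_name -> patched function.
    void cache_patched(std::string kernel_name, PatchedFn fn) {
        patched_[std::move(kernel_name)] = fn;
    }

    // Returns the patched function, or nullopt to request the original launch.
    std::optional<PatchedFn> route(const void *stub) const {
        std::optional<std::string> name;
        if (auto it = registered_.find(stub); it != registered_.end()) {
            name = it->second;            // canonical registration-time mapping
        } else if (resolve_late_) {
            name = resolve_late_(stub);   // late-attach fallback
        }
        if (!name) return std::nullopt;
        auto it = patched_.find(*name);
        if (it == patched_.end()) return std::nullopt;
        return it->second;
    }

private:
    NameResolver resolve_late_;
    std::unordered_map<const void *, std::string> registered_;
    std::unordered_map<std::string, PatchedFn> patched_;
};
```

Keeping the resolver injectable means the routing logic can be exercised without CUDA or a real symbol table.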

3. Shared-memory session/epoch protocol (control-plane/data-plane consistency)

  • Root cause of repeat-trace instability (“No data”, wrong data, random crashes): control-plane state in shm (handlers/maps/links) changes, while the injected target’s data-plane state (CUDA IPC pointers, device-side globals, patched module state) still points to the previous snapshot.
  • Mechanism: This PR introduces epoch_seq in bpftime_maps_shm with seqlock semantics:
    • Odd: Server is mutating/resetting the snapshot;
    • Even: Stable snapshot; session_id = epoch_seq / 2.
  • Process: The server advances epoch_seq and clears handlers at session start. The agent observes epoch changes and performs an ordered rebind:
    • Detach existing links → clear instantiated bookkeeping → re-instantiate from the new stable shm snapshot.
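
A single-process sketch of the odd/even epoch rule is shown below. Only the name `epoch_seq` and the relation session_id = epoch_seq / 2 come from this PR; `EpochHeader`, `begin_session_reset`, `end_session_reset`, `stable_session_id`, and `AgentBinding` are hypothetical, the real counter lives in shared memory, and the detach/clear/re-instantiate steps are reduced to a comment.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the epoch field kept in bpftime_maps_shm.
struct EpochHeader {
    std::atomic<std::uint64_t> epoch_seq{0};
};

// Server side: even -> odd marks the snapshot as being mutated.
void begin_session_reset(EpochHeader &h) {
    h.epoch_seq.fetch_add(1, std::memory_order_acq_rel);
}

// Server side: odd -> even publishes the new stable snapshot.
void end_session_reset(EpochHeader &h) {
    h.epoch_seq.fetch_add(1, std::memory_order_release);
}

// Returns the session id if the snapshot is stable, nullopt while it is odd.
std::optional<std::uint64_t> stable_session_id(const EpochHeader &h) {
    std::uint64_t seq = h.epoch_seq.load(std::memory_order_acquire);
    if (seq & 1) return std::nullopt;
    return seq / 2;
}

// Agent side: rebinds only when a new stable session is observed.
class AgentBinding {
public:
    // Returns true if a rebind to a new session completed.
    bool poll(const EpochHeader &h) {
        std::uint64_t before = h.epoch_seq.load(std::memory_order_acquire);
        if (before & 1) return false;               // server is mid-reset
        std::uint64_t session = before / 2;
        if (session == bound_session_) return false;  // nothing changed
        // Placeholder: detach links, clear bookkeeping, re-instantiate here.
        std::uint64_t after = h.epoch_seq.load(std::memory_order_acquire);
        if (after != before) return false;          // snapshot moved; retry later
        bound_session_ = session;
        return true;
    }
    std::uint64_t bound_session() const { return bound_session_; }

private:
    std::uint64_t bound_session_ = 0;  // assumes the agent starts on session 0
};
```

The second read of `epoch_seq` is the usual seqlock validation step: if the server started another reset while the agent was rebinding, the agent discards the result and retries on the next poll.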

4. Single-agent control plane (avoid multi-copy state splits)

  • Issue: Repeated tracing previously could re-inject the agent and accidentally create multiple in-process agent instances, splitting state and making failures hard to diagnose.
  • Solution: This PR adds a per-process agent control endpoint and uses IPC for refresh/detach/status whenever possible; injection becomes a fallback when IPC is not available.
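
The "IPC first, injection as fallback" policy can be sketched as below. All names here (`AgentControl`, `ControlPath`, `request_refresh`, and the `"refresh"` command string) are hypothetical and only model the decision order; the actual endpoint and transport are not shown.

```cpp
#include <functional>
#include <string>

enum class ControlPath { Ipc, Injection, Failed };

// Hypothetical handles for the two ways of reaching the in-process agent.
struct AgentControl {
    // Sends a command to an existing agent; false if no endpoint answers.
    std::function<bool(const std::string &)> send_ipc;
    // Injects a new agent into the target; false if injection fails.
    std::function<bool()> inject;
};

// Prefer the existing agent so that a second copy is never created
// while the first one is still reachable.
ControlPath request_refresh(const AgentControl &ctl) {
    if (ctl.send_ipc && ctl.send_ipc("refresh")) return ControlPath::Ipc;
    if (ctl.inject && ctl.inject()) return ControlPath::Injection;
    return ControlPath::Failed;
}
```

Reporting which path was taken makes it easier to tell an unexpected re-injection apart from a normal refresh when diagnosing failures.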
