
Merge main-next to main #226

Draft
umanwizard wants to merge 510 commits into main from main-next

@umanwizard (Collaborator)

This rebases the delta in our fork of opentelemetry-ebpf-profiler on top of upstream main. It will eventually become our main once we let it bake for a while to gain confidence in its correctness.

This PR replaces #209, which had the draft branch name btv/merges.

renovate bot and others added 30 commits December 1, 2025 09:21
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…metry#999)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…#1001)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…2549 (open-telemetry#1003)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…metry#1007)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…#1005)

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Florian Lehner <florian.lehner@elastic.co>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
….6 (open-telemetry#1024)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…metry#1023)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…1029)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…n-telemetry#1030)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
umanwizard and others added 21 commits March 4, 2026 17:06
To reduce BPF overhead, send up to 128 kernel launch timing activities
to the USDT probe at once. The old single-shot kernel_executed probe is
still supported.
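The batching idea can be illustrated with a small Go sketch. This is only an analogy: the actual change lives in the eBPF/USDT probe code, and the names `timingRecord`, `activityBatch`, and `maxBatch` are hypothetical. The sketch shows how grouping up to 128 records per hand-off turns many per-launch events into a few batched ones.

```go
package main

import "fmt"

// maxBatch mirrors the "up to 128 activities" limit described in the commit.
const maxBatch = 128

// timingRecord is a hypothetical stand-in for one kernel launch timing activity.
type timingRecord struct {
	Correlation uint32
	StartNs     uint64
	EndNs       uint64
}

// activityBatch accumulates records and hands them off in groups of at most
// maxBatch, so one hand-off (one probe firing) covers many kernel launches.
type activityBatch struct {
	buf   [maxBatch]timingRecord
	n     int
	flush func([]timingRecord) // receives a view of buf; copy it to retain it
}

// add appends a record, flushing first if the batch is already full.
func (b *activityBatch) add(r timingRecord) {
	if b.n == maxBatch {
		b.Flush()
	}
	b.buf[b.n] = r
	b.n++
}

// Flush hands off any pending records and resets the batch.
func (b *activityBatch) Flush() {
	if b.n == 0 {
		return
	}
	b.flush(b.buf[:b.n])
	b.n = 0
}

func main() {
	var sizes []int
	b := &activityBatch{flush: func(rs []timingRecord) { sizes = append(sizes, len(rs)) }}
	for i := 0; i < 300; i++ {
		b.add(timingRecord{Correlation: uint32(i)})
	}
	b.Flush()
	fmt.Println(sizes) // [128 128 44]
}
```

With 300 launches, the sketch performs three hand-offs instead of 300, which is the kind of per-event overhead reduction the commit is after.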

Inline correlation and kernel_exec into cuda_probe, tail-call only
activity_batch

The unwinder is sensitive to tail calls, so minimize them: inline
cuda_correlation and cuda_kernel_exec directly into cuda_probe's switch
statement, using bpf_usdt_arg() to read the USDT arguments.
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Uncomment GPU tests

Just uncomment the tests that were commented out in 42fef97.
These don't compile yet due to API changes.

* Update GPU tests for current APIs

- Rewrite eBPF-level hash tests (computeTraceHash/makeCUDATrace) to use
  libpf.Trace + traceutil.HashTrace since support.Frame/Stack_len/
  ZeroPerSampleFields no longer exist
- Replace Frame.FileID with Frame.Mapping (FrameMapping)
- Update CustomLabels access: string keys → libpf.String (libpf.Intern)
- Drop TestCUDATraceHashExcludesPerSampleFields (per-sample fields now
  live in TraceEventMeta, not libpf.Trace, so they never affect hash)
- Remove unused imports (host, xxh3, unsafe)
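The point that per-sample fields "never affect hash" once they live outside the trace can be shown with a minimal Go sketch. The types `frame`, `trace`, and `traceEventMeta` are hypothetical simplifications of libpf.Frame, libpf.Trace, and TraceEventMeta, and FNV-1a from the standard library stands in for traceutil.HashTrace.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// frame and trace are hypothetical simplifications of libpf.Frame / libpf.Trace.
type frame struct {
	MappingID uint64
	Addr      uint64
}

type trace struct {
	Frames []frame
}

// traceEventMeta holds per-sample fields and is kept separate from trace.
type traceEventMeta struct {
	TimestampNs uint64
	CPU         int
}

// traceEvent pairs a trace with its per-sample metadata.
type traceEvent struct {
	Trace trace
	Meta  traceEventMeta
}

// hashTrace hashes only the frames, so per-sample metadata cannot influence it.
// FNV-1a is an illustrative substitute for traceutil.HashTrace.
func hashTrace(t trace) uint64 {
	h := fnv.New64a()
	var buf [16]byte
	for _, f := range t.Frames {
		binary.LittleEndian.PutUint64(buf[0:8], f.MappingID)
		binary.LittleEndian.PutUint64(buf[8:16], f.Addr)
		h.Write(buf[:]) // hash.Hash.Write never returns an error
	}
	return h.Sum64()
}

func main() {
	stack := trace{Frames: []frame{{MappingID: 1, Addr: 0x10}, {MappingID: 2, Addr: 0x20}}}
	e1 := traceEvent{Trace: stack, Meta: traceEventMeta{TimestampNs: 100, CPU: 0}}
	e2 := traceEvent{Trace: stack, Meta: traceEventMeta{TimestampNs: 200, CPU: 3}}
	// Same stack, different samples: the hashes match.
	fmt.Println(hashTrace(e1.Trace) == hashTrace(e2.Trace)) // true
}
```

Because the hash input is structurally limited to the trace, a test that asserts "per-sample fields are excluded from the hash" has nothing left to check, which is consistent with dropping that test.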

* Move interceptor logic into gpu package, avoid allocations, add finishTrace callback

Pass a finishTrace callback through TraceInterceptor so the gpu package
can report completed traces directly. Move CUDA frame scanning and ID
extraction from parcagpu into gpu.InterceptTrace, making the parcagpu
interceptor a thin origin check + delegation.

Split addTrace into addSingleTrace and addGraphTrace:
- addSingleTrace returns a single (CudaTraceOutput, bool) on the stack:
  no slice allocation, and no SymbolizedCudaTrace when the timing has
  already arrived.
- addGraphTrace returns []CudaTraceOutput and always stores state for
  future timing events.
InterceptTrace dispatches based on isGraphLaunch and calls finishTrace
outside the fixer lock.

Move HashTrace out of prepTrace. Each reporting path hashes once at the
point of reporting: finishTrace for intercepted traces, inline for the
normal HandleTrace path, and processBatch for the AddTimes path.

finishTrace does not call maybeNotifyAPMAgent: intercepted traces may
also be completed on a different path that lacks bpfTrace context for
APM notification, so we skip it for consistency.
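The single-launch path described above (return by value plus a flag, then call finishTrace outside the lock) can be sketched in Go. Everything here is hypothetical: `fixer`, `cudaTraceOutput`, `addTiming`, and `interceptTrace` are simplified stand-ins for the fork's types, the graph-launch path is omitted, and only the locking and return-shape pattern is meant to match the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// cudaTraceOutput is a hypothetical stand-in for CudaTraceOutput.
type cudaTraceOutput struct {
	CorrelationID uint32
	DurationNs    uint64
}

// fixer pairs traces with timing events that may arrive in either order.
type fixer struct {
	mu      sync.Mutex
	timings map[uint32]uint64   // timing that arrived before its trace
	pending map[uint32]struct{} // traces still waiting for their timing
}

func newFixer() *fixer {
	return &fixer{timings: map[uint32]uint64{}, pending: map[uint32]struct{}{}}
}

// addSingleTrace must be called with mu held. It returns the finished output
// by value plus a flag, so the common path allocates no slice.
func (f *fixer) addSingleTrace(id uint32) (cudaTraceOutput, bool) {
	if d, ok := f.timings[id]; ok {
		delete(f.timings, id)
		return cudaTraceOutput{CorrelationID: id, DurationNs: d}, true
	}
	f.pending[id] = struct{}{}
	return cudaTraceOutput{}, false
}

// addTiming records a timing event; must be called with mu held.
func (f *fixer) addTiming(id uint32, durationNs uint64) (cudaTraceOutput, bool) {
	if _, ok := f.pending[id]; ok {
		delete(f.pending, id)
		return cudaTraceOutput{CorrelationID: id, DurationNs: durationNs}, true
	}
	f.timings[id] = durationNs
	return cudaTraceOutput{}, false
}

// interceptTrace holds the lock only for bookkeeping and invokes finishTrace
// after releasing it, so slow reporting work does not block other callers.
func (f *fixer) interceptTrace(id uint32, finishTrace func(cudaTraceOutput)) {
	f.mu.Lock()
	out, done := f.addSingleTrace(id)
	f.mu.Unlock()
	if done {
		finishTrace(out)
	}
}

func main() {
	f := newFixer()
	f.mu.Lock()
	f.addTiming(7, 1500) // timing arrives first and is stored
	f.mu.Unlock()
	f.interceptTrace(7, func(o cudaTraceOutput) {
		fmt.Println(o.CorrelationID, o.DurationNs) // 7 1500
	})
}
```

Calling the callback after Unlock keeps the critical section short and avoids deadlock if the callback itself needs the same lock.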

* Review fixes

* one more

* simplify tests
@umanwizard umanwizard requested review from brancz and gnurizen March 5, 2026 22:52
@umanwizard umanwizard marked this pull request as ready for review March 5, 2026 22:53
@umanwizard umanwizard marked this pull request as draft March 5, 2026 22:53
@umanwizard (Collaborator, Author)

Not sure why CI isn't running here.

@umanwizard (Collaborator, Author)

Oh, it turns out there's just an outage in GitHub Actions.
