@mdh1418
Member

@mdh1418 mdh1418 commented Nov 3, 2025

With user_events support added in #115265, this PR adds a test for a basic end-to-end user_events scenario.

Alternative testing approaches considered

Existing EventPipe runtime tests

Existing EventPipe tests under src/tests/tracing/eventpipe cannot test the user_events scenario, because they rely on:

  1. Starting EventPipeSessions through DiagnosticClient ❌
    DiagnosticClient cannot send the IPC command that starts a user_events-based EventPipe session, because that command requires the user_events_data file descriptor to be sent using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).

  2. Using an EventPipeEventSource to validate events streamed through EventPipe ❌
    user_events-based EventPipe sessions do not stream events. Instead, events are written to configured TraceFS tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ is capable of generating .nettrace traces from tracepoint user_events.
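
For readers unfamiliar with SCM_RIGHTS, here is a minimal Python sketch of passing a file descriptor over a Unix domain socket. It is not the runtime's or DiagnosticClient's code: the socketpair stands in for the diagnostic port connection, a temporary file stands in for user_events_data, and the "start-session" payload and the helper names (send_fd, recv_fd, demo) are hypothetical. It assumes Python 3.9+ on a Unix-like system.

```python
import os
import socket
import tempfile


def send_fd(sock: socket.socket, payload: bytes, fd: int) -> None:
    """Send payload plus one file descriptor as SCM_RIGHTS ancillary data."""
    socket.send_fds(sock, [payload], [fd])


def recv_fd(sock: socket.socket, max_len: int = 1024) -> tuple[bytes, int]:
    """Receive a payload and exactly one passed file descriptor."""
    data, fds, _flags, _addr = socket.recv_fds(sock, max_len, 1)
    if len(fds) != 1:
        raise RuntimeError("expected exactly one file descriptor")
    return data, fds[0]


def demo() -> bytes:
    # A socketpair stands in for the diagnostic port connection (assumption).
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with client, server, tempfile.TemporaryFile() as stand_in:
        # stand_in plays the role of user_events_data (assumption).
        send_fd(client, b"start-session", stand_in.fileno())
        message, received_fd = recv_fd(server)
        # The receiver holds its own descriptor for the same open file.
        os.write(received_fd, b"event")
        os.close(received_fd)
        stand_in.seek(0)
        return message + b":" + stand_in.read()
```

The point of the sketch is that the descriptor travels as ancillary data alongside the message bytes, so a client that can only write bytes to the socket cannot express this command.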

Native EventPipe Unit Tests

There are Mono native EventPipe tests under src/mono/mono/eventpipe/test that are not hooked up to CI. These unit tests are built by linking the shared EventPipe interface library against Mono's EventPipe runtime shims, and they run with Mono's test runner. Moving them into the standard runtime test structure would require a larger investment: either migrating EventPipe from runtime shims to an OS PAL source shared by coreclr/nativeaot/mono (see #118874 (comment)), or building an EventPipe shared library specifically for the runtime test using a runtime-agnostic shim.
Because the existing Mono unit tests don't test IPC commands, and there is no existing runtime infrastructure to read events from tracepoints, testing the user_events scenario would take even more work on top of updating the Mono native EventPipe unit tests.

End-to-End Testing Added

A low-cost approach to testing .NET Runtime's user_events functionality leverages RecordTrace from https://github.com/microsoft/one-collect/, which is already capable of starting user_events-based EventPipe sessions and generating .nettrace files. (Note: dotnet-trace wraps RecordTrace.)
This adds an external dependency, so a RecordTrace failure can fail the end-to-end test. However, user_events support was initially added with the intent to depend on RecordTrace for the end-to-end scenario, and there is no other way to functionally test a user_events-based EventPipe session.

Approach

  1. Start Tracee app
  2. Start tracing with RecordTrace + dotnet-common profile script
  3. Stop RecordTrace (triggers .nettrace generation) and Tracee app
  4. Validate the .nettrace for particular events from Tracee app
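
The steps above can be sketched as a small orchestration helper. This is an illustration in Python, not the test added by this PR: run_traced and its parameters are hypothetical, the command lines are supplied by the caller (no RecordTrace flags are assumed), and stopping the recorder with SIGINT is an assumption about how the tool is asked to finish. Step 4 (validating the .nettrace) is left to the caller.

```python
import signal
import subprocess
import time


def run_traced(tracee_cmd, recorder_cmd, trace_seconds=5.0, stop_timeout=30.0):
    """Run tracee and recorder side by side; return (recorder_rc, tracee_rc)."""
    tracee = subprocess.Popen(tracee_cmd)              # step 1: start tracee
    try:
        recorder = subprocess.Popen(recorder_cmd)      # step 2: start tracing
        try:
            time.sleep(trace_seconds)                  # let events accumulate
            recorder.send_signal(signal.SIGINT)        # step 3: stop recorder
            recorder_rc = recorder.wait(timeout=stop_timeout)
        finally:
            # Never leave a recorder behind if it ignored the stop request.
            if recorder.poll() is None:
                recorder.kill()
                recorder.wait()
    finally:
        # Step 3 (continued): stop the tracee once recording has ended.
        if tracee.poll() is None:
            tracee.terminate()
        tracee_rc = tracee.wait()
    return recorder_rc, tracee_rc
```

Stopping the recorder before the tracee keeps the traced process alive for the whole recording window, so any events it emits fall inside the session.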

Dependencies:

  • CI runs the runtime test in an environment that supports user_events
  • CI runs the runtime test with permissions to access user_events_data
  • Microsoft.OneCollect.RecordTrace (transitively resolved through a dotnet diagnostics public feed)
  • Microsoft.Diagnostics.Tracing.TraceEvent 3.1.24+ (to read NetTrace V6)
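
The first two dependencies could be probed before running the test. The sketch below is a hypothetical pre-flight check in Python, not part of this PR: the function name is invented, and the two candidate paths are assumptions about where tracefs is mounted on the host.

```python
import os

# Common tracefs mount points; both are assumptions about the host setup.
DEFAULT_CANDIDATES = (
    "/sys/kernel/tracing/user_events_data",
    "/sys/kernel/debug/tracing/user_events_data",
)


def find_writable_user_events_data(candidates=DEFAULT_CANDIDATES):
    """Return the first candidate that exists and is writable, else None."""
    for path in candidates:
        if os.path.exists(path) and os.access(path, os.W_OK):
            return path
    return None
```

A None result would mean either the kernel lacks user_events support or the current user lacks permission, which are the two environment conditions listed above.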

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a new test for UserEvents tracing on Linux that validates the runtime's ability to emit trace events through the user_events subsystem. The test uses the Microsoft.OneCollect.RecordTrace tool to capture events from a tracee process and validates that GC events were properly recorded.

Key changes include:

  • Addition of a new test infrastructure for UserEvents tracing
  • Upgrade of TraceEvent library from version 3.1.16 to 3.1.28
  • Implementation of multi-process test orchestration with native signal handling

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/tests/tracing/eventpipe/userevents/usereventstracee.cs: Implements tracee process that generates GC events for validation
  • src/tests/tracing/eventpipe/userevents/userevents.csproj: Project configuration including NuGet package references and build targets
  • src/tests/tracing/eventpipe/userevents/userevents.cs: Main test orchestration: spawns processes, collects traces, validates events
  • src/tests/tracing/eventpipe/userevents/dotnet-common.script: Configuration script for record-trace tool specifying provider and flags
  • eng/Versions.props: Updates TraceEvent package version to 3.1.28

@jkotas
Member

jkotas commented Nov 3, 2025

a basic end-to-end user_events scenario

I like this approach.

@mdh1418
Member Author

mdh1418 commented Nov 26, 2025

It looks like the .NET runtime events aren't being captured in the .nettrace because a session isn't actually being started.
On Helix machines, the diagnostic port is created under the temp directory of Helix's provisioned environment, which is of the form /datadisks/disk1/work/<workID>/t/. RecordTrace currently only scans /tmp/ for these diagnostic ports. I'm planning on adding a config value for eventpipe/userevents debugging that produces more stresslog output, to better diagnose whether the point of failure is on the runtime side or external.
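
To illustrate the mismatch, here is a hedged Python sketch of scanning for diagnostic port sockets. It is not RecordTrace's implementation: the helper names are hypothetical, and it assumes the socket naming convention dotnet-diagnostic-{pid}-{key}-socket and the use of $TMPDIR with a /tmp fallback described in the diagnostics IPC protocol documentation.

```python
import glob
import os


def candidate_port_dirs(extra_dirs=()):
    """Directories to scan: $TMPDIR if set, then /tmp, then any extras."""
    dirs = []
    for d in (os.environ.get("TMPDIR"), "/tmp", *extra_dirs):
        if d and d not in dirs:
            dirs.append(d)
    return dirs


def find_diagnostic_ports(pid, dirs):
    """Return paths that look like diagnostic port sockets for the given pid."""
    pattern = f"dotnet-diagnostic-{pid}-*-socket"
    found = []
    for d in dirs:
        # glob.escape guards against special characters in directory names.
        found.extend(sorted(glob.glob(os.path.join(glob.escape(d), pattern))))
    return found
```

A scanner that only looks in /tmp/ would return an empty list for a tracee whose temp directory is the Helix work item path, which matches the symptom of no session being started.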

Even after switching to just enabling GCKeyword, the 1s tracee app had a 1% failure rate locally. On the other hand, AllocationSampled had no failures after 1000 runs.
@mdh1418 mdh1418 force-pushed the user_events_functional_runtime_test branch from eb14e69 to 368b499 Compare November 26, 2025 23:20
@mdh1418
Member Author

mdh1418 commented Dec 2, 2025

I'm planning to open another, cleaner PR featuring a reusable UserEvents TestRunner that allows quick extension to different scenarios (this thread). This PR also got a bit noisy with issue references from the test commits needed to get the userevents tests working on Helix.

@mdh1418 mdh1418 added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Dec 2, 2025
@mdh1418
Member Author

mdh1418 commented Dec 3, 2025

Closing in favor of #122134, which has a cleaner commit history and a less noisy (for now) discussion thread; this one accumulated noise from all of the Helix test commits.

@mdh1418 mdh1418 closed this Dec 3, 2025
Labels

area-Tracing-coreclr NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
