mdh1418 commented Dec 3, 2025

With user_events support added in #115265, this PR adds tests for a few end-to-end user_events scenarios.

Alternative testing approaches considered

Existing EventPipe runtime tests

Existing EventPipe tests under src/tests/tracing/eventpipe cannot cover the user_events scenario for two reasons:

  1. Starting EventPipeSessions through DiagnosticsClient ❌
    DiagnosticsClient cannot send the IPC command that starts a user_events-based EventPipe session, because that command requires the user_events_data file descriptor to be passed using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).

  2. Using an EventPipeEventSource to validate events streamed through EventPipe ❌
    user_events-based EventPipe sessions do not stream events. Instead, events are written to the configured TraceFS tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ can generate .nettrace traces from user_events tracepoints.
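
To illustrate the SCM_RIGHTS requirement from item 1, here is a minimal sketch of passing a file descriptor between two ends of a Unix domain socket. It is not the diagnostics IPC protocol itself: the helper names (send_fd, recv_fd, demo) are hypothetical, and a temporary file stands in for the user_events_data descriptor.

```python
import array
import os
import socket
import tempfile


def send_fd(sock: socket.socket, fd: int, payload: bytes = b"fd") -> None:
    """Send one file descriptor as SCM_RIGHTS ancillary data on a Unix socket."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data must accompany the ancillary message.
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor that the peer sent with SCM_RIGHTS."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(16, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            fds.frombytes(data[: fds.itemsize])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo() -> bytes:
    """Pass a temp file's descriptor across a socket pair and read through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as tmp, left, right:
        tmp.write(b"hello")
        tmp.flush()
        send_fd(left, tmp.fileno())
        received = recv_fd(right)
        try:
            # The received descriptor shares its file offset with the original,
            # so read from an explicit position instead of the current one.
            return os.pread(received, 5, 0)
        finally:
            os.close(received)
```

The receiver ends up with its own descriptor number that refers to the same open file, which is why a plain byte-stream message cannot substitute for this mechanism.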

Native EventPipe Unit Tests

There are Mono native EventPipe tests under src/mono/mono/eventpipe/test that are not hooked up to CI. These unit tests are built by linking the shared EventPipe interface library against Mono's EventPipe runtime shims and are run with Mono's test runner. Moving them into the standard runtime test structure would take a larger investment: either migrating EventPipe from runtime shims to an OS PAL source shared by coreclr/nativeaot/mono (see #118874 (comment)), or building an EventPipe shared library specifically for the runtime tests using a runtime-agnostic shim.
The existing Mono unit tests do not test IPC commands, and there is no runtime infrastructure for reading events from tracepoints, so covering the user_events scenario would take even more work on top of updating the Mono native EventPipe unit tests.

End-to-End Testing Added

A low-cost approach to testing the .NET Runtime's user_events functionality is to use RecordTrace from https://github.com/microsoft/one-collect/, which can already start user_events-based EventPipe sessions and generate .nettrace files. (Note: dotnet-trace wraps RecordTrace.)
This adds an external dependency, so a RecordTrace failure can fail the end-to-end test. However, user_events was added with the intent to depend on RecordTrace for the end-to-end scenario, and there is no other way to functionally test a user_events-based EventPipe session.

Approach

Each scenario uses the same pattern:

  1. Scenario invokes the shared test runner

    User events scenarios can differ in their tracee logic, the events expected in the .nettrace, the record-trace script used to collect those events, and how long it takes for the tracee to emit them and for record-trace to resolve symbols and write the .nettrace. To handle this variance, UserEventsTestRunner lets each scenario pass in its scenario-specific record-trace script path, the path to its test assembly (used to spawn the tracee process), a validator that checks for the expected events from the tracee, and optional timeouts for both the tracee and record-trace to exit gracefully.

  2. UserEventsTestRunner orchestrates tracing and validation

    Using this configuration, UserEventsTestRunner first checks whether user events are supported. It then starts record-trace with the scenario’s script and launches the tracee process so it can emit events. After the run completes, the runner stops both the tracee and record-trace, opens the resulting .nettrace with EventPipeEventSource, and applies the scenario’s validator to confirm that the expected events were recorded. Finally, it returns an exit code indicating whether the scenario passed or failed.
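
The orchestration order described above can be sketched as follows. This is an illustrative outline in Python, not the C# UserEventsTestRunner: the Scenario type, the injected callables, and the PASS/FAIL values are all hypothetical, and treating an unsupported environment as a pass is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

PASS, FAIL = 100, 1  # assumed exit codes, chosen only for illustration


@dataclass
class Scenario:
    """Hypothetical stand-in for the per-scenario configuration."""
    script_path: str                         # record-trace script for this scenario
    tracee_assembly: str                     # assembly used to spawn the tracee
    validator: Callable[[List[str]], bool]   # checks the collected event names


def run_scenario(
    scenario: Scenario,
    is_supported: Callable[[], bool],
    start_tracer: Callable[[str], Callable[[], str]],
    run_tracee: Callable[[str], None],
    read_events: Callable[[str], List[str]],
) -> int:
    """Run one scenario: check support, trace, run the tracee, stop, validate."""
    if not is_supported():
        return PASS  # assumption: an unsupported environment counts as a skip
    # start_tracer returns a "stop" callable that yields the trace file path.
    stop_tracer = start_tracer(scenario.script_path)
    try:
        run_tracee(scenario.tracee_assembly)
    finally:
        # Always stop the tracer, even if the tracee failed.
        trace_path = stop_tracer()
    events = read_events(trace_path)
    return PASS if scenario.validator(events) else FAIL
```

Injecting the tracer, tracee, and reader as callables keeps the sequencing separate from process handling, so a scenario only has to supply its script, assembly, and validator.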

Dependencies:

  • Environment with Linux kernel 6.4+, .NET 10, and glibc 2.35+
  • Microsoft.OneCollect.RecordTrace (transitively resolved through a dotnet diagnostics public feed)
  • Microsoft.Diagnostics.Tracing.TraceEvent 3.1.24+ (to read NetTrace V6)
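
A rough sketch of how the kernel and glibc minimums above could be checked is shown below. It is written in Python for illustration and is not the logic in UserEventsRequirements.cs. The function names are hypothetical, and the tracefs mount point /sys/kernel/tracing is an assumption that may differ between systems. The .NET 10 requirement is not checked here.

```python
import os
import platform
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '6.8.0-45-generic' or '2.35'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text: str, minimum: Tuple[int, int]) -> bool:
    """Compare a version string against a (major, minor) minimum."""
    return parse_version(text) >= minimum


def user_events_supported(tracefs: str = "/sys/kernel/tracing") -> bool:
    """Check kernel 6.4+, glibc 2.35+, and the presence of user_events_data."""
    if platform.system() != "Linux":
        return False
    libc_name, libc_version = platform.libc_ver()
    return (
        meets_minimum(platform.release(), (6, 4))
        and libc_name == "glibc"
        and meets_minimum(libc_version, (2, 35))
        # Assumed location of the user_events_data file under tracefs.
        and os.path.exists(os.path.join(tracefs, "user_events_data"))
    )
```

Comparing (major, minor) tuples avoids the string-comparison trap where "6.10" would sort below "6.4".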

Helix Nuances

UserEvents functional runtime tests differ from other runtime tests because they depend on OneCollect's Record-Trace tool both to enable a user_events-based EventPipe session and to collect events. By design, Record-Trace requires elevated privileges, so these tests invoke a record-trace executable with sudo.

When tests run on Helix, test artifacts are stripped of their permissions, so the test infrastructure was modified to restore execute permissions on record-trace (helix-extra-executables.list). Moreover, to avoid shipping one copy of record-trace per scenario, each of which would need its execute permissions restored, the build now copies over a single record-trace executable that all scenarios share (OutOfProcess marker).
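
The permission-restoration step can be sketched as follows. This is an illustration only, not the logic added to helixpublishwitharcade.proj: the function name is hypothetical, and the sketch assumes the list file holds one path per line, relative to the payload root, which the source does not specify.

```python
import stat
from pathlib import Path
from typing import List


def restore_execute_bits(root: Path, list_file: Path) -> List[Path]:
    """Add execute permission to every file named in list_file.

    Assumes one relative path per line; blank lines are skipped.
    """
    restored = []
    for line in list_file.read_text().splitlines():
        relative = line.strip()
        if not relative:
            continue
        target = root / relative
        mode = target.stat().st_mode
        # Keep the existing bits and add execute for user, group, and others.
        target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        restored.append(target)
    return restored
```

Sharing one record-trace executable across scenarios means a list like this needs a single entry rather than one per scenario.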

Additionally, in Helix environments, TMPDIR is set to a Helix-specific temporary directory like /datadisks/disk1/work/<id>/t, while record-trace currently only scans /tmp/ for the runtime's diagnostic ports. As a workaround, the tracee apps are spawned with TMPDIR set to /tmp.
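
The TMPDIR workaround amounts to overriding one environment variable for the child process only. A minimal sketch, with hypothetical helper names and Python standing in for the C# process-start code:

```python
import os
import subprocess
import sys
from typing import Dict, List


def tracee_environment(base: Dict[str, str], tmpdir: str = "/tmp") -> Dict[str, str]:
    """Return a copy of the environment with TMPDIR overridden for the tracee."""
    env = dict(base)
    env["TMPDIR"] = tmpdir
    return env


def spawn_tracee(command: List[str]) -> subprocess.Popen:
    """Start the tracee with TMPDIR=/tmp, leaving the parent's environment untouched."""
    return subprocess.Popen(
        command,
        env=tracee_environment(dict(os.environ)),
        stdout=subprocess.PIPE,
        text=True,
    )


if __name__ == "__main__":
    # The child only echoes the TMPDIR value it was started with.
    child = spawn_tracee([sys.executable, "-c", "import os; print(os.environ['TMPDIR'])"])
    output, _ = child.communicate(timeout=30)
    print(output.strip())
```

Only the tracee sees the override, so the test runner and the rest of the Helix work item keep using the Helix-provided temporary directory.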

Lastly, the job steps that run tests on AzDO prevent restoring individual runtime test projects. Because record-trace is currently only resolvable through the dotnet-diagnostics-tests source, userevents_common.csproj was added to the group of projects restored at the beginning of copying native test components, so that Microsoft.OneCollect.RecordTrace is restored.

Copilot AI left a comment

Pull request overview

This PR adds comprehensive end-to-end functional tests for the .NET user_events feature, which was introduced in PR #115265. The tests validate that .NET Runtime user events can be emitted via EventPipe and collected using Microsoft's record-trace tool from the one-collect project.

Key Changes

  • New shared test infrastructure that orchestrates tracing via record-trace, manages tracee processes, and validates collected events
  • Three test scenarios validating different user_events use cases: runtime events (basic), custom EventSource events (managedevent), and multi-threaded event emission (multithread)
  • Build system integration to restore dependencies, copy executables, and handle Helix-specific requirements like permission restoration

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| src/tests/tracing/userevents/common/UserEventsTestRunner.cs | Shared test orchestration logic for launching record-trace, running tracee processes, and validating traces |
| src/tests/tracing/userevents/common/UserEventsRequirements.cs | Environment validation checking kernel version, glibc version, and user_events availability |
| src/tests/tracing/userevents/common/userevents_common.csproj | Common project configuration with dependencies and record-trace executable deployment |
| src/tests/tracing/userevents/common/NuGet.config | NuGet source configuration for Microsoft.OneCollect.RecordTrace package |
| src/tests/tracing/userevents/basic/basic.cs | Test scenario for runtime AllocationSampled events |
| src/tests/tracing/userevents/basic/basic.csproj | Project configuration for basic scenario |
| src/tests/tracing/userevents/basic/basic.script | record-trace script for collecting runtime events |
| src/tests/tracing/userevents/managedevent/managedevent.cs | Test scenario for custom EventSource events |
| src/tests/tracing/userevents/managedevent/managedevent.csproj | Project configuration for managedevent scenario |
| src/tests/tracing/userevents/managedevent/managedevent.script | record-trace script for collecting custom events |
| src/tests/tracing/userevents/multithread/multithread.cs | Test scenario for concurrent event emission across multiple threads |
| src/tests/tracing/userevents/multithread/multithread.csproj | Project configuration for multithread scenario |
| src/tests/tracing/userevents/multithread/multithread.script | record-trace script for collecting multi-threaded events |
| src/tests/tracing/userevents/README.md | Documentation explaining test structure and approach |
| src/tests/build.proj | Added userevents_common.csproj to restore projects list |
| src/tests/Common/helixpublishwitharcade.proj | Added logic to restore execute permissions for extra test executables on Helix |
| eng/Versions.props | Updated TraceEvent version and added MicrosoftOneCollectRecordTrace version |

mdh1418 commented Dec 8, 2025

/ba-g "The failure is this unrelated issue #118603"
