[Diagnostics][dotnet-trace] Add collect-linux verb #47894

Draft: wants to merge 3 commits into base: main
200 changes: 184 additions & 16 deletions docs/core/diagnostics/dotnet-trace.md
@@ -43,7 +43,7 @@
* Is a cross-platform .NET Core tool.
* Enables the collection of .NET Core traces of a running process without a native profiler.
* Is built on [`EventPipe`](./eventpipe.md) of the .NET Core runtime.
* Delivers the same experience on Windows, Linux, or macOS.
* On Linux, provides additional integration with kernel user_events for native tracing tool compatibility.

Review comment (Member):

We might want to tweak the wording a bit later, but not focusing on this for the moment :)


## Options

@@ -55,15 +55,12 @@

Displays the version of the dotnet-trace utility.

- **`--duration`**

How long to run the trace. `--duration 00:00:00:05` will run it for 5 seconds.
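
As a rough illustration of the `dd:hh:mm:ss` format, the following Python sketch converts such a string into a time span. The helper name `parse_duration` is hypothetical and is not part of dotnet-trace; the sketch assumes all four fields are always present, as in the `00:00:00:05` example.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse a dd:hh:mm:ss string (hypothetical helper, not dotnet-trace code)."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected dd:hh:mm:ss, got {text!r}")
    # Assumption: every field is a non-negative integer.
    days, hours, minutes, seconds = (int(p) for p in parts)
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    # The example from the option description: five seconds.
    print(parse_duration("00:00:00:05").total_seconds())
```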

## Commands

| Command |
|-----------------------------------------------------------|
| [dotnet-trace collect](#dotnet-trace-collect) |
| [dotnet-trace collect-linux](#dotnet-trace-collect-linux) |
| [dotnet-trace convert](#dotnet-trace-convert) |
| [dotnet-trace ps](#dotnet-trace-ps) |
| [dotnet-trace list-profiles](#dotnet-trace-list-profiles) |
@@ -76,16 +73,27 @@
### Synopsis

```dotnetcli
dotnet-trace collect [--buffersize <size>] [--clreventlevel <clreventlevel>] [--clrevents <clrevents>]
dotnet-trace collect
[--buffersize <size>]
[--clreventlevel <clreventlevel>]
[--clrevents <clrevents>]
[--dsrouter <ios|ios-sim|android|android-emu>]
[--format <Chromium|NetTrace|Speedscope>] [-h|--help] [--duration dd:hh:mm:ss]
[-n, --name <name>] [--diagnostic-port] [-o|--output <trace-file-path>] [-p|--process-id <pid>]
[--profile <profile-name>] [--providers <list-of-comma-separated-providers>]
[--format <Chromium|NetTrace|Speedscope>]
[-h|--help]
[--duration dd:hh:mm:ss]
[-n, --name <name>]
[--diagnostic-port]
[-o|--output <trace-file-path>]
[-p|--process-id <pid>]
[--profile <profile-name>]
[--providers <list-of-comma-separated-providers>]
[-- <command>] (for target applications running .NET 5 or later)
[--show-child-io] [--resume-runtime]
[--show-child-io]
[--resume-runtime]
[--stopping-event-provider-name <stoppingEventProviderName>]
[--stopping-event-event-name <stoppingEventEventName>]
[--stopping-event-payload-filter <stoppingEventPayloadFilter>]
[--event-filters <list-of-comma-separated-event-filters>]

Review comment (Member):

I'd suggest we avoid adding this right now to keep the scope of changes smaller. I'm guessing it would be only rarely used. My understanding is that most dotnet-trace users are looking for simple configurations with modest excess events rather than complex configurations that capture the minimal possible set of events.

It certainly might be useful to return to this and add it later, but I'd rather not pile up too many new UI features now.

```

### Options
@@ -158,7 +166,7 @@

- **`--dsrouter {ios|ios-sim|android|android-emu}`**

Starts [dotnet-dsrouter](dotnet-dsrouter.md) and connects to it. Requires [dotnet-dsrouter](dotnet-dsrouter.md) to be installed. Run `dotnet-dsrouter -h` for more information.
Starts [dotnet-dsrouter](dotnet-dsrouter.md) and connects to it. Requires [dotnet-dsrouter](dotnet-dsrouter.md) to be installed. Run `dotnet-dsrouter -h` for more information.

- **`--format {Chromium|NetTrace|Speedscope}`**

@@ -204,11 +212,11 @@

A named pre-defined set of provider configurations that allows common tracing scenarios to be specified succinctly. The following profiles are available:

| Profile | Description |
|---------|-------------|
|`cpu-sampling`|Useful for tracking CPU usage and general .NET runtime information. This is the default option if no profile or providers are specified.|
|`gc-verbose`|Tracks GC collections and samples object allocations.|
|`gc-collect`|Tracks GC collections only at very low overhead.|
| Profile | Description |
|---------|-------------|
|`cpu-sampling`|Useful for tracking CPU usage and general .NET runtime information. This is the default option if no profile or providers are specified.|
|`gc-verbose`|Tracks GC collections and samples object allocations.|
|`gc-collect`|Tracks GC collections only at very low overhead.|

- **`--providers <list-of-comma-separated-providers>`**

@@ -249,6 +257,34 @@

A string, parsed as [payload_field_name]:[payload_field_value] pairs separated by commas, that will stop the trace upon hitting an event containing all specified payload pairs. Requires `--stopping-event-provider-name` and `--stopping-event-event-name` to be set. For example, `--stopping-event-provider-name Microsoft-Windows-DotNETRuntime --stopping-event-event-name Method/JittingStarted --stopping-event-payload-filter MethodNameSpace:Program,MethodName:OnButtonClick` stops the trace upon the first `Method/JittingStarted` event for the method `OnButtonClick` in the `Program` namespace emitted by the `Microsoft-Windows-DotNETRuntime` event provider.

- **`--event-filters <list-of-comma-separated-event-filters>`**

Defines an additional optional filter for each provider's events. When no `--event-filters` is specified for a provider, all events allowed by the provider's keywords and level configuration are collected. Event filters provide additional granular control beyond the keyword/level filtering.

**Format:** `ProviderName:<Enable>:<EventIds>`

Where:
- `ProviderName`: The EventPipe provider name (e.g., `Microsoft-Windows-DotNETRuntime`).
- `Enable`: Boolean indicating whether the listed `EventIds` are enabled (`true`) or disabled (`false`). Defaults to `false`.
- `EventIds`: Plus-delimited event IDs to enable or disable. Defaults to empty.

**Examples:**

Lint check failure (GitHub Actions / lint), docs/core/diagnostics/dotnet-trace.md:272: MD031/blanks-around-fences, fenced code blocks should be surrounded by blank lines. See https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md031.md

```
# Scenario: Disable specific events from Microsoft-Windows-DotNETRuntime
--event-filters "Microsoft-Windows-DotNETRuntime:false:1+2+3+4+5+6+7+8+9"

# Scenario: Enable specific events from a provider
--event-filters "Microsoft-Windows-DotNETRuntime:true:80+129+130+250"
# Only events 80, 129, 130, and 250 will be collected from this provider (others are filtered out)

# Scenario: Multiple providers with mixed filtering - some providers have no filters
--providers "Microsoft-Windows-DotNETRuntime:0xFFFFFFFF:5,System.Threading.Tasks.TplEventSource:0xFFFFFFFF:5,MyCustomProvider:0xFFFFFFFF:5"
--event-filters "Microsoft-Windows-DotNETRuntime:false:1+2+3,System.Threading.Tasks.TplEventSource:true:7+8+9"
# Microsoft-Windows-DotNETRuntime: All events EXCEPT 1,2,3 are collected
# System.Threading.Tasks.TplEventSource: ONLY events 7,8,9 are collected
# MyCustomProvider: ALL events are collected (no filter specified - follows provider keywords/level)
```
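
The filtering rules in the scenarios above can be sketched in Python. This is only an illustration of the proposed `ProviderName:<Enable>:<EventIds>` semantics as described here; `EventFilter`, `parse_event_filters`, and `is_collected` are hypothetical names, and keyword/level filtering is assumed to have been applied already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventFilter:
    enable: bool = False                 # documented default: false
    event_ids: frozenset = frozenset()   # documented default: empty


def parse_event_filters(spec: str) -> dict:
    """Parse 'Provider:<Enable>:<EventIds>' entries separated by commas."""
    filters = {}
    for entry in spec.split(","):
        parts = entry.strip().split(":")
        provider = parts[0]
        enable = len(parts) > 1 and parts[1].strip().lower() == "true"
        ids = frozenset()
        if len(parts) > 2:
            ids = frozenset(int(i) for i in parts[2].split("+") if i)
        filters[provider] = EventFilter(enable, ids)
    return filters


def is_collected(filters: dict, provider: str, event_id: int) -> bool:
    """Decide whether an event passes the filter for its provider."""
    flt = filters.get(provider)
    if flt is None:
        return True  # no filter: follow the provider's keywords/level only
    listed = event_id in flt.event_ids
    # enable=true keeps only the listed IDs; enable=false drops the listed IDs.
    return listed if flt.enable else not listed


if __name__ == "__main__":
    f = parse_event_filters(
        "Microsoft-Windows-DotNETRuntime:false:1+2+3,"
        "System.Threading.Tasks.TplEventSource:true:7+8+9"
    )
    print(is_collected(f, "Microsoft-Windows-DotNETRuntime", 2))
    print(is_collected(f, "System.Threading.Tasks.TplEventSource", 7))
    print(is_collected(f, "MyCustomProvider", 42))
```

Note that under this reading, a filter with `Enable` set to true and no event IDs would collect nothing from that provider; whether the tool would behave that way is not stated in the text.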

> [!NOTE]

> - Stopping the trace may take a long time (up to minutes) for large applications. The runtime needs to send over the type cache for all managed code that was captured in the trace.
@@ -259,6 +295,138 @@

> - When specifying a stopping event through the `--stopping-event-*` options, as the EventStream is being parsed asynchronously, there will be some events that pass through between the time a trace event matching the specified stopping event options is parsed and the EventPipeSession is stopped.
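
The matching rule behind `--stopping-event-payload-filter` (stop on the first event that contains all specified payload pairs) can be sketched in Python as follows. The event dictionary shape and both function names are assumptions made for illustration; they do not reflect how dotnet-trace represents events internally.

```python
def parse_payload_filter(spec: str) -> dict:
    """Parse 'field:value' pairs separated by commas into a dict."""
    pairs = {}
    for pair in spec.split(","):
        name, _, value = pair.partition(":")
        pairs[name.strip()] = value.strip()
    return pairs


def matches_stopping_event(event: dict, provider_name: str,
                           event_name: str, payload_filter: dict) -> bool:
    """Return True when the event matches the provider, the event name, and every payload pair."""
    if event["provider"] != provider_name or event["name"] != event_name:
        return False
    payload = event.get("payload", {})
    # Every pair must be present; values are compared as strings (an assumption).
    return all(k in payload and str(payload[k]) == v
               for k, v in payload_filter.items())


if __name__ == "__main__":
    flt = parse_payload_filter("MethodNameSpace:Program,MethodName:OnButtonClick")
    event = {
        "provider": "Microsoft-Windows-DotNETRuntime",
        "name": "Method/JittingStarted",
        "payload": {"MethodNameSpace": "Program", "MethodName": "OnButtonClick"},
    }
    print(matches_stopping_event(event, "Microsoft-Windows-DotNETRuntime",
                                 "Method/JittingStarted", flt))
```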

## dotnet-trace collect-linux

Collects diagnostic traces from .NET applications using Linux user_events as a transport layer. This command provides the same functionality as [`dotnet-trace collect`](#dotnet-trace-collect) but routes .NET runtime events through the Linux kernel's user_events subsystem before writing them to `.nettrace` files.

Review comment (Member):

I'd recommend the initial technology we refer to should be perf_events and try to explain that:

  • It's a Linux OS technology
  • It supports capturing a variety of events from kernel and user mode
  • It requires admin privileges
  • By default it captures events from all processes

We could mention that the .NET portion of those events are communicated using the user_events feature as a detail in the broader explanation.

> This command provides the same functionality as dotnet-trace collect

We probably don't want to say this is the 'same functionality as dotnet-trace collect' because it can do more. Instead we might say it supports including the same .NET events.


This transport approach enables automatic unification of user-space .NET events with kernel-space system events, since both are captured in the same kernel tracing infrastructure. Linux tools like `perf` and `ftrace` can monitor events in real-time while maintaining full compatibility with existing .NET profiling workflows.

Review comment (Member):

I'd suggest we ignore 'perf' and 'ftrace', or only mention them as aids in describing perf_events.


### Prerequisites

- Linux kernel with `CONFIG_USER_EVENTS=y` support (kernel 6.4+)
- Appropriate permissions to access `/sys/kernel/tracing/user_events_data`

Review comment (Member):

OneCollect will need access to other things in addition to this file. It might be easiest to say 'root permission' unless someone thinks a more precise enumeration will be helpful.

- .NET 10+

### Synopsis

```dotnetcli
dotnet-trace collect-linux

Review comment (Member):

I think we can prune away some of these options

  • buffersize - If OneCollect can reasonably pick the size then we may not need an option to explicitly set it.
  • diagnostic-port - I think we'd only need this in some advanced scenarios. Given that we could be profiling multiple processes, maybe we'd even need multiple ports? I'd suggest we leave it out for now and wait to see what happens.
  • resume-runtime - if we don't have diagnostic-port then we shouldn't need this.
  • event-filters - I'd suggest we leave this out for now to keep things simpler.
  • tracepoint-configs - can we leave this out? If the user cares about the tracepoint names, that implies they are going to use some other tool besides dotnet-trace to record the events. But if that is true, I'm not sure why they'd want dotnet-trace to be creating a nettrace file as well.

[--buffersize <size>]
[--clreventlevel <clreventlevel>]
[--clrevents <clrevents>]
[--format <Chromium|NetTrace|Speedscope>]
[-h|--help]
[--duration dd:hh:mm:ss]
[-n, --name <name>]
[--diagnostic-port]
[-o|--output <trace-file-path>]
[-p|--process-id <pid>]

Review comment (Member):

For the collect verb it is required to specify either the process-id, name, command or dsrouter. I'm guessing that requirement won't exist for collect-linux and not specifying any of them is equivalent to collecting all processes on the machine? Mentioning the options here is what I'd expect, but we'd need to describe the "collect all processes by default" behavior somewhere.

[--profile <profile-name>]

Review comment (Member):

We'll need to decide what profiles are available and which events they each collect. I expect this is mostly the same as for the 'collect' verb, but cpu-sampling probably should be different. We also might want a thread-time profile that collects context switches.

If we do update these profiles, we should strongly consider also renaming the highly misleading "cpu-sampling" profile for the collect verb. Currently that profile collects thread-time information, not CPU samples.

[--providers <list-of-comma-separated-providers>]
[-- <command>] (for target applications running .NET 10 or later)
[--show-child-io]
[--resume-runtime]
[--stopping-event-provider-name <stoppingEventProviderName>]
[--stopping-event-event-name <stoppingEventEventName>]
[--stopping-event-payload-filter <stoppingEventPayloadFilter>]

Review comment (Member):

There are some challenges with these stopping event options so it may be simplest not to support them for now:

  1. Design-wise: we'd need to decide whether they apply to the other non-.NET events too. If not, it feels a bit arbitrary. If so, it's unclear what 'provider' refers to. We might decide it's helpful to change how these options work across both the collect and collect-linux verbs.
  2. Implementation-wise: this requires the events to be analyzed in real time, most likely meaning we'd need a similar 'stop on event' feature that can be specified in OneCollect's scripting interface.

[--event-filters <list-of-comma-separated-event-filters>]
[--tracepoint-configs <list-of-comma-separated-tracepoint-configs>]
[--kernel-events <list-of-kernel-events>]
```

### Options

`dotnet-trace collect-linux` supports all the same options as [`dotnet-trace collect`](#dotnet-trace-collect), excluding `--dsrouter`, and additionally offers:

Review comment (Member):

If we wind up having both a set of omissions and a set of additions I'm guessing it will be easier to describe the options explicitly rather than as a relative reference to the collect verb. I think copy-and-paste for options that are the same is completely fine.


- **`--tracepoint-configs <list-of-comma-separated-tracepoint-configs>` (required)**

Review comment (Member):

Can we remove this tracepoint-config option?


Defines the explicit mapping between EventPipe providers and kernel tracepoints. Each provider in `--providers` must have a corresponding entry in `--tracepoint-configs`.

**Format:** `ProviderName:<DefaultTracepointName>:<TracepointSets>`

Where:
- `ProviderName`: The EventPipe provider name (e.g., `Microsoft-Windows-DotNETRuntime`)
- `DefaultTracepointName`: Default tracepoint name for this provider (can be empty to require explicit assignment)
- `TracepointSets`: Semicolon-delimited `TracepointName=<EventIds>` entries
- `EventIds`: Plus-delimited event IDs to route to that tracepoint

> [!NOTE]
> All tracepoint names are automatically prefixed with the provider name to avoid collisions. For example, `gc_events` for the `Microsoft-Windows-DotNETRuntime` provider becomes `Microsoft_Windows_DotNETRuntime_gc_events`.

> [!TIP]
> Use `--event-filters` to disable specific events before they are routed to tracepoints. Event filtering happens before tracepoint routing - only events that pass the filter will be sent to their assigned tracepoints.

**Examples:**

Lint check failure (GitHub Actions / lint), docs/core/diagnostics/dotnet-trace.md:360: MD031/blanks-around-fences, fenced code blocks should be surrounded by blank lines. See https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md031.md

```
# Scenario: All events from provider go to a default tracepoint
--tracepoint-configs "Microsoft-Windows-DotNETRuntime:dotnet_runtime"
# All enabled events from Microsoft-Windows-DotNETRuntime will be written to Microsoft_Windows_DotNETRuntime_dotnet_runtime

# Scenario: Split events by categories
--tracepoint-configs "Microsoft-Windows-DotNETRuntime::gc_events=1+2+3;jit_events=10+11+12"
# EventIDs 1, 2, and 3 will be written to Microsoft_Windows_DotNETRuntime_gc_events
# EventIDs 10, 11, and 12 will be written to Microsoft_Windows_DotNETRuntime_jit_events

# Multiple providers (comma-separated)
--tracepoint-configs "Microsoft-Windows-DotNETRuntime::gc_events=1+2+3,MyCustomProvider:custom_events"
# EventIds 1, 2, and 3 from Microsoft-Windows-DotNETRuntime will be written to Microsoft_Windows_DotNETRuntime_gc_events
# All enabled events from MyCustomProvider will be written to MyCustomProvider_custom_events
```
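
To make the routing rules concrete, here is a hedged Python sketch of how a `--tracepoint-configs` value could be parsed and how an event ID could be resolved to a prefixed tracepoint name. The function names are hypothetical, and replacing hyphens with underscores in the prefix is inferred only from the `Microsoft_Windows_DotNETRuntime_gc_events` example; how other characters are handled is not specified in the text.

```python
def parse_tracepoint_configs(spec: str) -> dict:
    """Parse 'Provider:<Default>:<Name=Id+Id;Name=Id>' entries separated by commas."""
    configs = {}
    for entry in spec.split(","):
        provider, _, rest = entry.strip().partition(":")
        default_name, _, sets = rest.partition(":")
        routes = {}
        for tp_set in filter(None, sets.split(";")):
            name, _, ids = tp_set.partition("=")
            for event_id in filter(None, ids.split("+")):
                routes[int(event_id)] = name
        configs[provider] = (default_name or None, routes)
    return configs


def resolve_tracepoint(configs: dict, provider: str, event_id: int):
    """Return the prefixed tracepoint for an event, or None if it is unrouted."""
    default_name, routes = configs[provider]
    name = routes.get(event_id, default_name)
    if name is None:
        return None  # assumption: no default and no explicit set means no tracepoint
    # Assumption: hyphens in the provider name become underscores in the prefix.
    return f"{provider.replace('-', '_')}_{name}"


if __name__ == "__main__":
    cfg = parse_tracepoint_configs(
        "Microsoft-Windows-DotNETRuntime::gc_events=1+2+3;jit_events=10+11+12,"
        "MyCustomProvider:custom_events"
    )
    print(resolve_tracepoint(cfg, "Microsoft-Windows-DotNETRuntime", 2))
    print(resolve_tracepoint(cfg, "MyCustomProvider", 5))
```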

- **`--kernel-events <list-of-kernel-events>` (optional)**

Review comment (Member):

Perhaps we should call this perf-events or some other name? kernel-events to me implies that the kernel is generating the event yet some of these events in this list might be generated by user-mode code.


Review comment (Member):

We should work with Beau to figure out what kind of event names can be supported. For example https://man7.org/linux/man-pages/man1/perf-record.1.html shows a whole bunch of different things can be specified as an event and I have no idea what part of that OneCollect handles.

I'm hoping we can at least support anything in /sys/kernel/tracing/available_events as well as the symbolic PMU events like cpu-cycles.


A comma-separated list of kernel event categories to include in the trace. These events are automatically grouped into kernel-named tracepoints. Available categories include:

Review comment (Member):

> A comma-separated list of kernel event categories to include ...

Are these things being enumerated categories, or individual events? They sound like individual events.


| Category | Description | Linux Tracepoints |
|----------|-------------|-------------------|
| `syscalls` | System call entry/exit events | `syscalls:sys_enter_*`, `syscalls:sys_exit_*` |

Review comment (Member):

Do we expect dotnet-trace to support a user who literally types the following?

--kernel-events syscalls:sys_enter_*

Or are we expecting the user to substitute the * with the name of a particular call? perf supports wildcards in the input and it would be cool if we could too, but if that makes life too hard for OneCollect it doesn't feel essential.

| `sched` | Process scheduling events | `sched:sched_switch`, `sched:sched_wakeup` |
| `net` | Network-related events | `net:netif_rx`, `net:net_dev_xmit` |
| `fs` | Filesystem I/O events | `ext4:*`, `vfs:*` |
| `mm` | Memory management events | `kmem:*`, `vmscan:*` |


Review comment (Member):

If someone specified "sched:sched_wakeup,sched:sched_switch" what provider name, event name, and field lists would we expect to show up in the nettrace file? (This level of detail may not go into the docs, but we should understand it ourselves to decide what info should go in the docs)

These events correspond to Linux kernel tracepoints documented in the [Linux kernel tracing documentation](https://www.kernel.org/doc/html/latest/trace/index.html). For more details on available tracepoints, see [ftrace](https://www.kernel.org/doc/html/latest/trace/ftrace.html) and [tracepoints](https://www.kernel.org/doc/html/latest/trace/tracepoints.html).

Review comment (Member):

Looking at the ftrace and tracepoints links, it wasn't clear there was anything to learn about which events are available. They seemed to contain conceptual information rather than listings of specific events. I think we could omit those links.


Example: `--kernel-events syscalls,sched,net`
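
As an illustration of the category table in this proposal, the Python sketch below expands a `--kernel-events` value into the listed tracepoint patterns and checks a tracepoint name against them. The mapping is copied from the table; the function names and the use of shell-style wildcard matching are assumptions, and the review discussion leaves open whether wildcards would be supported at all.

```python
from fnmatch import fnmatchcase

# Copied from the proposed category table; not an authoritative list.
KERNEL_EVENT_CATEGORIES = {
    "syscalls": ["syscalls:sys_enter_*", "syscalls:sys_exit_*"],
    "sched": ["sched:sched_switch", "sched:sched_wakeup"],
    "net": ["net:netif_rx", "net:net_dev_xmit"],
    "fs": ["ext4:*", "vfs:*"],
    "mm": ["kmem:*", "vmscan:*"],
}


def expand_kernel_events(spec: str) -> list:
    """Expand a comma-separated category list into tracepoint patterns."""
    patterns = []
    for category in spec.split(","):
        category = category.strip()
        if category not in KERNEL_EVENT_CATEGORIES:
            raise ValueError(f"unknown kernel event category: {category!r}")
        patterns.extend(KERNEL_EVENT_CATEGORIES[category])
    return patterns


def is_selected(tracepoint: str, patterns: list) -> bool:
    """Check a 'subsystem:event' name against the expanded patterns."""
    return any(fnmatchcase(tracepoint, p) for p in patterns)


if __name__ == "__main__":
    selected = expand_kernel_events("syscalls,sched,net")
    print(selected)
    print(is_selected("syscalls:sys_enter_openat", selected))
```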

### Linux Integration

Review comment (Member):

I'd recommend we drop this part and focus on the user scenario where they specify events and then get those events recorded in a trace file on disk.


**Tracepoint Configuration Requirements:**

- **Mandatory Mapping**: Every provider must be explicitly mapped to at least a default tracepoint and/or exclusive tracepoint sets via `--tracepoint-configs`
- **Tracepoint Isolation**: Each tracepoint can only receive events from one provider
- **Event Routing**: Different event IDs within a provider can be routed to different tracepoints for granular control
- **Automatic Prefixing**: All tracepoint names are prefixed with the provider name to avoid collisions

**Kernel Integration Points:**

The kernel tracepoints can be accessed through standard Linux tracing interfaces:

- **ftrace**: `/sys/kernel/tracing/events/user_events/`
- **perf**: Use `perf list user_events*` to see available events
- **System monitoring tools**: Any tool that can consume Linux tracepoints

### Examples

Review comment (Member):

Lower in this doc are some examples of using dotnet-trace collect in various situations. No need just yet, but we'd probably want to update or add to those examples for the collect-linux functionality once we've clarified what it will be.


```dotnetcli
# All runtime events to one tracepoint
dotnet-trace collect-linux --process-id 1234 \
--providers Microsoft-Windows-DotNETRuntime:0x8000:5 \
--kernel-events syscalls,sched \
--tracepoint-configs "Microsoft-Windows-DotNETRuntime:dotnet_runtime"

# Split runtime events by category
dotnet-trace collect-linux --process-id 1234 \
--providers Microsoft-Windows-DotNETRuntime:0x8001:5 \
--kernel-events syscalls,sched,net,fs \
--tracepoint-configs "Microsoft-Windows-DotNETRuntime::exception_events=80;gc_events=1+2"

# Multiple providers
dotnet-trace collect-linux --process-id 1234 \
--providers "Microsoft-Windows-DotNETRuntime:0x8001:5,MyCustomProvider:0xFFFFFFFF:5" \
--tracepoint-configs "Microsoft-Windows-DotNETRuntime:dotnet_runtime,MyCustomProvider:custom_events"
```

## dotnet-trace convert

Converts `nettrace` traces to alternate formats for use with alternate trace analysis tools.
@@ -214,7 +214,7 @@ into other formats, such as Chromium or [Speedscope](https://www.speedscope.app/
Trace completed.
```

dotnet-trace uses the [conventional text format](#conventions-for-describing-provider-configuration) for describing provider configuration in
dotnet-trace uses a comma-delimited variant of the [conventional text format](#conventions-for-describing-provider-configuration) for describing provider configuration in

Review comment (Member):

It might be easier to restructure this so that the part of the doc describing the "conventional text format" just describes a single provider, not the separators between them in a list. Then you could say:

> dotnet-trace uses a comma-separated list of providers in the --providers
> argument. Each provider is written in the conventional format.

Also this sounds like it could be a separate doc PR? It doesn't appear connected to the new user_events work.

the `--providers` argument. For more options on how to take traces using dotnet-trace, see the
[dotnet-trace docs](./dotnet-trace.md#collect-a-trace-with-dotnet-trace).
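
A comma-separated `--providers` value can be split into individual provider entries along these lines. This Python sketch handles only the three colon-separated fields that appear in this PR's examples (name, hexadecimal keywords, numeric level); the function name is hypothetical, and any further fields of the conventional format are deliberately not handled.

```python
def parse_providers(spec: str) -> list:
    """Parse 'Name:Keywords:Level' entries separated by commas."""
    providers = []
    for entry in spec.split(","):
        name, _, rest = entry.strip().partition(":")
        keywords_text, _, level_text = rest.partition(":")
        # Keywords are written in hex in the examples (e.g. 0x8001).
        keywords = int(keywords_text, 16) if keywords_text else None
        level = int(level_text) if level_text else None
        providers.append((name, keywords, level))
    return providers


if __name__ == "__main__":
    for provider in parse_providers(
        "Microsoft-Windows-DotNETRuntime:0x8001:5,MyCustomProvider:0xFFFFFFFF:5"
    ):
        print(provider)
```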
