[Diagnostics][dotnet-trace] Add collect-linux verb #47894
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,7 +43,7 @@ | |
* Is a cross-platform .NET Core tool. | ||
* Enables the collection of .NET Core traces of a running process without a native profiler. | ||
* Is built on [`EventPipe`](./eventpipe.md) of the .NET Core runtime. | ||
* Delivers the same experience on Windows, Linux, or macOS. | ||
* On Linux, provides additional integration with kernel user_events for native tracing tool compatibility. | ||
|
||
## Options | ||
|
||
|
@@ -55,15 +55,12 @@ | |
|
||
Displays the version of the dotnet-trace utility. | ||
|
||
- **`--duration`** | ||
|
||
How long to run the trace. `--duration 00:00:00:05` will run it for 5 seconds. | ||
|
||
## Commands | ||
|
||
| Command | | ||
|-----------------------------------------------------------| | ||
| [dotnet-trace collect](#dotnet-trace-collect) | | ||
| [dotnet-trace collect-linux](#dotnet-trace-collect-linux) | | ||
| [dotnet-trace convert](#dotnet-trace-convert) | | ||
| [dotnet-trace ps](#dotnet-trace-ps) | | ||
| [dotnet-trace list-profiles](#dotnet-trace-list-profiles) | | ||
|
@@ -76,16 +73,27 @@ | |
### Synopsis | ||
|
||
```dotnetcli | ||
dotnet-trace collect [--buffersize <size>] [--clreventlevel <clreventlevel>] [--clrevents <clrevents>] | ||
dotnet-trace collect | ||
[--buffersize <size>] | ||
[--clreventlevel <clreventlevel>] | ||
[--clrevents <clrevents>] | ||
[--dsrouter <ios|ios-sim|android|android-emu>] | ||
[--format <Chromium|NetTrace|Speedscope>] [-h|--help] [--duration dd:hh:mm:ss] | ||
[-n, --name <name>] [--diagnostic-port] [-o|--output <trace-file-path>] [-p|--process-id <pid>] | ||
[--profile <profile-name>] [--providers <list-of-comma-separated-providers>] | ||
[--format <Chromium|NetTrace|Speedscope>] | ||
[-h|--help] | ||
[--duration dd:hh:mm:ss] | ||
[-n, --name <name>] | ||
[--diagnostic-port] | ||
[-o|--output <trace-file-path>] | ||
[-p|--process-id <pid>] | ||
[--profile <profile-name>] | ||
[--providers <list-of-comma-separated-providers>] | ||
[-- <command>] (for target applications running .NET 5 or later) | ||
[--show-child-io] [--resume-runtime] | ||
[--show-child-io] | ||
[--resume-runtime] | ||
[--stopping-event-provider-name <stoppingEventProviderName>] | ||
[--stopping-event-event-name <stoppingEventEventName>] | ||
[--stopping-event-payload-filter <stoppingEventPayloadFilter>] | ||
[--event-filters <list-of-comma-separated-event-filters>] | ||
Review comment: I'd suggest we avoid adding this right now to keep the scope of changes smaller. I'm guessing it would be only rarely used. My understanding is that most dotnet-trace users are looking for simple configurations with modest excess events rather than complex configurations that capture the minimal possible set of events. It certainly might be useful to return to this and add it later, but I'd rather not pile up too many new UI features now.
||
``` | ||
|
||
### Options | ||
|
@@ -158,7 +166,7 @@ | |
|
||
- **`--dsrouter {ios|ios-sim|android|android-emu}`** | ||
|
||
Starts [dotnet-dsrouter](dotnet-dsrouter.md) and connects to it. Requires [dotnet-dsrouter](dotnet-dsrouter.md) to be installed. Run `dotnet-dsrouter -h` for more information. | ||
Starts [dotnet-dsrouter](dotnet-dsrouter.md) and connects to it. Requires [dotnet-dsrouter](dotnet-dsrouter.md) to be installed. Run `dotnet-dsrouter -h` for more information. | ||
|
||
- **`--format {Chromium|NetTrace|Speedscope}`** | ||
|
||
|
@@ -204,11 +212,11 @@ | |
|
||
A named pre-defined set of provider configurations that allows common tracing scenarios to be specified succinctly. The following profiles are available: | ||
|
||
| Profile | Description | | ||
|---------|-------------| | ||
|`cpu-sampling`|Useful for tracking CPU usage and general .NET runtime information. This is the default option if no profile or providers are specified.| | ||
|`gc-verbose`|Tracks GC collections and samples object allocations.| | ||
|`gc-collect`|Tracks GC collections only at very low overhead.| | ||
| Profile | Description | | ||
|---------|-------------| | ||
|`cpu-sampling`|Useful for tracking CPU usage and general .NET runtime information. This is the default option if no profile or providers are specified.| | ||
|`gc-verbose`|Tracks GC collections and samples object allocations.| | ||
|`gc-collect`|Tracks GC collections only at very low overhead.| | ||
|
||
- **`--providers <list-of-comma-separated-providers>`** | ||
|
||
|
@@ -249,6 +257,34 @@ | |
|
||
A string, parsed as [payload_field_name]:[payload_field_value] pairs separated by commas, that will stop the trace upon hitting an event containing all specified payload pairs. Requires `--stopping-event-provider-name` and `--stopping-event-event-name` to be set. For example, `--stopping-event-provider-name Microsoft-Windows-DotNETRuntime --stopping-event-event-name Method/JittingStarted --stopping-event-payload-filter MethodNameSpace:Program,MethodName:OnButtonClick` stops the trace upon the first `Method/JittingStarted` event for the method `OnButtonClick` in the `Program` namespace emitted by the `Microsoft-Windows-DotNETRuntime` event provider. | ||
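
The matching rule described above (every listed pair must be present in the event payload) can be illustrated with a small Python sketch. This is not dotnet-trace's implementation: the function names and the dictionary shape used for an event (`provider`, `name`, `payload`) are assumptions made for illustration only.

```python
def parse_payload_filter(spec: str) -> dict:
    """Parse 'field:value,field:value' into a dict of required payload pairs."""
    pairs = {}
    for item in filter(None, (part.strip() for part in spec.split(","))):
        name, sep, value = item.partition(":")
        if not sep or not name:
            raise ValueError(f"malformed payload pair: {item!r}")
        pairs[name] = value
    return pairs


def is_stopping_event(event: dict, provider: str, event_name: str,
                      payload_filter: dict) -> bool:
    """True when the event matches the provider, the event name, and ALL payload pairs.

    The event layout (keys 'provider', 'name', 'payload') is hypothetical.
    """
    if event.get("provider") != provider or event.get("name") != event_name:
        return False
    payload = event.get("payload", {})
    return all(key in payload and str(payload[key]) == value
               for key, value in payload_filter.items())
```

With the filter from the example above, an event whose payload carries `MethodNameSpace=Program` and `MethodName=OnButtonClick` matches, while an event missing either pair does not.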
|
||
- **`--event-filters <list-of-comma-separated-event-filters>`** | ||
|
||
Defines an additional optional filter for each provider's events. When no `--event-filters` is specified for a provider, all events allowed by the provider's keywords and level configuration are collected. Event filters provide additional granular control beyond the keyword/level filtering. | ||
|
||
**Format:** `ProviderName:<Enable>:<EventIds>` | ||
|
||
Where: | ||
- `ProviderName`: The EventPipe provider name (e.g., `Microsoft-Windows-DotNETRuntime`) | ||
- `Enable`: Boolean value indicating whether the listed `EventIds` are enabled or disabled. Defaults to `false`.
- `EventIds`: Plus-delimited event IDs to enable or disable. Defaults to empty.
|
||
**Examples:** | ||
``` | ||
|
||
# Scenario: Disable specific events from Microsoft-Windows-DotNETRuntime | ||
--event-filters "Microsoft-Windows-DotNETRuntime:false:1+2+3+4+5+6+7+8+9" | ||
|
||
# Scenario: Enable specific events from a provider | ||
--event-filters "Microsoft-Windows-DotNETRuntime:true:80+129+130+250" | ||
# Only events 80, 129, 130, and 250 will be collected from this provider (others are filtered out) | ||
|
||
# Scenario: Multiple providers with mixed filtering - some providers have no filters | ||
--providers "Microsoft-Windows-DotNETRuntime:0xFFFFFFFF:5,System.Threading.Tasks.TplEventSource:0xFFFFFFFF:5,MyCustomProvider:0xFFFFFFFF:5" | ||
--event-filters "Microsoft-Windows-DotNETRuntime:false:1+2+3,System.Threading.Tasks.TplEventSource:true:7+8+9" | ||
# Microsoft-Windows-DotNETRuntime: All events EXCEPT 1,2,3 are collected | ||
# System.Threading.Tasks.TplEventSource: ONLY events 7,8,9 are collected | ||
# MyCustomProvider: ALL events are collected (no filter specified - follows provider keywords/level) | ||
``` | ||
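
To make the allow-list and deny-list semantics in the scenarios above concrete, here is a minimal Python sketch that parses the proposed `ProviderName:<Enable>:<EventIds>` format. It is an illustration under stated assumptions, not the tool's parser: `EventFilter`, `parse_event_filters`, and `is_collected` are hypothetical names, and the check runs only after the provider's keyword and level settings have already admitted the event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventFilter:
    provider: str
    enable: bool            # True: only listed IDs pass; False: listed IDs are dropped
    event_ids: frozenset


def parse_event_filters(spec: str) -> dict:
    """Parse 'Provider:<Enable>:<EventIds>[,Provider:...]' into a dict keyed by provider."""
    filters = {}
    for entry in filter(None, (part.strip() for part in spec.split(","))):
        fields = entry.split(":")
        if len(fields) > 3 or not fields[0]:
            raise ValueError(f"malformed event filter: {entry!r}")
        # Documented defaults: Enable is false, EventIds is empty.
        enable = len(fields) > 1 and fields[1].strip().lower() == "true"
        ids_field = fields[2] if len(fields) > 2 else ""
        event_ids = frozenset(int(i) for i in ids_field.split("+") if i)
        filters[fields[0]] = EventFilter(fields[0], enable, event_ids)
    return filters


def is_collected(filters: dict, provider: str, event_id: int) -> bool:
    """Apply the filter, assuming keywords/level already allowed the event."""
    flt = filters.get(provider)
    if flt is None:
        return True  # no filter for this provider
    listed = event_id in flt.event_ids
    return listed if flt.enable else not listed
```

Applied to the third scenario, events 1, 2, and 3 of `Microsoft-Windows-DotNETRuntime` are dropped, only events 7, 8, and 9 of `System.Threading.Tasks.TplEventSource` pass, and `MyCustomProvider` is unaffected.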
|
||
> [!NOTE] | ||
|
||
> - Stopping the trace may take a long time (up to minutes) for large applications. The runtime needs to send over the type cache for all managed code that was captured in the trace. | ||
|
@@ -259,6 +295,138 @@ | |
|
||
> - When specifying a stopping event through the `--stopping-event-*` options, as the EventStream is being parsed asynchronously, there will be some events that pass through between the time a trace event matching the specified stopping event options is parsed and the EventPipeSession is stopped. | ||
|
||
## dotnet-trace collect-linux | ||
|
||
Collects diagnostic traces from .NET applications using Linux user_events as a transport layer. This command provides the same functionality as [`dotnet-trace collect`](#dotnet-trace-collect) but routes .NET runtime events through the Linux kernel's user_events subsystem before writing them to `.nettrace` files. | ||
Review comment: I'd recommend the initial technology we refer to should be perf_events and try to explain that:
We could mention that the .NET portion of those events are communicated using the user_events feature as a detail in the broader explanation.
We probably don't want to say this is the 'same functionality as dotnet-trace collect' because it can do more. Instead we might say it supports including the same .NET events.
||
|
||
This transport approach enables automatic unification of user-space .NET events with kernel-space system events, since both are captured in the same kernel tracing infrastructure. Linux tools like `perf` and `ftrace` can monitor events in real-time while maintaining full compatibility with existing .NET profiling workflows. | ||
Review comment: I'd suggest we ignore 'perf' and 'ftrace', or only mention them as aids in describing perf_events.
||
|
||
### Prerequisites | ||
|
||
- Linux kernel with `CONFIG_USER_EVENTS=y` support (kernel 6.4+) | ||
- Appropriate permissions to access `/sys/kernel/tracing/user_events_data` | ||
Review comment: OneCollect will need access to other things in addition to this file. It might be easiest to say 'root permission' unless someone thinks a more precise enumeration will be helpful.
||
- .NET 10+ | ||
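
A rough pre-flight check for the first two prerequisites might look like the Python sketch below. It rests on assumptions: the helper names are hypothetical, "appropriate permissions" is interpreted as read and write access to the `user_events_data` file, and neither the `CONFIG_USER_EVENTS` build option nor the .NET version is inspected directly.

```python
import os
import platform
import re

USER_EVENTS_DATA = "/sys/kernel/tracing/user_events_data"


def kernel_at_least(release: str, minimum=(6, 4)) -> bool:
    """Compare the leading 'major.minor' of a kernel release string to a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_prerequisites() -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if platform.system() != "Linux":
        problems.append("not running on Linux")
    elif not kernel_at_least(platform.release()):
        problems.append(f"kernel {platform.release()} is older than 6.4")
    if not os.path.exists(USER_EVENTS_DATA):
        problems.append(f"{USER_EVENTS_DATA} not found (user_events may be disabled)")
    elif not os.access(USER_EVENTS_DATA, os.R_OK | os.W_OK):
        # Assumption: both read and write access are needed.
        problems.append(f"no read/write access to {USER_EVENTS_DATA}")
    return problems
```

An empty result only means these two checks passed; the collection tooling may need access to more than this one file.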
|
||
### Synopsis | ||
|
||
```dotnetcli | ||
dotnet-trace collect-linux | ||
Review comment: I think we can prune away some of these options
|
||
[--buffersize <size>] | ||
[--clreventlevel <clreventlevel>] | ||
[--clrevents <clrevents>] | ||
[--format <Chromium|NetTrace|Speedscope>] | ||
[-h|--help] | ||
[--duration dd:hh:mm:ss] | ||
[-n, --name <name>] | ||
[--diagnostic-port] | ||
[-o|--output <trace-file-path>] | ||
[-p|--process-id <pid>] | ||
Review comment: For the collect verb it is required to specify either the process-id, name, command or dsrouter. I'm guessing that requirement won't exist for collect-linux and not specifying any of them is equivalent to collecting all processes on the machine? Mentioning the options here is what I'd expect, but we'd need to describe the "collect all processes by default" behavior somewhere.
||
[--profile <profile-name>] | ||
Review comment: We'll need to decide what profiles are available and which events they each collect. I expect this is mostly the same as for the 'collect' verb, but cpu-sampling probably should be different. We also might want a thread-time profile that collects context switches. If we do update these profiles, we should strongly consider also renaming the highly misleading "cpu-sampling" profile for the collect verb. Currently that profile collects thread-time information, not CPU samples.
||
[--providers <list-of-comma-separated-providers>] | ||
[-- <command>] (for target applications running .NET 10 or later) | ||
[--show-child-io] | ||
[--resume-runtime] | ||
[--stopping-event-provider-name <stoppingEventProviderName>] | ||
[--stopping-event-event-name <stoppingEventEventName>] | ||
[--stopping-event-payload-filter <stoppingEventPayloadFilter>] | ||
Review comment: There are some challenges with these stopping event options so it may be simplest not to support them for now:
|
||
[--event-filters <list-of-comma-separated-event-filters>] | ||
[--tracepoint-configs <list-of-comma-separated-tracepoint-configs>] | ||
[--kernel-events <list-of-kernel-events>] | ||
``` | ||
|
||
### Options | ||
|
||
`dotnet-trace collect-linux` supports all the same options as [`dotnet-trace collect`](#dotnet-trace-collect), excluding `--dsrouter`, and additionally offers: | ||
Review comment: If we wind up having both a set of omissions and a set of additions I'm guessing it will be easier to describe the options explicitly rather than as a relative reference to the collect verb. I think copy-and-paste for options that are the same is completely fine.
||
|
||
- **`--tracepoint-configs <list-of-comma-separated-tracepoint-configs>` (required)** | ||
Review comment: Can we remove this tracepoint-config option?
||
|
||
Defines the explicit mapping between EventPipe providers and kernel tracepoints. Each provider in `--providers` must have a corresponding entry in `--tracepoint-configs`. | ||
|
||
**Format:** `ProviderName:<DefaultTracepointName>:<TracepointSets>` | ||
|
||
Where: | ||
- `ProviderName`: The EventPipe provider name (e.g., `Microsoft-Windows-DotNETRuntime`) | ||
- `DefaultTracepointName`: Default tracepoint name for this provider (can be empty to require explicit assignment) | ||
- `TracepointSets`: Semicolon-delimited `TracepointName=<EventIds>` entries
- `EventIds`: Plus-delimited event IDs to route to that tracepoint | ||
|
||
> [!NOTE] | ||
> All tracepoint names are automatically prefixed with the provider name to avoid collisions. For example, `gc_events` for the `Microsoft-Windows-DotNETRuntime` provider becomes `Microsoft_Windows_DotNETRuntime_gc_events`. | ||
|
||
> [!TIP] | ||
> Use `--event-filters` to disable specific events before they are routed to tracepoints. Event filtering happens before tracepoint routing - only events that pass the filter will be sent to their assigned tracepoints. | ||
|
||
**Examples:** | ||
``` | ||
|
||
# Scenario: All events from provider go to a default tracepoint | ||
--tracepoint-configs "Microsoft-Windows-DotNETRuntime:dotnet_runtime" | ||
# All enabled events from Microsoft-Windows-DotNETRuntime will be written to Microsoft_Windows_DotNETRuntime_dotnet_runtime | ||
|
||
# Scenario: Split events by categories | ||
--tracepoint-configs "Microsoft-Windows-DotNETRuntime::gc_events=1+2+3;jit_events=10+11+12" | ||
# EventIDs 1, 2, and 3 will be written to Microsoft_Windows_DotNETRuntime_gc_events | ||
# EventIDs 10, 11, and 12 will be written to Microsoft_Windows_DotNETRuntime_jit_events | ||
|
||
# Multiple providers (comma-separated) | ||
--tracepoint-configs "Microsoft-Windows-DotNETRuntime::gc_events=1+2+3,MyCustomProvider:custom_events" | ||
# EventIds 1, 2, and 3 from Microsoft-Windows-DotNETRuntime will be written to Microsoft_Windows_DotNETRuntime_gc_events | ||
# All enabled events from MyCustomProvider will be written to MyCustomProvider_custom_events | ||
``` | ||
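
The routing rules in the scenarios above can be sketched in Python as follows. This is an illustration of the proposed `ProviderName:<DefaultTracepointName>:<TracepointSets>` format only, not the tool's code. The function names are hypothetical, and the prefixing rule is an assumption inferred from the documented example: hyphens in the provider name become underscores, and the handling of other characters is left unspecified.

```python
def prefixed(provider: str, tracepoint: str) -> str:
    """Prefix a tracepoint with its provider name.

    Assumption: only '-' is rewritten to '_', as in the documented example.
    """
    return f"{provider.replace('-', '_')}_{tracepoint}"


def parse_tracepoint_configs(spec: str) -> dict:
    """Return {provider: (default_tracepoint_or_None, {event_id: tracepoint})}."""
    configs = {}
    for entry in filter(None, (part.strip() for part in spec.split(","))):
        provider, _, rest = entry.partition(":")
        default_name, _, sets = rest.partition(":")
        if not provider or not (default_name or sets):
            raise ValueError(f"malformed tracepoint config: {entry!r}")
        routes = {}
        for item in filter(None, sets.split(";")):
            name, sep, ids = item.partition("=")
            if not sep or not name:
                raise ValueError(f"malformed tracepoint set: {item!r}")
            for event_id in filter(None, ids.split("+")):
                routes[int(event_id)] = prefixed(provider, name)
        default = prefixed(provider, default_name) if default_name else None
        configs[provider] = (default, routes)
    return configs


def route(configs: dict, provider: str, event_id: int):
    """Return the tracepoint for an event, or None if the event has no route."""
    default, routes = configs[provider]
    return routes.get(event_id, default)
```

In this sketch an event with neither an explicit set nor a default tracepoint yields `None`; how the real tool would treat such an event is not stated in the text above.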
|
||
- **`--kernel-events <list-of-kernel-events>` (optional)** | ||
Review comment: Perhaps we should call this
Review comment: We should work with Beau to figure out what kind of event names can be supported. For example https://man7.org/linux/man-pages/man1/perf-record.1.html shows a whole bunch of different things can be specified as an event and I have no idea what part of that OneCollect handles. I'm hoping we can at least support anything in /sys/kernel/tracing/available_events as well as the symbolic PMU events like cpu-cycles.
||
|
||
A comma-separated list of kernel event categories to include in the trace. These events are automatically grouped into kernel-named tracepoints. Available categories include: | ||
Review comment: Are these things being enumerated categories, or individual events? They sound like individual events.
||
|
||
| Category | Description | Linux Tracepoints | | ||
|----------|-------------|-------------------| | ||
| `syscalls` | System call entry/exit events | `syscalls:sys_enter_*`, `syscalls:sys_exit_*` | | ||
Review comment: Do we expect dotnet-trace to support a user who literally types?
Or are we expecting the user to substitute the * with the name of a particular call? perf supports wildcards in the input and it would be cool if we could too, but if that makes life too hard for OneCollect it doesn't feel essential.
||
| `sched` | Process scheduling events | `sched:sched_switch`, `sched:sched_wakeup` | | ||
| `net` | Network-related events | `net:netif_rx`, `net:net_dev_xmit` | | ||
| `fs` | Filesystem I/O events | `ext4:*`, `vfs:*` | | ||
| `mm` | Memory management events | `kmem:*`, `vmscan:*` | | ||
|
||
Review comment: If someone specified "sched:sched_wakeup,sched:sched_switch" what provider name, event name, and field lists would we expect to show up in the nettrace file? (This level of detail may not go into the docs, but we should understand it ourselves to decide what info should go in the docs)
||
These events correspond to Linux kernel tracepoints documented in the [Linux kernel tracing documentation](https://www.kernel.org/doc/html/latest/trace/index.html). For more details on available tracepoints, see [ftrace](https://www.kernel.org/doc/html/latest/trace/ftrace.html) and [tracepoints](https://www.kernel.org/doc/html/latest/trace/tracepoints.html). | ||
Review comment: Looking at the ftrace and tracepoints link it wasn't clear there was anything to learn about which events were available. It seemed like it was conceptual information rather than listings of specific events. I think we could omit those links.
||
|
||
Example: `--kernel-events syscalls,sched,net` | ||
|
||
### Linux Integration | ||
Review comment: I'd recommend we drop this part and focus on the user scenario where they specify events and then get those events recorded in a trace file on disk.
||
|
||
**Tracepoint Configuration Requirements:** | ||
|
||
- **Mandatory Mapping**: Every provider must be explicitly mapped to at least a default tracepoint and/or exclusive tracepoint sets via `--tracepoint-configs` | ||
- **Tracepoint Isolation**: Each tracepoint can only receive events from one provider | ||
- **Event Routing**: Different event IDs within a provider can be routed to different tracepoints for granular control | ||
- **Automatic Prefixing**: All tracepoint names are prefixed with the provider name to avoid collisions | ||
|
||
**Kernel Integration Points:** | ||
|
||
The kernel tracepoints can be accessed through standard Linux tracing interfaces: | ||
|
||
- **ftrace**: `/sys/kernel/tracing/events/user_events/` | ||
- **perf**: Use `perf list user_events*` to see available events | ||
- **System monitoring tools**: Any tool that can consume Linux tracepoints | ||
|
||
### Examples | ||
Review comment: Lower in this doc are some examples of using dotnet-trace collect in various situations. No need just yet, but we'd probably want to update or add to those examples for the collect-linux functionality once we've clarified what it will be.
||
|
||
```dotnetcli | ||
# All runtime events to one tracepoint | ||
dotnet-trace collect-linux --process-id 1234 \ | ||
--providers Microsoft-Windows-DotNETRuntime:0x8000:5 \ | ||
--kernel-events syscalls,sched \ | ||
--tracepoint-configs "Microsoft-Windows-DotNETRuntime:dotnet_runtime" | ||
|
||
# Split runtime events by category | ||
dotnet-trace collect-linux --process-id 1234 \ | ||
--providers Microsoft-Windows-DotNETRuntime:0x8001:5 \ | ||
--kernel-events syscalls,sched,net,fs \ | ||
--tracepoint-configs "Microsoft-Windows-DotNETRuntime::exception_events=80;gc_events=1+2" | ||
|
||
# Multiple providers | ||
dotnet-trace collect-linux --process-id 1234 \ | ||
--providers "Microsoft-Windows-DotNETRuntime:0x8001:5,MyCustomProvider:0xFFFFFFFF:5" \ | ||
--tracepoint-configs "Microsoft-Windows-DotNETRuntime:dotnet_runtime,MyCustomProvider:custom_events" | ||
``` | ||
|
||
## dotnet-trace convert | ||
|
||
Converts `nettrace` traces to alternate formats for use with alternate trace analysis tools. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -214,7 +214,7 @@ into other formats, such as Chromium or [Speedscope](https://www.speedscope.app/ | |
Trace completed. | ||
``` | ||
|
||
dotnet-trace uses the [conventional text format](#conventions-for-describing-provider-configuration) for describing provider configuration in | ||
dotnet-trace uses a comma-delimited variant of the [conventional text format](#conventions-for-describing-provider-configuration) for describing provider configuration in | ||
Review comment: It might be easier to restructure this so that the part of the doc describing the "conventional text format" just describes a single provider, not the separators between them in a list. Then you could say: dotnet-trace uses a comma-separated list of providers in the --providers
Also this sounds like it could be a separate doc PR? It doesn't appear connected to the new user_events work.
||
the `--providers` argument. For more options on how to take traces using dotnet-trace, see the | ||
[dotnet-trace docs](./dotnet-trace.md#collect-a-trace-with-dotnet-trace). | ||
|
||
|
Review comment: We might want to tweak the wording a bit later, but not focusing on this for the moment :)