Commit 5029ac5

Added tracing page to the candle book. (#2922)
* tracing page
* warned about asynchronous execution
* cleanup
* added Nsight Systems recommendation
1 parent de23d34 commit 5029ac5

2 files changed: +69 -0 lines changed


candle-book/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@
 - [Running a model](inference/inference.md)
 - [Using the hub](inference/hub.md)
 - [Error management](error_manage.md)
+- [Tracing](tracing.md)
 - [Training](training/training.md)
 - [Simplified](training/simplified.md)
 - [MNIST](training/mnist.md)

candle-book/src/tracing.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Tracing

Tracing is a powerful tool for identifying performance issues and bottlenecks in code.

> Profiling on GPUs is trickier because of asynchronous execution; see the [GPU section](#gpu).

## Overview

Candle uses the [tracing](https://docs.rs/tracing/latest/tracing/) crate for instrumentation.

To try it out, run an example in `candle-examples` with the `--tracing` flag.
This generates a trace file, typically named `trace-<timestamp>.json`.
You can view the trace in Chrome by navigating to `chrome://tracing/`, clicking **Load**, and selecting the generated trace file.

## Adding Tracing

Candle includes built-in tracing for many internal operations, using [spans](https://docs.rs/tracing/latest/tracing/struct.Span.html) to mark key points of execution.

To add custom tracing to your code, define a span:

```rust
// The span name must be known at compile time; "my_operation" is a placeholder.
let span = tracing::span!(tracing::Level::TRACE, "my_operation");
```

Then, to record the span during execution, enter it and hold on to the returned guard:

```rust
let _enter = span.enter();
```

The span is recorded from the moment the guard is created until it is dropped, and its timing is reported to the subscriber registered with the tracing crate.
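
To make the guard behaviour concrete, here is a minimal sketch that uses only the standard library. It is not how the tracing crate is implemented: `TimedGuard`, `RECORDS`, `timed_work`, and `recorded_names` are hypothetical names invented for this illustration. It only shows the general pattern of a value that starts timing when created and records the elapsed time when dropped.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

thread_local! {
    // Hypothetical stand-in for the collector that a real subscriber provides.
    static RECORDS: RefCell<Vec<(&'static str, Duration)>> = RefCell::new(Vec::new());
}

/// Hypothetical guard: starts timing on creation, records the duration on drop.
struct TimedGuard {
    name: &'static str,
    start: Instant,
}

impl TimedGuard {
    fn enter(name: &'static str) -> Self {
        TimedGuard { name, start: Instant::now() }
    }
}

impl Drop for TimedGuard {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        RECORDS.with(|r| r.borrow_mut().push((self.name, elapsed)));
    }
}

fn timed_work() -> u64 {
    // Bind the guard to a named variable: `let _ = ...` would drop it immediately.
    let _guard = TimedGuard::enter("timed_work");
    (0..1_000u64).sum()
}

/// Names of the regions recorded so far on the current thread.
fn recorded_names() -> Vec<&'static str> {
    RECORDS.with(|r| r.borrow().iter().map(|(name, _)| *name).collect())
}

fn main() {
    let total = timed_work();
    println!("sum = {total}, recorded: {:?}", recorded_names());
}
```

The same scoping rule applies to the real guard: the measured region ends wherever the guard goes out of scope, so its placement determines what the span covers.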

## Recording and Saving a Trace

To capture and save trace data, you need to configure the tracing system with an output format. Candle uses the [tracing_subscriber](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/) and [tracing_chrome](https://docs.rs/tracing-chrome/latest/tracing_chrome/) crates.

The snippet below sets up a Chrome-compatible recorder that logs all tracing activity between the creation and the drop of the guard:

```rust
use tracing_chrome::ChromeLayerBuilder;
use tracing_subscriber::prelude::*;

// Keep `_guard` alive for as long as tracing should be recorded.
let _guard = {
    // `build` returns the layer together with its guard.
    let (chrome_layer, guard) = ChromeLayerBuilder::new().build();
    // Install the Chrome layer as part of the global subscriber.
    tracing_subscriber::registry().with(chrome_layer).init();
    guard
};
```

## GPU

When using CUDA, Metal, or other asynchronous GPU backends, tracing may produce misleading timing data because operations are queued rather than executed immediately. A span can end as soon as its work has been queued, before the GPU has finished executing it.
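
The effect can be illustrated without a GPU. The sketch below is a standard-library simulation, not Candle or CUDA code: `FakeStream`, `launch`, `synchronize`, and `measure` are hypothetical names, and a worker thread that sleeps stands in for the device. It shows that timing only the launch call underestimates the cost of the work, while timing up to an explicit synchronization point captures it.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a GPU stream: jobs are queued and run on a worker thread.
struct FakeStream {
    jobs: mpsc::Sender<Duration>,
    done: mpsc::Receiver<()>,
}

impl FakeStream {
    fn new() -> Self {
        let (jobs, job_rx) = mpsc::channel::<Duration>();
        let (done_tx, done) = mpsc::channel();
        thread::spawn(move || {
            // The loop ends once the sending half (`jobs`) has been dropped.
            for cost in job_rx {
                thread::sleep(cost); // pretend this is kernel execution
                let _ = done_tx.send(());
            }
        });
        FakeStream { jobs, done }
    }

    /// Returns as soon as the job is queued, like an asynchronous kernel launch.
    fn launch(&self, cost: Duration) {
        self.jobs.send(cost).expect("worker thread is alive");
    }

    /// Blocks until one queued job has finished.
    fn synchronize(&self) {
        self.done.recv().expect("worker thread is alive");
    }
}

/// Returns (time measured around the launch alone, time including synchronization).
fn measure(cost: Duration) -> (Duration, Duration) {
    let stream = FakeStream::new();
    let start = Instant::now();
    stream.launch(cost);
    let launch_only = start.elapsed();
    stream.synchronize();
    let with_sync = start.elapsed();
    (launch_only, with_sync)
}

fn main() {
    let (launch_only, with_sync) = measure(Duration::from_millis(50));
    println!("launch only: {launch_only:?}, with sync: {with_sync:?}");
}
```

A span wrapped around only the `launch` call in this simulation would report the first number, which is why host-side traces of asynchronous backends need to be read with care.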

### CUDA

For CUDA-specific profiling, you have two options:

1. Set the environment variable `CUDA_LAUNCH_BLOCKING=1`, which forces synchronous execution. This makes trace timings more accurate, at the cost of reduced performance.
2. Use [NVIDIA's Nsight Systems](https://developer.nvidia.com/nsight-systems) (`nsys profile` and `nsys-ui`), which is designed specifically for profiling asynchronous CUDA execution.

We recommend using Nsight Systems when possible, as it offers accurate performance data without altering typical execution patterns. In contrast, setting `CUDA_LAUNCH_BLOCKING` forces synchronous execution, which can significantly alter execution behavior.

#### Performance Profiling with NVIDIA Nsight Systems

1. Generate an `.nsys-rep` file containing performance data ([docs](https://docs.nvidia.com/nsight-systems/UserGuide/index.html#example-single-command-lines))
   - Run `nsys profile --trace cuda,nvtx,osrt --gpu-metrics-device=all --output profile_run ./target/debug/... --prompt "whatever "`
1. Open the generated `.nsys-rep` report file in the Nsight Systems GUI
   - File > Open
