Skip to content

Commit 4b29b79

Browse files
authored
Merge pull request #3287 from rbtcollins/tracing
Add in opentelemetry tracing as a feature
2 parents 8662c33 + d9e684c commit 4b29b79

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

41 files changed

+1311
-200
lines changed

CONTRIBUTING.md

Lines changed: 78 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -96,6 +96,11 @@ task, but not currently run per-platform, which means there is no way to find
9696
out the status of clippy per platform without running it on that platform as a
9797
developer.
9898

99+
### import rustup-macros::{integration,unit}_test into test modules
100+
101+
These test helpers add pre-and-post logic to tests to enable the use of tracing
102+
inside tests, which can be helpful for tracking down behaviours in larger tests.
103+
99104
## Version numbers
100105

101106
If you ever see a released version of rustup which has `::` in its version string
@@ -217,8 +222,8 @@ break our testing (like `RUSTUP_TOOLCHAIN`, `SHELL`, `ZDOTDIR`, `RUST_BACKTRACE`
217222
But if you want to debug locally, you may need backtrace. `RUSTUP_BACKTRACE`
218223
is used like `RUST_BACKTRACE` to enable backtraces of failed tests.
219224

220-
**NOTE**: This is a backtrace for the test, not for any rustup process running
221-
in the test
225+
**NOTE**: This is a backtrace for the test, not for any subprocess invocation of
226+
rustup process running in the test
222227

223228
```bash
224229
$ RUSTUP_BACKTRACE=1 cargo test --release --test cli-v1 -- remove_toolchain_then_add_again
@@ -273,3 +278,74 @@ test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 26 filtered out
273278

274279
error: test failed, to rerun pass '--test cli-v1'
275280
```
281+
282+
## Tracing
283+
284+
The feature "otel" can be used when building rustup to turn on Opentelemetry
285+
tracing with an OLTP GRPC exporter. This requires protoc installed, which can be
286+
downloaded from GitHub or installed via package manager.
287+
288+
The normal [OTLP environment
289+
variables](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md)
290+
can be used to customise its behaviour, but often the simplest thing is to just
291+
run a Jaeger docker container on the same host:
292+
293+
```sh
294+
docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 jaegertracing/all-in-one:latest
295+
```
296+
297+
Then build rustup-init with tracing:
298+
299+
```sh
300+
cargo build --features=otel
301+
```
302+
303+
Run the operation you want to analyze:
304+
305+
```sh
306+
RUSTUP_FORCE_ARG0="rustup" ./target/debug/rustup-init show
307+
```
308+
309+
And [look in Jaeger for a trace](http://localhost:16686/search?service=rustup).
310+
311+
### Tracing and tests
312+
313+
The custom macro `rustup_macros::test` adds a prelude and suffix to each test to
314+
ensure that there is a tracing context setup, that the test function is a span,
315+
and that the spans from the test are flushed. Build with features=otel to
316+
use this feature.
317+
318+
### Adding instrumentation
319+
320+
The `otel` feature uses conditional compilation to only add function instrument
321+
when enabled. Instrumenting a currently uninstrumented function is mostly simply
322+
done like so:
323+
324+
```rust
325+
#[cfg_attr(feature = "otel", tracing::instrument(err, skip_all))]
326+
```
327+
328+
`skip_all` is not required, but some core structs don't implement Debug yet, and
329+
others have a lot of output in Debug : tracing adds some overheads, so keeping
330+
spans lightweight can help avoid frequency bias in the results - where
331+
parameters with large debug in frequently called functions show up as much
332+
slower than they are.
333+
334+
Some good general heuristics:
335+
336+
- Do instrument slow blocking functions
337+
- Do instrument functions with many callers or that call many different things,
338+
as these tend to help figure the puzzle of what-is-happening
339+
- Default to not instrumenting thin shim functions (or at least, only instrument
340+
them temporarily while figuring out the shape of a problem)
341+
- Be way of debug build timing - release optimisations make a huge difference,
342+
though debug is a lot faster to iterate on. If something isn't a problem in
343+
release don't pay it too much heed in debug.
344+
345+
### Caveats
346+
347+
Cross-thread propogation isn't connected yet. This will cause instrumentation in
348+
a thread to make a new root span until it is fixed. If any Tokio runtime-related
349+
code gets added in those threads this will also cause a panic. We have a couple
350+
of threadpools in use today; if you need to instrument within that context, use
351+
a thunk to propogate the tokio runtime into those threads.

0 commit comments

Comments
 (0)