Most of the crates support the "tracing" feature, which enables the tracing crate.
When the feature is enabled, some critical path code is instrumented with tracing spans:

    #[cfg_attr(
        feature = "tracing",
        tracing::instrument(level = "trace", skip(self, size))
    )]
    fn alloc(&mut self, size: u64) -> Result<StorageHandle, IoError> {
        ...
    }

The test-log crate is threaded through many tests as a replacement for #[test].
This crate provides test-time instrumentation to set up log and trace output.
Actually getting this (extremely verbose) output requires a bit of setup:
- the `tracing` feature must be enabled.
- the `test-log` output initialization must be enabled via the `test-log/default` feature.
- a sufficiently broad `RUST_LOG` environment variable must be set (this is an entire config language).
- the `--nocapture` flag must be passed to `cargo test`.
An example:

    $ RUST_LOG=trace cargo test -p cubecl-cuda \
        --features tracing,test-log/default \
        tests::identity::f16_ty::test_large -- --nocapture
    Finished `test` profile [unoptimized] target(s) in 17.84s
    Running unittests src/lib.rs (target/debug/deps/cubecl_cuda-42be3205312238f1)
    running 1 test
    2026-01-08T23:56:09.960606Z INFO cubecl_cuda::compute::server: Peer data transfer not available for device 0
    2026-01-08T23:56:09.977977Z TRACE launch_inner{count=(4, 64, 1) mode=Unchecked stream_id=StreamId { value: 0 } kernel.name=cubecl_std::tensor::identity::identity_kernel::IdentityKernel<cubecl_cuda::runtime::CudaRuntime> kernel.id=(
        CubeDim {
            x: 16,
            y: 16,
            z: 1,
        },
        Scalar (
            Float (
                F16,
            ),
        ),
        Scalar,
        TensorCompilationArg {
            inplace: None,
            line_size: 16,
        },
    )}: cubecl_cuda::compute::context: Compiling kernel
    test tests::identity::f16_ty::test_large ... ok
    test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 641 filtered out; finished in 0.48s
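
The `RUST_LOG=trace` setting used above turns on everything, which is why the output is so verbose. As a rough illustration of how narrower filters behave, here is a simplified model of the two most common directive forms, a bare `level` and `target=level`. This is a sketch only, not the parser used by the logging backends, and it ignores the rest of the filter language (span and field filters, bare targets, and so on). The names `Level`, `parse_filter`, and `enabled` are invented for this example.

```rust
// Simplified model of RUST_LOG filtering. This is NOT the real parser:
// it only understands `level` and `target=level` directives.

/// Verbosity levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

fn parse_level(s: &str) -> Option<Level> {
    match s.to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Parse a comma-separated filter into (target prefix, max level) pairs.
/// A bare level applies to every target and is stored with an empty prefix.
/// Directives this sketch does not understand are silently dropped.
fn parse_filter(spec: &str) -> Vec<(String, Level)> {
    spec.split(',')
        .filter_map(|directive| {
            let directive = directive.trim();
            match directive.split_once('=') {
                Some((target, level)) => {
                    parse_level(level).map(|l| (target.to_string(), l))
                }
                None => parse_level(directive).map(|l| (String::new(), l)),
            }
        })
        .collect()
}

/// An event is enabled when the longest matching target prefix
/// allows a level at least as verbose as the event's level.
fn enabled(filter: &[(String, Level)], target: &str, level: Level) -> bool {
    filter
        .iter()
        .filter(|(prefix, _)| target.starts_with(prefix.as_str()))
        .max_by_key(|(prefix, _)| prefix.len())
        .map_or(false, |(_, max)| level <= *max)
}
```

Under this model, `trace` enables every event, while a filter such as `info,cubecl_cuda=trace` keeps TRACE output only for targets under `cubecl_cuda` and limits everything else to INFO and above.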