
Commit 5cc27d4

Implement profiling for compiler-generated move/copy operations
As part of Rust's move semantics, the compiler will generate memory copy operations to move objects about. These are generally pretty small, and the backend is good at optimizing them. But sometimes, if the type is large, they can end up being surprisingly expensive. In such cases, you might want to pass them by reference, or Box them up.

However, these moves are also invisible to profiling. At best they appear as a `memcpy`, but one memcpy is basically indistinguishable from another, and it's very hard to know 1) that it's actually a compiler-generated copy, and 2) what type it pertains to.

This PR adds two new pseudo-intrinsic functions in `core::intrinsics`:

```
pub fn compiler_move<T, const SIZE: usize>(_src: *const T, _dst: *mut T);
pub fn compiler_copy<T, const SIZE: usize>(_src: *const T, _dst: *mut T);
```

These functions are never actually called, however. A MIR transform pass, `instrument_moves.rs`, locates the `Operand::Move`/`Copy` operands in MIR assignment statements and modifies their source location to make them appear as if they had been inlined from `compiler_move`/`_copy`.

These functions have two generic parameters: the type being copied, and its size in bytes. This should make it very easy to identify which types are being expensive in your program (both in aggregate, and at specific hotspots). The size isn't strictly necessary since you can derive it from the type, but it's small and it makes it easier to understand what you're looking at.

This functionality is only enabled if you have debug info generation enabled and also set the `-Zinstrument-moves` option.

It does not instrument all moves. By default it only annotates moves of types larger than 64 bytes. The `-Zinstrument-moves-size-limit` option specifies the size in bytes at which to start instrumenting.

This has minimal impact on the size of debugging info.
For rustc itself, the overall increase in `librustc_driver*.so` size is around 0.05% for a 65-byte limit, 0.004% for a 1025-byte limit, and a worst case of 0.6% for an 8-byte limit. There's no effect on generated code; it only adds debug info.

As an example of a backtrace:

```
Breakpoint 1.3, __memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:255
255     ENTRY_P2ALIGN (MEMMOVE_SYMBOL (__memmove, unaligned_erms), 6)
(gdb) bt
#0  __memcpy_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:255
#1  0x0000555555590e7e in core::intrinsics::compiler_copy<[u64; 1000], 8000> () at library/core/src/intrinsics/mod.rs:10
#2  t::main () at t.rs:10
```
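A minimal program of the kind that could produce a frame like the one above is sketched below. The names `Payload`, `duplicate`, and `payload_size` are hypothetical, and the sketch assumes a nightly toolchain that includes this commit, built with debug info and the flag (for example `rustc -g -Zinstrument-moves t.rs`). Dereferencing the reference copies the whole 8000-byte array, well above the default limit, so under those assumptions the copy could be attributed to `compiler_copy<[u64; 1000], 8000>`; with optimizations on, the backend may elide it.

```rust
use std::mem::size_of;

/// Hypothetical 8000-byte type, matching `[u64; 1000]` in the backtrace above.
pub type Payload = [u64; 1000];

/// Returns a copy of `p`. `[u64; 1000]` is `Copy`, so `*p` is a by-value copy
/// of the whole array, which is the kind of operation the pass annotates.
pub fn duplicate(p: &Payload) -> Payload {
    *p
}

/// Size in bytes that would appear as the `SIZE` generic argument.
pub fn payload_size() -> usize {
    size_of::<Payload>()
}
```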
1 parent 4082d6a commit 5cc27d4

15 files changed (+631, -0 lines)
Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
//! Instrumentation pass for move/copy operations.
//!
//! This pass modifies the source scopes of statements containing `Operand::Move` and `Operand::Copy`
//! to make them appear as if they were inlined from `compiler_move()` and `compiler_copy()` intrinsic
//! functions. This creates the illusion that moves/copies are function calls in debuggers and
//! profilers, making them visible for performance analysis.
//!
//! The pass leverages the existing inlining infrastructure by creating synthetic `SourceScopeData`
//! with the `inlined` field set to point to the appropriate intrinsic function.

use rustc_index::IndexVec;
use rustc_middle::mir::*;
use rustc_middle::ty::{self, Instance, Ty, TyCtxt, TypingEnv};
use rustc_session::config::DebugInfo;
use rustc_span::sym;

/// Default minimum size in bytes for move/copy operations to be instrumented. Set to 64+1 bytes
/// (typical cache line size) to focus on potentially expensive operations.
const DEFAULT_INSTRUMENT_MOVES_SIZE_LIMIT: u64 = 65;

#[derive(Copy, Clone, Debug)]
enum Operation {
    Move,
    Copy,
}

/// Bundle up parameters into a structure to make repeated calling neater
struct Params<'a, 'tcx> {
    tcx: TyCtxt<'tcx>,
    source_scopes: &'a mut IndexVec<SourceScope, SourceScopeData<'tcx>>,
    local_decls: &'a IndexVec<Local, LocalDecl<'tcx>>,
    typing_env: TypingEnv<'tcx>,
    size_limit: u64,
}

/// MIR transform that instruments move/copy operations for profiler visibility.
pub(crate) struct InstrumentMoves;

impl<'tcx> crate::MirPass<'tcx> for InstrumentMoves {
    fn is_enabled(&self, sess: &rustc_session::Session) -> bool {
        sess.opts.unstable_opts.instrument_moves && sess.opts.debuginfo != DebugInfo::None
    }

    fn run_pass(&self, tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) {
        // Skip promoted MIR bodies to avoid recursion
        if body.source.promoted.is_some() {
            return;
        }

        let typing_env = body.typing_env(tcx);
        let size_limit = tcx
            .sess
            .opts
            .unstable_opts
            .instrument_moves_size_limit
            .unwrap_or(DEFAULT_INSTRUMENT_MOVES_SIZE_LIMIT);

        // Common params, including selectively borrowing the bits of Body we need to avoid
        // mut/non-mut aliasing problems.
        let mut params = Params {
            tcx,
            source_scopes: &mut body.source_scopes,
            local_decls: &body.local_decls,
            typing_env,
            size_limit,
        };

        // Process each basic block
        for block_data in body.basic_blocks.as_mut() {
            for stmt in &mut block_data.statements {
                let source_info = &mut stmt.source_info;
                if let StatementKind::Assign(box (_, rvalue)) = &stmt.kind {
                    match rvalue {
                        Rvalue::Use(op)
                        | Rvalue::Repeat(op, _)
                        | Rvalue::Cast(_, op, _)
                        | Rvalue::UnaryOp(_, op) => {
                            self.annotate_move(&mut params, source_info, op);
                        }
                        Rvalue::BinaryOp(_, box (lop, rop)) => {
                            self.annotate_move(&mut params, source_info, lop);
                            self.annotate_move(&mut params, source_info, rop);
                        }
                        Rvalue::Aggregate(_, ops) => {
                            for op in ops {
                                self.annotate_move(&mut params, source_info, op);
                            }
                        }
                        Rvalue::Ref(..)
                        | Rvalue::ThreadLocalRef(..)
                        | Rvalue::RawPtr(..)
                        | Rvalue::NullaryOp(..)
                        | Rvalue::Discriminant(..)
                        | Rvalue::CopyForDeref(..)
                        | Rvalue::ShallowInitBox(..)
                        | Rvalue::WrapUnsafeBinder(..) => {} // No operands to instrument
                    }
                }
            }
        }
    }

    fn is_required(&self) -> bool {
        false // Optional optimization/instrumentation pass
    }
}

impl InstrumentMoves {
    /// If this is a Move or Copy of a concrete type, update its debug info to make it look like it
    /// was inlined from `core::intrinsics::compiler_move`/`compiler_copy`.
    fn annotate_move<'tcx>(
        &self,
        params: &mut Params<'_, 'tcx>,
        source_info: &mut SourceInfo,
        op: &Operand<'tcx>,
    ) {
        let (place, operation) = match op {
            Operand::Move(place) => (place, Operation::Move),
            Operand::Copy(place) => (place, Operation::Copy),
            _ => return,
        };
        let Params { tcx, typing_env, local_decls, size_limit, source_scopes } = params;

        if let Some(type_size) =
            self.should_instrument_operation(*tcx, *typing_env, local_decls, place, *size_limit)
        {
            let ty = place.ty(*local_decls, *tcx).ty;
            source_info.scope = self.create_inlined_scope(
                *tcx,
                *typing_env,
                source_scopes,
                source_info,
                operation,
                ty,
                type_size,
            );
        }
    }

    /// Determines if an operation should be instrumented based on type characteristics.
    /// Returns Some(size) if it should be instrumented, None otherwise.
    fn should_instrument_operation<'tcx>(
        &self,
        tcx: TyCtxt<'tcx>,
        typing_env: ty::TypingEnv<'tcx>,
        local_decls: &rustc_index::IndexVec<Local, LocalDecl<'tcx>>,
        place: &Place<'tcx>,
        size_limit: u64,
    ) -> Option<u64> {
        let ty = place.ty(local_decls, tcx).ty;
        let Ok(layout) = tcx.layout_of(typing_env.as_query_input(ty)) else {
            return None;
        };

        let size = layout.size.bytes();

        // 1. Skip ZST types (no actual move/copy happens)
        if layout.is_zst() {
            return None;
        }

        // 2. Check size threshold (only instrument large moves/copies)
        if size < size_limit {
            return None;
        }

        // 3. Skip scalar/vector types that won't generate memcpy
        match layout.layout.backend_repr {
            rustc_abi::BackendRepr::Scalar(_)
            | rustc_abi::BackendRepr::ScalarPair(_, _)
            | rustc_abi::BackendRepr::SimdVector { .. } => None,
            _ => Some(size),
        }
    }

    /// Creates an inlined scope that makes operations appear to come from
    /// the specified compiler intrinsic function.
    fn create_inlined_scope<'tcx>(
        &self,
        tcx: TyCtxt<'tcx>,
        typing_env: TypingEnv<'tcx>,
        source_scopes: &mut IndexVec<SourceScope, SourceScopeData<'tcx>>,
        original_source_info: &SourceInfo,
        operation: Operation,
        ty: Ty<'tcx>,
        type_size: u64,
    ) -> SourceScope {
        let intrinsic_def_id = match operation {
            Operation::Move => tcx.get_diagnostic_item(sym::compiler_move),
            Operation::Copy => tcx.get_diagnostic_item(sym::compiler_copy),
        };

        let Some(intrinsic_def_id) = intrinsic_def_id else {
            // Shouldn't happen, but just return original scope if it does
            return original_source_info.scope;
        };

        // Monomorphize the intrinsic for the actual type being moved/copied + size const parameter
        // compiler_move<T, const SIZE: usize> or compiler_copy<T, const SIZE: usize>
        let size_const = ty::Const::from_target_usize(tcx, type_size);
        let generic_args = tcx.mk_args(&[ty.into(), size_const.into()]);
        let intrinsic_instance = Instance::expect_resolve(
            tcx,
            typing_env,
            intrinsic_def_id,
            generic_args,
            original_source_info.span,
        );

        // Create new inlined scope that makes the operation appear to come from the intrinsic
        let inlined_scope_data = SourceScopeData {
            span: original_source_info.span,
            parent_scope: Some(original_source_info.scope),

            // Pretend this op is inlined from the intrinsic
            inlined: Some((intrinsic_instance, original_source_info.span)),

            // Proper inlined scope chaining to maintain debug info hierarchy
            inlined_parent_scope: {
                let parent_scope = &source_scopes[original_source_info.scope];
                if parent_scope.inlined.is_some() {
                    // If parent is already inlined, chain through it
                    Some(original_source_info.scope)
                } else {
                    // Otherwise, use the parent's inlined_parent_scope
                    parent_scope.inlined_parent_scope
                }
            },

            local_data: ClearCrossCrate::Clear,
        };

        // Add the new scope
        source_scopes.push(inlined_scope_data)
    }
}
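The subtle part of `create_inlined_scope` is the `inlined_parent_scope` rule: the synthetic scope must point at the nearest enclosing scope that is itself an inlined call, so that debuggers rebuild the right chain of virtual frames. The simplified model below illustrates that rule; `Scope` and `push_inlined_scope` are hypothetical stand-ins rather than rustc types, and the callee is a plain string instead of an `Instance`.

```rust
/// Hypothetical, simplified stand-in for rustc's `SourceScopeData`: only the
/// fields involved in the chaining rule are kept, and the inlined callee is a
/// plain string instead of an `Instance`.
#[derive(Debug, Clone, PartialEq)]
pub struct Scope {
    pub parent_scope: Option<usize>,
    pub inlined: Option<&'static str>,
    pub inlined_parent_scope: Option<usize>,
}

/// Pushes a synthetic scope that claims to be inlined from `callee`, nested
/// under scope `current`, and returns its index. This mirrors the logic of
/// `create_inlined_scope`, under the simplifications above.
pub fn push_inlined_scope(scopes: &mut Vec<Scope>, current: usize, callee: &'static str) -> usize {
    let parent = &scopes[current];
    // `inlined_parent_scope` names the nearest enclosing inlined-call scope:
    // `current` itself if it is one, otherwise whatever `current` points at.
    let inlined_parent_scope = if parent.inlined.is_some() {
        Some(current)
    } else {
        parent.inlined_parent_scope
    };
    scopes.push(Scope {
        parent_scope: Some(current),
        inlined: Some(callee),
        inlined_parent_scope,
    });
    scopes.len() - 1
}
```

In the model, a move inside ordinary code gets no inlined parent, while a move inside code that was itself inlined from another function chains to that function's scope, which is how the fake `compiler_move` frame ends up nested under the real inlined frame.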

compiler/rustc_mir_transform/src/lib.rs

Lines changed: 4 additions & 0 deletions
@@ -147,6 +147,7 @@ declare_passes! {
     // by custom rustc drivers, running all the steps by themselves. See #114628.
     pub mod inline : Inline, ForceInline;
     mod impossible_predicates : ImpossiblePredicates;
+    mod instrument_moves : InstrumentMoves;
     mod instsimplify : InstSimplify { BeforeInline, AfterSimplifyCfg };
     mod jump_threading : JumpThreading;
     mod known_panics_lint : KnownPanicsLint;
@@ -730,6 +731,9 @@ pub(crate) fn run_optimization_passes<'tcx>(tcx: TyCtxt<'tcx>, body: &mut Body<'
             // Cleanup for human readability, off by default.
             &prettify::ReorderBasicBlocks,
             &prettify::ReorderLocals,
+            // Instrument move/copy operations for profiler visibility.
+            // Late so we're instrumenting any Move/Copy that survived all the previous passes.
+            &instrument_moves::InstrumentMoves,
             // Dump the end result for testing and debugging purposes.
             &dump_mir::Marker("PreCodegen"),
         ],
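The registration above places `InstrumentMoves` after the `prettify` passes, and the pass only runs when its `is_enabled` check succeeds (the flag is set and debug info is not `None`). The toy pass manager below sketches that gating and ordering; `DebugInfo`, `Opts`, `Pass`, and `run_passes` are hypothetical simplifications, not rustc's `MirPass` machinery.

```rust
/// Hypothetical, simplified stand-ins for rustc's session options.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum DebugInfo {
    None,
    Limited,
    Full,
}

pub struct Opts {
    pub instrument_moves: bool,
    pub debuginfo: DebugInfo,
}

/// Simplified pass interface: `is_enabled` decides whether the pass runs.
pub trait Pass {
    fn name(&self) -> &'static str;
    fn is_enabled(&self, _opts: &Opts) -> bool {
        true
    }
}

pub struct ReorderLocals;
impl Pass for ReorderLocals {
    fn name(&self) -> &'static str {
        "ReorderLocals"
    }
}

pub struct InstrumentMoves;
impl Pass for InstrumentMoves {
    fn name(&self) -> &'static str {
        "InstrumentMoves"
    }
    // Same condition as the real pass: flag set and some debug info requested.
    fn is_enabled(&self, opts: &Opts) -> bool {
        opts.instrument_moves && opts.debuginfo != DebugInfo::None
    }
}

/// Runs the enabled passes in list order and returns the names of those that ran.
pub fn run_passes(passes: &[&dyn Pass], opts: &Opts) -> Vec<&'static str> {
    passes.iter().filter(|p| p.is_enabled(opts)).map(|p| p.name()).collect()
}
```

Running it late in the list means, in the real compiler, that moves already removed by earlier optimizations are never annotated.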

compiler/rustc_session/src/options.rs

Lines changed: 6 additions & 0 deletions
@@ -2384,6 +2384,12 @@ options! {
         "print some statistics about AST and HIR (default: no)"),
     instrument_mcount: bool = (false, parse_bool, [TRACKED],
         "insert function instrument code for mcount-based tracing (default: no)"),
+    instrument_moves: bool = (false, parse_bool, [TRACKED],
+        "emit debug info for compiler-generated move and copy operations \
+        to make them visible in profilers (default: no)"),
+    instrument_moves_size_limit: Option<u64> = (None, parse_opt_number, [TRACKED],
+        "the minimum size object to instrument move/copy operations \
+        (default: 65 bytes)"),
     instrument_xray: Option<InstrumentXRay> = (None, parse_instrument_xray, [TRACKED],
         "insert function instrument code for XRay-based tracing (default: no)
          Optional extra settings:
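`instrument_moves_size_limit` is `None` unless given on the command line, in which case the pass falls back to its 65-byte default; the limit is inclusive, since only `size < size_limit` is skipped. The sketch below models the whole filter from `should_instrument_operation`; `Repr` and `should_instrument` are hypothetical simplifications of rustc's layout types, and the zero-size check assumes a sized type.

```rust
/// Default from the patch: 64 + 1 bytes.
const DEFAULT_INSTRUMENT_MOVES_SIZE_LIMIT: u64 = 65;

/// Hypothetical simplification of `rustc_abi::BackendRepr`.
#[derive(Clone, Copy, Debug)]
pub enum Repr {
    Scalar,
    ScalarPair,
    SimdVector,
    Memory,
}

/// Mirrors the checks in `should_instrument_operation`: returns the size to
/// record if the move/copy would be instrumented, `None` otherwise.
pub fn should_instrument(size: u64, repr: Repr, limit_flag: Option<u64>) -> Option<u64> {
    // `-Zinstrument-moves-size-limit` overrides the default when present.
    let limit = limit_flag.unwrap_or(DEFAULT_INSTRUMENT_MOVES_SIZE_LIMIT);
    // Zero-sized types never copy anything; small types are below the threshold.
    if size == 0 || size < limit {
        return None;
    }
    // Values passed as immediates do not go through memcpy, so they are skipped.
    match repr {
        Repr::Scalar | Repr::ScalarPair | Repr::SimdVector => None,
        Repr::Memory => Some(size),
    }
}
```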

compiler/rustc_span/src/symbol.rs

Lines changed: 2 additions & 0 deletions
@@ -691,7 +691,9 @@ symbols! {
         compile_error,
         compiler,
         compiler_builtins,
+        compiler_copy,
         compiler_fence,
+        compiler_move,
         concat,
         concat_bytes,
         concat_idents,

library/core/src/intrinsics/mod.rs

Lines changed: 24 additions & 0 deletions
@@ -3314,3 +3314,27 @@ pub unsafe fn va_arg<T: VaArgSafe>(ap: &mut VaListImpl<'_>) -> T;
 #[rustc_intrinsic]
 #[rustc_nounwind]
 pub unsafe fn va_end(ap: &mut VaListImpl<'_>);
+
+/// Compiler-generated move operation - never actually called.
+/// Used solely for profiling and debugging visibility.
+///
+/// This function serves as a symbolic marker that appears in stack traces
+/// when rustc generates move operations, making them visible in profilers.
+/// The SIZE parameter encodes the size of the type being moved in the function name.
+#[rustc_force_inline]
+#[rustc_diagnostic_item = "compiler_move"]
+pub fn compiler_move<T, const SIZE: usize>(_src: *const T, _dst: *mut T) {
+    unreachable!("compiler_move should never be called - it's only for debug info")
+}
+
+/// Compiler-generated copy operation - never actually called.
+/// Used solely for profiling and debugging visibility.
+///
+/// This function serves as a symbolic marker that appears in stack traces
+/// when rustc generates copy operations, making them visible in profilers.
+/// The SIZE parameter encodes the size of the type being copied in the function name.
+#[rustc_force_inline]
+#[rustc_diagnostic_item = "compiler_copy"]
+pub fn compiler_copy<T, const SIZE: usize>(_src: *const T, _dst: *mut T) {
+    unreachable!("compiler_copy should never be called - it's only for debug info")
+}
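Because `SIZE` is a const generic parameter, its value is part of the instantiated function's name, which is what lets a backtrace frame read `compiler_copy<[u64; 1000], 8000>`. The sketch below uses a hypothetical helper, `marker_label`, to show how the two generic arguments compose into such a label and how `SIZE` relates to `size_of::<T>()`. The exact text `type_name` returns is not guaranteed by Rust, so the output is only illustrative.

```rust
use std::any::type_name;
use std::mem::size_of;

/// Hypothetical helper: builds a label shaped like the frames these intrinsics
/// produce, e.g. `compiler_copy<[u64; 1000], 8000>`.
pub fn marker_label<T, const SIZE: usize>(op: &str) -> String {
    // In the patch, the pass always supplies the type's layout size as SIZE.
    debug_assert_eq!(SIZE, size_of::<T>(), "SIZE should equal size_of::<T>()");
    format!("{}<{}, {}>", op, type_name::<T>(), SIZE)
}
```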
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# `instrument-moves`

The `-Z instrument-moves` flag enables instrumentation of compiler-generated
move and copy operations, making them visible in profilers and stack traces
for performance debugging.

When enabled, the compiler annotates the debug info of large move and copy
operations so that they appear to have been inlined from the
`core::intrinsics::compiler_move` and `core::intrinsics::compiler_copy` functions.
These functions are never actually executed (they contain `unreachable!()`), but
their presence in debug info makes expensive memory operations visible in profilers.
The flag has no effect unless debug info generation is enabled (for example with `-g`).

## Syntax

```bash
rustc -Z instrument-moves[=<boolean>]
rustc -Z instrument-moves-size-limit=<bytes>
```

## Options

- `-Z instrument-moves`: Enable/disable move/copy instrumentation (default: `false`)
- `-Z instrument-moves-size-limit=N`: Only instrument operations on types >= N bytes (default: 65 bytes)

## Examples

```bash
# Enable instrumentation with the default threshold (65 bytes)
rustc -g -Z instrument-moves main.rs

# Enable with custom 128-byte threshold
rustc -g -Z instrument-moves -Z instrument-moves-size-limit=128 main.rs

# Only instrument very large moves (1KB+)
rustc -g -Z instrument-moves -Z instrument-moves-size-limit=1024 main.rs
```

## Behavior

The instrumentation only applies to:
- Types whose size is at least the specified threshold
- Non-immediate types (those that would generate `memcpy`)
- Operations that actually move/copy data (not ZST types)

Stack traces will show the operations:
```
0: memcpy
1: core::intrinsics::compiler_move::<MyLargeStruct, 148>
2: my_function
```

## Example

```rust
#[derive(Clone)]
struct LargeData {
    buffer: [u8; 1000],
}

fn example() {
    let data = LargeData { buffer: [0; 1000] };
    let copy = data.clone(); // Shows as compiler_copy in profiler
    let moved = data; // Shows as compiler_move in profiler
}
```

## Overhead

This has no effect on generated code; it only adds debuginfo. The overhead is
typically very small; on rustc itself, the default limit of 65 bytes adds about
0.055% to the binary size.
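To estimate which of your own types the default 65-byte threshold would catch, a size check is often a good first approximation. The helper below, `crosses_limit`, is hypothetical and only approximates the Behavior rules above: user code cannot see rustc's backend representation, so with a low limit it would also report, say, a 16-byte `(u64, u64)` that rustc typically treats as a scalar pair and does not instrument.

```rust
use std::mem::size_of;

/// Hypothetical helper: approximates the size rule only (non-zero size and at
/// least `limit` bytes); it ignores the scalar/SIMD exclusion.
pub fn crosses_limit<T>(limit: usize) -> bool {
    let size = size_of::<T>();
    size != 0 && size >= limit
}

/// The example type from the documentation above (1000 bytes, alignment 1).
pub struct LargeData {
    pub buffer: [u8; 1000],
}
```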
