Commit 6c2a0b2

design-doc/design-001.md: Changes after review
design-docs/design-001.md: Fix design of MonitoringEvents

The values of the events should come from sys.monitoring and not be set in the Rust code, because we risk inconsistencies this way

Signed-off-by: Tzanko Matev <[email protected]>

design-docs/design-001.md: Fix event callback types and get rid of PyFrameObject

PyFrameObject is not part of the sys.monitoring API

Signed-off-by: Tzanko Matev <[email protected]>

design-docs/design-001.md: CodeObject access

Since PyO3 doesn't provide access to PyCodeObject internals we need an alternative way to access them

Signed-off-by: Tzanko Matev <[email protected]>

design-docs/design-001.md: We'll ignore BRANCH events initially

Signed-off-by: Tzanko Matev <[email protected]>
1 parent 74798c1 commit 6c2a0b2

1 file changed

+126
-41
lines changed

design-docs/design-001.md

Lines changed: 126 additions & 41 deletions
@@ -17,14 +17,42 @@ The tracer collects `sys.monitoring` events, converts them to `runtime_tracing`
1717
```
1818
- Register one callback per event using `sys.monitoring.register_callback`.
1919
```rs
20-
pub enum MonitoringEvent { PyStart, PyResume, PyReturn, PyYield, StopIteration, PyUnwind, PyThrow, Reraise, Call, Line, Instruction, Jump, Branch, Raise, ExceptionHandled, CReturn, CRaise }
21-
pub type CallbackFn = unsafe extern "C" fn(event: MonitoringEvent, frame: *mut PyFrameObject);
22-
pub fn register_callback(tool: &ToolId, event: MonitoringEvent, cb: CallbackFn);
20+
#[repr(transparent)]
21+
pub struct EventId(pub u64); // Exact value loaded from sys.monitoring.events.*
22+
23+
pub struct MonitoringEvents {
24+
pub BRANCH: EventId,
25+
pub CALL: EventId,
26+
pub C_RAISE: EventId,
27+
pub C_RETURN: EventId,
28+
pub EXCEPTION_HANDLED: EventId,
29+
pub INSTRUCTION: EventId,
30+
pub JUMP: EventId,
31+
pub LINE: EventId,
32+
pub PY_RESUME: EventId,
33+
pub PY_RETURN: EventId,
34+
pub PY_START: EventId,
35+
pub PY_THROW: EventId,
36+
pub PY_UNWIND: EventId,
37+
pub PY_YIELD: EventId,
38+
pub RAISE: EventId,
39+
pub RERAISE: EventId,
40+
pub STOP_ITERATION: EventId,
41+
}
42+
43+
pub fn load_monitoring_events(py: Python<'_>) -> PyResult<MonitoringEvents>;
44+
45+
// Python-level callback registered via sys.monitoring.register_callback
46+
pub type CallbackFn = PyObject;
47+
pub fn register_callback(tool: &ToolId, event: &EventId, cb: &CallbackFn) -> PyResult<()>;
2348
```
2449
- Enable all desired events by bitmask with `sys.monitoring.set_events`.
2550
```rs
26-
pub const ALL_EVENTS_MASK: u64 = 0xffff;
27-
pub fn enable_events(tool: &ToolId, mask: u64);
51+
#[derive(Clone, Copy)]
52+
pub struct EventSet(pub u64);
53+
54+
pub fn events_union(ids: &[EventId]) -> EventSet;
55+
pub fn set_events(tool: &ToolId, set: EventSet) -> PyResult<()>;
2856
```
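The `events_union` declaration above could be filled in roughly as follows. This is a minimal sketch, assuming each `EventId` holds a single-bit flag as loaded from `sys.monitoring.events.*`, so that a set is the bitwise OR of its members. The extra `Debug`/`PartialEq` derives are an addition for illustration and are not part of the design.

```rs
/// Single-bit event flag; the value is assumed to come from sys.monitoring.events.*
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EventId(pub u64);

/// Bitmask of enabled events, in the form sys.monitoring.set_events expects.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EventSet(pub u64);

/// OR the individual flags together; an empty slice yields the empty set.
pub fn events_union(ids: &[EventId]) -> EventSet {
    EventSet(ids.iter().fold(0, |mask, id| mask | id.0))
}
```

Because the mask is built only from loaded IDs, no event value needs to be hard-coded on the Rust side.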
2957

3058
### Writer Management
@@ -45,108 +73,165 @@ The tracer collects `sys.monitoring` events, converts them to `runtime_tracing`
4573
```
4674

4775
### Frame and Thread Tracking
48-
- Maintain a per-thread stack of frame identifiers to correlate `CALL`, `PY_START`, and returns.
76+
- Maintain a per-thread stack of activation identifiers to correlate `CALL`, `PY_START`, yields, and returns. Since `sys.monitoring` callbacks provide `CodeType` and offsets (not frames), we rely on the nesting order of events to track activations.
4977
```rs
50-
pub type FrameId = u64;
51-
pub struct ThreadState { pub stack: Vec<FrameId> }
78+
pub type ActivationId = u64;
79+
pub struct ThreadState { pub stack: Vec<ActivationId> }
5280
pub fn current_thread_state() -> &'static mut ThreadState;
5381
```
54-
- Map `frame` objects to internal IDs for cross-referencing events.
82+
- Associate activations with `CodeType` objects and instruction/line offsets as needed for cross-referencing, without depending on `PyFrameObject`.
5583
```rs
56-
pub struct FrameRegistry { next: FrameId, map: HashMap<*mut PyFrameObject, FrameId> }
57-
pub fn intern_frame(reg: &mut FrameRegistry, frame: *mut PyFrameObject) -> FrameId;
84+
pub struct Activation {
85+
pub id: ActivationId,
86+
// Hold a GIL-independent handle to the CodeType object.
87+
// Access required attributes via PyO3 attribute lookup (getattr) under the GIL.
88+
pub code: PyObject,
89+
}
5890
```
59-
- Record thread start/end events when a new thread registers callbacks.
91+
- Record thread start/end events when a thread first emits a monitoring event and when it finishes.
6092
```rs
6193
pub fn on_thread_start(thread_id: u64);
6294
pub fn on_thread_stop(thread_id: u64);
6395
```
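One possible way to back the per-thread state is sketched below. It is an assumption, not the design's chosen mechanism: it swaps the `&'static mut ThreadState` accessor for a closure-based helper (the hypothetical `with_thread_state`) over a `thread_local!` cell, which avoids handing out a long-lived mutable reference.

```rs
use std::cell::RefCell;

pub type ActivationId = u64;

#[derive(Default)]
pub struct ThreadState {
    pub stack: Vec<ActivationId>,
}

thread_local! {
    // Each OS thread lazily gets its own, initially empty, state.
    static THREAD_STATE: RefCell<ThreadState> = RefCell::new(ThreadState::default());
}

/// Run `f` with mutable access to the calling thread's state.
/// Hypothetical helper; re-entrant calls from inside `f` would panic on the RefCell.
pub fn with_thread_state<R>(f: impl FnOnce(&mut ThreadState) -> R) -> R {
    THREAD_STATE.with(|cell| {
        let mut state = cell.borrow_mut();
        f(&mut *state)
    })
}
```

A callback would then write `with_thread_state(|s| s.stack.push(id))` instead of holding on to a reference between events.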
6496

97+
### Code Object Access Strategy (no reliance on PyCodeObject internals)
98+
- Rationale: PyO3 exposes `ffi::PyCodeObject` as an opaque type. Instead of touching its unstable layout, treat code objects as generic Python objects and access only stable Python-level attributes via PyO3's `getattr` on `&PyAny`.
99+
```rs
100+
use pyo3::{prelude::*, types::PyAny};
101+
102+
#[derive(Clone)]
103+
pub struct CodeInfo {
104+
pub filename: String,
105+
pub qualname: String,
106+
pub firstlineno: u32,
107+
pub flags: u32,
108+
}
109+
110+
/// Stable identity for a code object during its lifetime.
111+
/// Uses the object's address while GIL-held; equivalent to Python's id(code).
112+
pub fn code_id(py: Python<'_>, code: &PyAny) -> usize {
113+
code.as_ptr() as usize
114+
}
115+
116+
/// Extract just the attributes we need, via Python attribute access.
117+
pub fn extract_code_info(py: Python<'_>, code: &PyAny) -> PyResult<CodeInfo> {
118+
let filename: String = code.getattr("co_filename")?.extract()?;
119+
// Prefer co_qualname if present, else fallback to co_name
120+
let qualname: String = match code.getattr("co_qualname") {
121+
Ok(q) => q.extract()?,
122+
Err(_) => code.getattr("co_name")?.extract()?,
123+
};
124+
let firstlineno: u32 = code.getattr("co_firstlineno")?.extract()?;
125+
let flags: u32 = code.getattr("co_flags")?.extract()?;
126+
Ok(CodeInfo { filename, qualname, firstlineno, flags })
127+
}
128+
129+
/// Cache minimal info to avoid repeated getattr and to assign stable IDs.
130+
pub struct CodeRegistry {
131+
pub map: std::collections::HashMap<usize, CodeInfo>,
132+
}
133+
134+
impl CodeRegistry {
135+
pub fn new() -> Self { Self { map: Default::default() } }
136+
pub fn intern(&mut self, py: Python<'_>, code: &PyAny) -> PyResult<usize> {
137+
let id = code_id(py, code);
138+
if !self.map.contains_key(&id) {
139+
let info = extract_code_info(py, code)?;
140+
self.map.insert(id, info);
141+
}
142+
Ok(id)
143+
}
144+
}
145+
```
146+
- Event handler inputs use `PyObject` for the `code` parameter. Borrow it under the GIL when needed (`code.bind(py)` yields a `&Bound<'_, PyAny>` in PyO3 0.21 and later; `code.as_ref(py)` yields `&PyAny` in earlier versions), then consult `CodeRegistry`.
147+
- For line numbers: rely on the `LINE` event’s provided `line_number`. If instruction offsets need mapping, call `code.getattr("co_lines")?.call0()?` and iterate lazily; avoid caching unless necessary.
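The offset-to-line mapping mentioned in the last bullet can be sketched without PyO3, assuming the `(start, end, line)` triples yielded by `co_lines()` have already been extracted into Rust values. Each triple covers the half-open byte range `[start, end)`, and `line` may be `None` for bytecode with no source line. `LineEntry` and `line_for_offset` are hypothetical names.

```rs
/// One entry of `co_lines()`: byte offsets [start, end) and an optional source line.
pub type LineEntry = (u32, u32, Option<u32>);

/// Return the source line covering `offset`, if any entry contains it
/// and that entry carries a line number.
pub fn line_for_offset(entries: &[LineEntry], offset: u32) -> Option<u32> {
    entries
        .iter()
        .find(|(start, end, _)| (*start..*end).contains(&offset))
        .and_then(|(_, _, line)| *line)
}
```

A linear scan keeps the sketch short; if lookups turn out to be frequent, a binary search over the sorted ranges would be the obvious refinement.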
148+
65149
## Event Handling
66150

67151
Each bullet below represents a low-level operation translating a single `sys.monitoring` event into the `runtime_tracing` stream.
68152

69153
### Control Flow
70-
- **PY_START** – Create a `Function` event for the code object and push a new frame ID onto the thread's stack.
154+
- **PY_START** – Create a `Function` event for the code object and push a new activation ID onto the thread's stack.
71155
```rs
72-
pub fn on_py_start(frame: *mut PyFrameObject);
156+
pub fn on_py_start(code: PyObject, instruction_offset: i32);
73157
```
74-
- **PY_RESUME** – Emit an `Event` log noting resumption and update the current frame's state.
158+
- **PY_RESUME** – Emit an `Event` log noting resumption and update the current activation's state.
75159
```rs
76-
pub fn on_py_resume(frame: *mut PyFrameObject);
160+
pub fn on_py_resume(code: PyObject, instruction_offset: i32);
77161
```
78-
- **PY_RETURN** – Pop the frame ID, write a `Return` event with the value (if retrievable), and link to the caller.
162+
- **PY_RETURN** – Pop the activation ID, write a `Return` event with the value (if retrievable), and link to the caller.
79163
```rs
80-
pub struct ReturnRecord { pub frame: FrameId, pub value: Option<ValueRecord> }
81-
pub fn on_py_return(frame: *mut PyFrameObject, value: *mut PyObject);
164+
pub struct ReturnRecord { pub activation: ActivationId, pub value: Option<ValueRecord> }
165+
pub fn on_py_return(code: PyObject, instruction_offset: i32, retval: *mut PyObject);
82166
```
83-
- **PY_YIELD** – Record a `Return` event flagged as a yield and keep the frame on the stack for later resumes.
167+
- **PY_YIELD** – Record a `Return` event flagged as a yield and keep the activation on the stack for later resumes.
84168
```rs
85-
pub fn on_py_yield(frame: *mut PyFrameObject, value: *mut PyObject);
169+
pub fn on_py_yield(code: PyObject, instruction_offset: i32, retval: *mut PyObject);
86170
```
87-
- **STOP_ITERATION** – Emit an `Event` indicating iteration exhaustion for the current frame.
171+
- **STOP_ITERATION** – Emit an `Event` indicating iteration exhaustion for the current activation.
88172
```rs
89-
pub fn on_stop_iteration(frame: *mut PyFrameObject);
173+
pub fn on_stop_iteration(code: PyObject, instruction_offset: i32, exception: *mut PyObject);
90174
```
91175
- **PY_UNWIND** – Mark the beginning of stack unwinding and note the target handler in an `Event`.
92176
```rs
93-
pub fn on_py_unwind(frame: *mut PyFrameObject);
177+
pub fn on_py_unwind(code: PyObject, instruction_offset: i32, exception: *mut PyObject);
94178
```
95179
- **PY_THROW** – Emit an `Event` describing the thrown value and the target generator/coroutine.
96180
```rs
97-
pub fn on_py_throw(frame: *mut PyFrameObject, value: *mut PyObject);
181+
pub fn on_py_throw(code: PyObject, instruction_offset: i32, exception: *mut PyObject);
98182
```
99183
- **RERAISE** – Log a re-raise event referencing the original exception.
100184
```rs
101-
pub fn on_reraise(frame: *mut PyFrameObject, exc: *mut PyObject);
185+
pub fn on_reraise(code: PyObject, instruction_offset: i32, exception: *mut PyObject);
102186
```
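The stack discipline described for `PY_START`, `PY_YIELD`, and `PY_RETURN` can be illustrated with a small, self-contained model. This is a sketch under simplifying assumptions: `TraceEvent` and `ActivationTracker` are hypothetical stand-ins for the `runtime_tracing` records, and the handlers drop the `code`, offset, and value parameters to show only the push/pop behaviour.

```rs
pub type ActivationId = u64;

/// Simplified stand-in for the records written to the trace.
#[derive(Debug, PartialEq, Eq)]
pub enum TraceEvent {
    Function { activation: ActivationId },
    Return { activation: ActivationId, is_yield: bool },
}

#[derive(Default)]
pub struct ActivationTracker {
    stack: Vec<ActivationId>,
    next_id: ActivationId,
    pub events: Vec<TraceEvent>,
}

impl ActivationTracker {
    /// PY_START: allocate a fresh activation and push it.
    pub fn on_py_start(&mut self) -> ActivationId {
        let activation = self.next_id;
        self.next_id += 1;
        self.stack.push(activation);
        self.events.push(TraceEvent::Function { activation });
        activation
    }

    /// PY_YIELD: record a yield-flagged return but keep the activation on the stack.
    pub fn on_py_yield(&mut self) -> Option<ActivationId> {
        let activation = *self.stack.last()?;
        self.events.push(TraceEvent::Return { activation, is_yield: true });
        Some(activation)
    }

    /// PY_RETURN: pop the activation and record a plain return.
    pub fn on_py_return(&mut self) -> Option<ActivationId> {
        let activation = self.stack.pop()?;
        self.events.push(TraceEvent::Return { activation, is_yield: false });
        Some(activation)
    }

    pub fn depth(&self) -> usize {
        self.stack.len()
    }
}
```

Returning `Option` makes an unmatched return or yield visible to the caller instead of panicking, which may matter if monitoring is enabled while frames are already active.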
103187

104188
### Call and Line Tracking
105-
- **CALL** – Record a `Call` event, capturing argument values and the callee's `Function` ID.
189+
- **CALL** – Record a `Call` event, capturing the `callable` and the first argument if available (`arg0` as provided by `sys.monitoring`), and associate a new activation.
106190
```rs
107-
pub fn on_call(callee: *mut PyObject, args: &PyTupleObject) -> FrameId;
191+
pub fn on_call(code: PyObject, instruction_offset: i32, callable: *mut PyObject, arg0: Option<*mut PyObject>) -> ActivationId;
108192
```
109193
- **LINE** – Write a `Step` event with current path and line number; ensure the path is registered.
110194
```rs
111-
pub fn on_line(frame: *mut PyFrameObject, lineno: u32);
195+
pub fn on_line(code: PyObject, line_number: u32);
112196
```
113-
- **INSTRUCTION** – Optionally emit a fine-grained `Event` containing the opcode name for detailed traces.
197+
- **INSTRUCTION** – Optionally emit a fine-grained `Event` keyed by `instruction_offset`. Opcode names can be derived from the `CodeType` if desired.
114198
```rs
115-
pub fn on_instruction(frame: *mut PyFrameObject, opcode: u8);
199+
pub fn on_instruction(code: PyObject, instruction_offset: i32);
116200
```
117201
- **JUMP** – Append an `Event` describing the jump target offset for control-flow visualization.
118202
```rs
119-
pub fn on_jump(frame: *mut PyFrameObject, target: u32);
203+
pub fn on_jump(code: PyObject, instruction_offset: i32, destination_offset: i32);
120204
```
121-
- **BRANCH** – Record an `Event` with branch outcome (taken or not) to aid coverage analysis.
205+
- **BRANCH** – Record an `Event` with `destination_offset`; whether the branch was taken can be inferred by comparing to the fallthrough offset.
122206
```rs
123-
pub fn on_branch(frame: *mut PyFrameObject, taken: bool);
207+
pub fn on_branch(code: PyObject, instruction_offset: i32, destination_offset: i32);
124208
```
209+
_Note_: `runtime_tracing` does not currently support branch events; it relies on tree-sitter AST analysis instead. The initial version will therefore ignore `BRANCH` events, and support can be added once the tracing format has been extended.
125210

126211
### Exception Lifecycle
127212
- **RAISE** – Emit an `Event` containing exception type and message when raised.
128213
```rs
129-
pub fn on_raise(frame: *mut PyFrameObject, exc: *mut PyObject);
214+
pub fn on_raise(code: PyObject, instruction_offset: i32, exception: *mut PyObject);
130215
```
131216
- **EXCEPTION_HANDLED** – Log an `Event` marking when an exception is caught.
132217
```rs
133-
pub fn on_exception_handled(frame: *mut PyFrameObject);
218+
pub fn on_exception_handled(code: PyObject, instruction_offset: i32, exception: *mut PyObject);
134219
```
135220

136221
### C API Boundary
137-
- **C_RETURN** – On returning from a C function, emit a `Return` event tagged as foreign and include result summary.
222+
- **C_RETURN** – On returning from a C function, emit a `Return` event tagged as foreign. Note: `sys.monitoring` does not provide the result object for `C_RETURN`.
138223
```rs
139-
pub fn on_c_return(func: *mut PyObject, result: *mut PyObject);
224+
pub fn on_c_return(code: PyObject, instruction_offset: i32, callable: *mut PyObject, arg0: Option<*mut PyObject>);
140225
```
141-
- **C_RAISE** – When a C function raises, record an `Event` with the exception info and current frame ID.
226+
- **C_RAISE** – When a C function raises, record an `Event` that a C-level callable raised. Note: `sys.monitoring` does not pass the exception object for `C_RAISE`.
142227
```rs
143-
pub fn on_c_raise(func: *mut PyObject, exc: *mut PyObject);
228+
pub fn on_c_raise(code: PyObject, instruction_offset: i32, callable: *mut PyObject, arg0: Option<*mut PyObject>);
144229
```
145230

146231
### No Events
147232
- **NO_EVENTS** – Special constant; used only to disable monitoring. No runtime event is produced.
148233
```rs
149-
pub const NO_EVENTS: u64 = 0;
234+
pub const NO_EVENTS: EventSet = EventSet(0);
150235
```
151236

152237
## Metadata and File Capture
