|
| 1 | +# Python sys.monitoring Tracer Design |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document outlines the design for integrating Python's `sys.monitoring` API with the `runtime_tracing` format. The goal is to produce CodeTracer-compatible traces for Python programs without modifying the interpreter. |
| 6 | + |
| 7 | +The tracer collects `sys.monitoring` events, converts them to `runtime_tracing` events, and streams them to `trace.json`/`trace.bin` along with metadata and source snapshots. |
| 8 | + |
| 9 | +## Architecture |
| 10 | + |
| 11 | +### Tool Initialization |
| 12 | +- Acquire a tool identifier via `sys.monitoring.use_tool_id`; store it for the lifetime of the tracer. |
| 13 | + ```rs |
| 14 | + pub const MONITORING_TOOL_NAME: &str = "codetracer"; |
| 15 | + pub struct ToolId { pub id: u8 } |
| 16 | + pub fn acquire_tool_id() -> PyResult<ToolId>; |
| 17 | + ``` |
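| 18 | +- Register one callback per event using `sys.monitoring.register_callback`. Python invokes each callback with the code object and event-specific arguments rather than a frame, so the `frame` parameters in this document assume the tracer resolves the current frame itself. |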
| 19 | + ```rs |
| 20 | + pub enum MonitoringEvent { PyStart, PyResume, PyReturn, PyYield, StopIteration, PyUnwind, PyThrow, Reraise, Call, Line, Instruction, Jump, Branch, Raise, ExceptionHandled, CReturn, CRaise } |
| 21 | + pub type CallbackFn = unsafe extern "C" fn(event: MonitoringEvent, frame: *mut PyFrameObject); |
| 22 | + pub fn register_callback(tool: &ToolId, event: MonitoringEvent, cb: CallbackFn); |
| 23 | + ``` |
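| 24 | +- Enable the desired events by passing a bitmask to `sys.monitoring.set_events`, building it from the `sys.monitoring.events` constants rather than hard-coded bit positions. |
| 25 | + ```rs |
| 26 | + pub const ALL_EVENTS_MASK: u64 = 0xffff; // placeholder: 16 bits cannot cover the 17 events listed above |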
| 27 | + pub fn enable_events(tool: &ToolId, mask: u64); |
| 28 | + ``` |
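The mask arithmetic can be sketched in plain Rust. This is a minimal sketch under stated assumptions, not CPython's layout: the bit positions simply follow the declaration order of the `MonitoringEvent` enum, and `bit`, `mask_of`, `all_events_mask`, and `EVENT_COUNT` are hypothetical names. A real tracer would presumably read the actual values from `sys.monitoring.events` at startup.

```rust
/// The seventeen events named in this design, in declaration order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MonitoringEvent {
    PyStart, PyResume, PyReturn, PyYield, StopIteration, PyUnwind, PyThrow,
    Reraise, Call, Line, Instruction, Jump, Branch, Raise, ExceptionHandled,
    CReturn, CRaise,
}

/// Number of variants in `MonitoringEvent`.
pub const EVENT_COUNT: u32 = 17;

impl MonitoringEvent {
    /// Illustrative bit: 1 << declaration index.
    /// ASSUMPTION: these are not CPython's real event values.
    pub fn bit(self) -> u64 {
        1u64 << (self as u32)
    }
}

/// OR together the bits of the selected events.
pub fn mask_of(events: &[MonitoringEvent]) -> u64 {
    events.iter().fold(0, |mask, ev| mask | ev.bit())
}

/// Mask with one bit set per declared event.
pub fn all_events_mask() -> u64 {
    (1u64 << EVENT_COUNT) - 1
}
```

With seventeen events, `all_events_mask()` needs seventeen bits, so a 16-bit constant such as `0xffff` would leave the last declared event disabled under this illustrative layout.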
| 29 | + |
| 30 | +### Writer Management |
| 31 | +- Open a `runtime_tracing` writer (`trace.json` or `trace.bin`) during `start_tracing`. |
| 32 | + ```rs |
| 33 | + pub enum OutputFormat { Json, Binary } |
| 34 | + pub struct TraceWriter { pub format: OutputFormat } |
| 35 | + pub fn start_tracing(path: &Path, format: OutputFormat) -> io::Result<TraceWriter>; |
| 36 | + ``` |
| 37 | +- Expose methods to append metadata and file copies using existing `runtime_tracing` helpers. |
| 38 | + ```rs |
| 39 | + pub fn append_metadata(writer: &mut TraceWriter, meta: &TraceMetadata); |
| 40 | + pub fn copy_source_file(writer: &mut TraceWriter, path: &Path) -> io::Result<()>; |
| 41 | + ``` |
| 42 | +- Flush and close the writer when tracing stops. |
| 43 | + ```rs |
| 44 | + pub fn stop_tracing(writer: TraceWriter) -> io::Result<()>; |
| 45 | + ``` |
| 46 | + |
| 47 | +### Frame and Thread Tracking |
| 48 | +- Maintain a per-thread stack of frame identifiers to correlate `CALL`, `PY_START`, and returns. |
| 49 | + ```rs |
| 50 | + pub type FrameId = u64; |
| 51 | + pub struct ThreadState { pub stack: Vec<FrameId> } |
| 52 | + pub fn current_thread_state() -> &'static mut ThreadState; |
| 53 | + ``` |
| 54 | +- Map `frame` objects to internal IDs for cross-referencing events. |
| 55 | + ```rs |
| 56 | + pub struct FrameRegistry { next: FrameId, map: HashMap<*mut PyFrameObject, FrameId> } |
| 57 | + pub fn intern_frame(reg: &mut FrameRegistry, frame: *mut PyFrameObject) -> FrameId; |
| 58 | + ``` |
| 59 | +- Record thread start/end events when a thread is first observed in a callback and when it exits; callbacks are registered once per tool, not per thread. |
| 60 | + ```rs |
| 61 | + pub fn on_thread_start(thread_id: u64); |
| 62 | + pub fn on_thread_stop(thread_id: u64); |
| 63 | + ``` |
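The frame bookkeeping above can be sketched without any interpreter bindings. This is a minimal sketch, assuming frames are identified by their address as a `usize` (a stand-in for `*mut PyFrameObject`) and that per-thread state lives in a `thread_local!` cell; `push_frame`, `pop_frame`, and `current_depth` are hypothetical helpers, not part of the proposed API.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

pub type FrameId = u64;

/// Assigns a stable FrameId to each frame address on first sight.
#[derive(Default)]
pub struct FrameRegistry {
    next: FrameId,
    // Key: frame address (hypothetical stand-in for *mut PyFrameObject).
    map: HashMap<usize, FrameId>,
}

impl FrameRegistry {
    /// Return the existing ID for `frame_addr`, or allocate the next one.
    pub fn intern(&mut self, frame_addr: usize) -> FrameId {
        if let Some(&id) = self.map.get(&frame_addr) {
            return id;
        }
        let id = self.next;
        self.next += 1;
        self.map.insert(frame_addr, id);
        id
    }
}

/// Per-thread stack of active frame IDs.
#[derive(Default)]
pub struct ThreadState {
    pub stack: Vec<FrameId>,
}

thread_local! {
    // Each OS thread gets its own independent stack.
    static THREAD_STATE: RefCell<ThreadState> = RefCell::new(ThreadState::default());
}

/// Push a frame ID on entry to a function.
pub fn push_frame(id: FrameId) {
    THREAD_STATE.with(|s| s.borrow_mut().stack.push(id));
}

/// Pop the innermost frame ID on exit; `None` if the stack is empty.
pub fn pop_frame() -> Option<FrameId> {
    THREAD_STATE.with(|s| s.borrow_mut().stack.pop())
}

/// Current call depth on this thread.
pub fn current_depth() -> usize {
    THREAD_STATE.with(|s| s.borrow().stack.len())
}
```

Scoped access through `with` avoids handing out a `&'static mut` reference to thread state. Because addresses can be reused once a frame is freed, a real registry would likely need to drop entries when frames exit.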
| 64 | + |
| 65 | +## Event Handling |
| 66 | + |
| 67 | +Each bullet below represents a low-level operation translating a single `sys.monitoring` event into the `runtime_tracing` stream. |
| 68 | + |
| 69 | +### Control Flow |
| 70 | +- **PY_START** – Create a `Function` event for the code object and push a new frame ID onto the thread's stack. |
| 71 | + ```rs |
| 72 | + pub fn on_py_start(frame: *mut PyFrameObject); |
| 73 | + ``` |
| 74 | +- **PY_RESUME** – Emit an `Event` log noting resumption and update the current frame's state. |
| 75 | + ```rs |
| 76 | + pub fn on_py_resume(frame: *mut PyFrameObject); |
| 77 | + ``` |
| 78 | +- **PY_RETURN** – Pop the frame ID, write a `Return` event with the value (if retrievable), and link to the caller. |
| 79 | + ```rs |
| 80 | + pub struct ReturnRecord { pub frame: FrameId, pub value: Option<ValueRecord> } |
| 81 | + pub fn on_py_return(frame: *mut PyFrameObject, value: *mut PyObject); |
| 82 | + ``` |
| 83 | +- **PY_YIELD** – Record a `Return` event flagged as a yield and keep the frame on the stack for later resumes. |
| 84 | + ```rs |
| 85 | + pub fn on_py_yield(frame: *mut PyFrameObject, value: *mut PyObject); |
| 86 | + ``` |
| 87 | +- **STOP_ITERATION** – Emit an `Event` indicating iteration exhaustion for the current frame. |
| 88 | + ```rs |
| 89 | + pub fn on_stop_iteration(frame: *mut PyFrameObject); |
| 90 | + ``` |
| 91 | +- **PY_UNWIND** – Fired when a frame exits because an exception is propagating out of it. Emit an `Event` and pop the frame ID, since no `PY_RETURN` follows for that frame. |
| 92 | + ```rs |
| 93 | + pub fn on_py_unwind(frame: *mut PyFrameObject); |
| 94 | + ``` |
| 95 | +- **PY_THROW** – Emit an `Event` describing the thrown value and the target generator/coroutine. |
| 96 | + ```rs |
| 97 | + pub fn on_py_throw(frame: *mut PyFrameObject, value: *mut PyObject); |
| 98 | + ``` |
| 99 | +- **RERAISE** – Log a re-raise event referencing the original exception. |
| 100 | + ```rs |
| 101 | + pub fn on_reraise(frame: *mut PyFrameObject, exc: *mut PyObject); |
| 102 | + ``` |
| 103 | + |
| 104 | +### Call and Line Tracking |
| 105 | +- **CALL** – Record a `Call` event, capturing argument values and the callee's `Function` ID. |
| 106 | + ```rs |
| 107 | + pub fn on_call(callee: *mut PyObject, args: &PyTupleObject) -> FrameId; |
| 108 | + ``` |
| 109 | +- **LINE** – Write a `Step` event with current path and line number; ensure the path is registered. |
| 110 | + ```rs |
| 111 | + pub fn on_line(frame: *mut PyFrameObject, lineno: u32); |
| 112 | + ``` |
| 113 | +- **INSTRUCTION** – Optionally emit a fine-grained `Event` containing the opcode name for detailed traces. |
| 114 | + ```rs |
| 115 | + pub fn on_instruction(frame: *mut PyFrameObject, opcode: u8); |
| 116 | + ``` |
| 117 | +- **JUMP** – Append an `Event` describing the jump target offset for control-flow visualization. |
| 118 | + ```rs |
| 119 | + pub fn on_jump(frame: *mut PyFrameObject, target: u32); |
| 120 | + ``` |
| 121 | +- **BRANCH** – Record an `Event` with the branch outcome (taken or not), inferred from the destination offset the callback receives, to aid coverage analysis. |
| 122 | + ```rs |
| 123 | + pub fn on_branch(frame: *mut PyFrameObject, taken: bool); |
| 124 | + ``` |
| 125 | + |
| 126 | +### Exception Lifecycle |
| 127 | +- **RAISE** – Emit an `Event` containing exception type and message when raised. |
| 128 | + ```rs |
| 129 | + pub fn on_raise(frame: *mut PyFrameObject, exc: *mut PyObject); |
| 130 | + ``` |
| 131 | +- **EXCEPTION_HANDLED** – Log an `Event` marking when an exception is caught. |
| 132 | + ```rs |
| 133 | + pub fn on_exception_handled(frame: *mut PyFrameObject); |
| 134 | + ``` |
| 135 | + |
| 136 | +### C API Boundary |
| 137 | +- **C_RETURN** – On returning from a C function, emit a `Return` event tagged as foreign and include a result summary. Python delivers `C_RETURN` and `C_RAISE` only while `CALL` is being monitored. |
| 138 | + ```rs |
| 139 | + pub fn on_c_return(func: *mut PyObject, result: *mut PyObject); |
| 140 | + ``` |
| 141 | +- **C_RAISE** – When a C function raises, record an `Event` with the exception info and current frame ID. |
| 142 | + ```rs |
| 143 | + pub fn on_c_raise(func: *mut PyObject, exc: *mut PyObject); |
| 144 | + ``` |
| 145 | + |
| 146 | +### No Events |
| 147 | +- **NO_EVENTS** – Special constant; used only to disable monitoring. No runtime event is produced. |
| 148 | + ```rs |
| 149 | + pub const NO_EVENTS: u64 = 0; |
| 150 | + ``` |
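The event-by-event translation described in this section can be summarised as one lookup. This is a sketch only: `RecordKind` and `record_kind` are hypothetical names, and the variants just mirror the `runtime_tracing` record names used in the bullets above (`Function`, `Call`, `Step`, `Return`, `Event`).

```rust
/// The seventeen events named in this design.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MonitoringEvent {
    PyStart, PyResume, PyReturn, PyYield, StopIteration, PyUnwind, PyThrow,
    Reraise, Call, Line, Instruction, Jump, Branch, Raise, ExceptionHandled,
    CReturn, CRaise,
}

/// Hypothetical tag for the kind of trace record an event produces.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RecordKind { Function, Call, Step, Return, Event }

/// Every event, for iteration.
pub const ALL_EVENTS: [MonitoringEvent; 17] = {
    use MonitoringEvent::*;
    [PyStart, PyResume, PyReturn, PyYield, StopIteration, PyUnwind, PyThrow,
     Reraise, Call, Line, Instruction, Jump, Branch, Raise, ExceptionHandled,
     CReturn, CRaise]
};

/// Map each event to the record kind the bullets above assign to it.
pub fn record_kind(event: MonitoringEvent) -> RecordKind {
    use MonitoringEvent::*;
    match event {
        PyStart => RecordKind::Function,
        Call => RecordKind::Call,
        Line => RecordKind::Step,
        // Yields and C returns are written as flagged `Return` records.
        PyReturn | PyYield | CReturn => RecordKind::Return,
        // Everything else falls back to a generic `Event` log entry.
        PyResume | StopIteration | PyUnwind | PyThrow | Reraise | Instruction
        | Jump | Branch | Raise | ExceptionHandled | CRaise => RecordKind::Event,
    }
}

/// Count how many events end up as generic `Event` records.
pub fn generic_event_count() -> usize {
    ALL_EVENTS
        .iter()
        .filter(|&&ev| record_kind(ev) == RecordKind::Event)
        .count()
}
```

Because the `match` is exhaustive, adding an event to the enum forces a decision about its record kind at compile time. Under this mapping, eleven of the seventeen events land in the generic `Event` bucket.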
| 151 | + |
| 152 | +## Metadata and File Capture |
| 153 | +- Collect the working directory, program name, and arguments and store them in `trace_metadata.json`. |
| 154 | + ```rs |
| 155 | + pub struct TraceMetadata { pub cwd: PathBuf, pub program: String, pub args: Vec<String> } |
| 156 | + pub fn write_metadata(writer: &mut TraceWriter, meta: &TraceMetadata); |
| 157 | + ``` |
| 158 | +- Track every file path referenced; copy each into the trace directory under `files/`. |
| 159 | + ```rs |
| 160 | + pub fn track_file(writer: &mut TraceWriter, path: &Path) -> io::Result<()>; |
| 161 | + ``` |
| 162 | +- Record `VariableName`, `Type`, and `Value` entries when variables are inspected or logged. |
| 163 | + ```rs |
| 164 | + pub struct VariableRecord { pub name: String, pub ty: TypeId, pub value: ValueRecord } |
| 165 | + pub fn record_variable(writer: &mut TraceWriter, rec: VariableRecord); |
| 166 | + ``` |
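The "track once, copy under `files/`" rule can be sketched as a small deduplicating helper. This is a minimal sketch, assuming the copy destination is formed by appending the source path's ordinary name components to `<trace_dir>/files`; `FileTracker` and its `track` method are hypothetical, and the actual copy (for example with `std::fs::copy`) is left to the caller.

```rust
use std::collections::HashSet;
use std::path::{Component, Path, PathBuf};

/// Remembers which source paths have already been scheduled for copying.
#[derive(Default)]
pub struct FileTracker {
    seen: HashSet<PathBuf>,
}

impl FileTracker {
    /// Returns the copy destination the first time `source` is seen,
    /// and `None` on every later call for the same path.
    pub fn track(&mut self, trace_dir: &Path, source: &Path) -> Option<PathBuf> {
        // `insert` returns false when the path was already present.
        if !self.seen.insert(source.to_path_buf()) {
            return None;
        }
        let mut dest = trace_dir.join("files");
        for part in source.components() {
            // Keep only ordinary name components so the copy stays inside `files/`;
            // root, prefix, `.` and `..` components are skipped.
            if let Component::Normal(name) = part {
                dest.push(name);
            }
        }
        Some(dest)
    }
}
```

Dropping `..` components keeps destinations inside the trace directory, but it also means two different relative paths could map to the same destination; a real implementation would probably canonicalize paths first.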
| 167 | + |
| 168 | +## Value Translation and Recording |
| 169 | +- Maintain a type registry that maps Python `type` objects to `runtime_tracing` `Type` entries and assigns new `type_id` values on first encounter. |
| 170 | + ```rs |
| 171 | + pub type TypeId = u32; |
| 172 | + pub type ValueId = u64; |
| 173 | + pub enum ValueRecord { Int(i64), Float(f64), Bool(bool), None, Str(String), Raw(Vec<u8>), Sequence(Vec<ValueRecord>), Tuple(Vec<ValueRecord>), Struct(Vec<(String, ValueRecord)>), Reference(ValueId) } |
| 174 | + pub struct TypeRegistry { next: TypeId, map: HashMap<*mut PyTypeObject, TypeId> } |
| 175 | + pub fn intern_type(reg: &mut TypeRegistry, ty: *mut PyTypeObject) -> TypeId; |
| 176 | + ``` |
| 177 | +- Convert primitives (`int`, `float`, `bool`, `None`, `str`) directly to their corresponding `ValueRecord` variants. |
| 178 | + ```rs |
| 179 | + pub fn encode_primitive(obj: *mut PyObject) -> Option<ValueRecord>; |
| 180 | + ``` |
| 181 | +- Encode `bytes` and `bytearray` as `Raw` records containing base64 text to preserve binary data. |
| 182 | + ```rs |
| 183 | + pub fn encode_bytes(obj: *mut PyObject) -> ValueRecord; |
| 184 | + ``` |
| 185 | +- Represent lists and sets as `Sequence` records and tuples as `Tuple` records, converting each element recursively. |
| 186 | + ```rs |
| 187 | + pub fn encode_sequence(iter: &PySequence) -> ValueRecord; |
| 188 | + pub fn encode_tuple(tuple: &PyTupleObject) -> ValueRecord; |
| 189 | + ``` |
| 190 | +- Serialize dictionaries as a `Sequence` of two-element `Tuple` records for key/value pairs to avoid fixed field layouts. |
| 191 | + ```rs |
| 192 | + pub fn encode_dict(dict: &PyDictObject) -> ValueRecord; |
| 193 | + ``` |
| 194 | +- For objects with accessible attributes, emit a `Struct` record with sorted field names; fall back to `Raw` with `repr(obj)` when inspection is unsafe. |
| 195 | + ```rs |
| 196 | + pub fn encode_object(obj: *mut PyObject) -> ValueRecord; |
| 197 | + ``` |
| 198 | +- Track object identities to detect cycles and reuse `Reference` records with `id(obj)` for repeated structures. |
| 199 | + ```rs |
| 200 | + pub struct SeenSet { map: HashMap<usize, ValueId> } |
| 201 | + pub fn record_reference(seen: &mut SeenSet, obj: *mut PyObject) -> Option<ValueRecord>; |
| 202 | + ``` |
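Recursive encoding with cycle detection can be illustrated on a toy object model. This is a sketch under stated assumptions: `Obj` and the index-based `heap` are hypothetical stand-ins for Python objects and `id(obj)`, the `ValueRecord` enum is trimmed to four variants, and `encode` is not the proposed API.

```rust
use std::collections::HashMap;

/// Trimmed-down version of the `ValueRecord` enum from this design.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueRecord {
    Int(i64),
    Str(String),
    Sequence(Vec<ValueRecord>),
    Reference(u64),
}

/// Hypothetical stand-in for Python objects: a list holds heap indices,
/// so a list can contain itself.
pub enum Obj {
    Int(i64),
    Str(String),
    List(Vec<usize>),
}

/// Maps an object identity (here, the heap index) to its assigned value ID.
#[derive(Default)]
pub struct SeenSet {
    map: HashMap<usize, u64>,
}

/// Encode the object at `id`, emitting `Reference` for containers already visited.
pub fn encode(heap: &[Obj], id: usize, seen: &mut SeenSet) -> ValueRecord {
    match &heap[id] {
        Obj::Int(n) => ValueRecord::Int(*n),
        Obj::Str(s) => ValueRecord::Str(s.clone()),
        Obj::List(items) => {
            if let Some(&value_id) = seen.map.get(&id) {
                // Already encoded (or currently being encoded): break the cycle.
                return ValueRecord::Reference(value_id);
            }
            // Register before recursing so self-references are detected.
            seen.map.insert(id, id as u64);
            let children = items
                .iter()
                .map(|&child| encode(heap, child, seen))
                .collect();
            ValueRecord::Sequence(children)
        }
    }
}
```

Registering the container before descending into its elements is what makes a self-referential list terminate; only containers are tracked here, so repeated primitives are simply encoded again.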
| 203 | + |
| 204 | +## Shutdown |
| 205 | +- On `stop_tracing`, call `sys.monitoring.set_events` with `NO_EVENTS` for the tool ID. |
| 206 | + ```rs |
| 207 | + pub fn disable_events(tool: &ToolId); |
| 208 | + ``` |
| 209 | +- Unregister callbacks and free the tool ID with `sys.monitoring.free_tool_id`. |
| 210 | + ```rs |
| 211 | + pub fn unregister_callbacks(tool: &ToolId); |
| 212 | + pub fn free_tool_id(tool: ToolId); |
| 213 | + ``` |
| 214 | +- Close the writer and ensure all buffered events are flushed to disk. |
| 215 | + ```rs |
| 216 | + pub fn finalize(writer: TraceWriter) -> io::Result<()>; |
| 217 | + ``` |
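The shutdown order listed above can be sketched against a small abstraction. This is a minimal sketch, assuming a hypothetical `Monitoring` trait that wraps the `sys.monitoring` calls and a writer that only needs `std::io::Write`; borrowing the `ToolId` in the first two steps and consuming it in the last is an assumption of this sketch, and `RecordingMonitor` is just a test double.

```rust
use std::io::{self, Write};

pub struct ToolId {
    pub id: u8,
}

/// Hypothetical abstraction over the `sys.monitoring` calls used at shutdown.
pub trait Monitoring {
    fn disable_events(&mut self, tool: &ToolId);
    fn unregister_callbacks(&mut self, tool: &ToolId);
    /// Consumes the ID so it cannot be used after it has been freed.
    fn free_tool_id(&mut self, tool: ToolId);
}

/// Run the shutdown steps in order: stop events, drop callbacks,
/// release the tool ID, then flush buffered trace data.
pub fn shutdown<M: Monitoring, W: Write>(
    mon: &mut M,
    tool: ToolId,
    writer: &mut W,
) -> io::Result<()> {
    mon.disable_events(&tool);
    mon.unregister_callbacks(&tool);
    mon.free_tool_id(tool);
    writer.flush()
}

/// Test double that records which calls were made, in order.
#[derive(Default)]
pub struct RecordingMonitor {
    pub calls: Vec<String>,
}

impl Monitoring for RecordingMonitor {
    fn disable_events(&mut self, tool: &ToolId) {
        self.calls.push(format!("disable_events({})", tool.id));
    }
    fn unregister_callbacks(&mut self, tool: &ToolId) {
        self.calls.push(format!("unregister_callbacks({})", tool.id));
    }
    fn free_tool_id(&mut self, tool: ToolId) {
        self.calls.push(format!("free_tool_id({})", tool.id));
    }
}
```

Disabling events first means no callback should fire while the callbacks themselves are being removed, and taking `ToolId` by value in `free_tool_id` lets the compiler reject any later use of the released ID.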
| 218 | + |
| 219 | +## Current Limitations |
| 220 | +- **No structured support for threads or async tasks** – the trace format lacks explicit identifiers for concurrent execution. |
| 221 | + Distinguishing events emitted by different Python threads or `asyncio` tasks requires ad hoc `Event` entries, complicating |
| 222 | + analysis and preventing downstream tools from reasoning about scheduling. |
| 223 | +- **Generic `Event` log** – several `sys.monitoring` notifications like resume, unwind, and branch outcomes have no dedicated |
| 224 | + `runtime_tracing` variant. They must be encoded as free‑form `Event` logs, which reduces machine readability and hinders |
| 225 | + automation. |
| 226 | +- **Heavy value snapshots** – arguments and returns expect full `ValueRecord` structures. Serializing arbitrary Python objects is |
| 227 | + expensive and often degrades to lossy string dumps, limiting the visibility of rich runtime state. |
| 228 | +- **Append‑only path and function tables** – `runtime_tracing` assumes files and functions are discovered once and never change. |
| 229 | + Dynamically generated code (`eval`, REPL snippets) forces extra bookkeeping and cannot update earlier entries, making |
| 230 | + dynamic features awkward to trace. |
| 231 | +- **No built‑in compression or streaming** – traces are written as monolithic JSON or binary files. Long sessions quickly grow in |
| 232 | + size and cannot be streamed to remote consumers without additional tooling. |
| 233 | + |
| 234 | +## Future Extensions |
| 235 | +- Add filtering to enable subsets of events for performance-sensitive scenarios. |
| 236 | +- Support streaming traces over a socket for live debugging. |