
Commit 7fa3748

expand python tracer design with implementation outline
1 parent 26e18d4 commit 7fa3748


docs/py-design-001.md

Lines changed: 142 additions & 0 deletions
@@ -10,69 +10,211 @@ The tracer collects `sys.monitoring` events, converts them to `runtime_tracing`

### Tool Initialization
- Acquire a tool identifier via `sys.monitoring.use_tool_id`; store it for the lifetime of the tracer.
```rs
pub const MONITORING_TOOL_NAME: &str = "codetracer";
pub struct ToolId { pub id: u8 }
pub fn acquire_tool_id() -> PyResult<ToolId>;
```
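- Register one callback per event using `sys.monitoring.register_callback`. The interpreter invokes each callback as a Python callable with event-specific arguments, for example `(code, instruction_offset)` for `PY_START` or `(code, line_number)` for `LINE`. No frame object is passed, so handlers that need one must look up the current frame themselves.
```rs
pub enum MonitoringEvent { PyStart, PyResume, PyReturn, PyYield, StopIteration, PyUnwind, PyThrow, Reraise, Call, Line, Instruction, Jump, Branch, Raise, ExceptionHandled, CReturn, CRaise }
// A Python callable; its arguments depend on the event being delivered.
pub type CallbackFn = Py<PyAny>;
pub fn register_callback(tool: &ToolId, event: MonitoringEvent, cb: CallbackFn);
```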
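- Enable all desired events by bitmask with `sys.monitoring.set_events`. Each event is a single bit, and CPython 3.12 and 3.13 define 17 of them (bits 0 through 16), so a 16-bit mask would silently drop `C_RAISE`. `C_RETURN` and `C_RAISE` are only delivered while `CALL` is also being monitored.
```rs
// Bits 0..=16 (PY_START through C_RAISE) as numbered in CPython 3.12/3.13.
pub const ALL_EVENTS_MASK: u64 = 0x1_ffff;
pub fn enable_events(tool: &ToolId, mask: u64);
```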
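
To make the bitmask concrete, here is a standalone sketch of the event bits. It assumes the numbering used by CPython 3.12 and 3.13, where `PY_START` is bit 0 and `C_RAISE` is bit 16; newer versions may add or renumber events, so a real tracer should read the values from `sys.monitoring.events` at runtime. The `ALL`, `bit`, and `mask_of` helpers are illustrative names only, not part of any existing crate.

```rust
/// Event bit positions as numbered in CPython 3.12/3.13 (assumption: newer
/// versions may differ, so production code should read `sys.monitoring.events`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MonitoringEvent {
    PyStart = 0,
    PyResume = 1,
    PyReturn = 2,
    PyYield = 3,
    Call = 4,
    Line = 5,
    Instruction = 6,
    Jump = 7,
    Branch = 8,
    StopIteration = 9,
    Raise = 10,
    ExceptionHandled = 11,
    PyUnwind = 12,
    PyThrow = 13,
    Reraise = 14,
    CReturn = 15,
    CRaise = 16,
}

/// Mirrors `sys.monitoring.events.NO_EVENTS`.
pub const NO_EVENTS: u64 = 0;

impl MonitoringEvent {
    /// Every event in bit order (hypothetical helper for building masks).
    pub const ALL: [MonitoringEvent; 17] = [
        Self::PyStart, Self::PyResume, Self::PyReturn, Self::PyYield,
        Self::Call, Self::Line, Self::Instruction, Self::Jump, Self::Branch,
        Self::StopIteration, Self::Raise, Self::ExceptionHandled,
        Self::PyUnwind, Self::PyThrow, Self::Reraise, Self::CReturn, Self::CRaise,
    ];

    /// The single-bit value Python exposes for this event, e.g. `CALL` is 16.
    pub fn bit(self) -> u64 {
        1u64 << (self as u32)
    }
}

/// OR together the bits of the requested events.
pub fn mask_of(events: &[MonitoringEvent]) -> u64 {
    events.iter().fold(NO_EVENTS, |mask, &ev| mask | ev.bit())
}
```

Under this numbering `0xffff` stops at `C_RETURN`, so a mask meant to cover every event needs bit 16 as well.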

### Writer Management
- Open a `runtime_tracing` writer (`trace.json` or `trace.bin`) during `start_tracing`.
```rs
pub enum OutputFormat { Json, Binary }
pub struct TraceWriter { pub format: OutputFormat }
pub fn start_tracing(path: &Path, format: OutputFormat) -> io::Result<TraceWriter>;
```
- Expose methods to append metadata and file copies using existing `runtime_tracing` helpers.
```rs
pub fn append_metadata(writer: &mut TraceWriter, meta: &TraceMetadata);
pub fn copy_source_file(writer: &mut TraceWriter, path: &Path) -> io::Result<()>;
```
- Flush and close the writer when tracing stops.
```rs
pub fn stop_tracing(writer: TraceWriter) -> io::Result<()>;
```
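
A minimal sketch of this writer lifecycle, assuming a generic `std::io::Write` sink so it runs without touching disk. It does not use the real `runtime_tracing` API: the JSON-array framing, the length-prefixed binary framing, and the `start`/`append`/`stop` method names are placeholders. What it shows is that `stop` takes the writer by value, so the compiler rules out appending after tracing has stopped, and that `BufWriter::into_inner` flushes buffered bytes before returning the sink.

```rust
use std::io::{self, BufWriter, Write};

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OutputFormat {
    Json,
    Binary,
}

/// Hypothetical writer: events are buffered and reach the sink on flush.
pub struct TraceWriter<W: Write> {
    format: OutputFormat,
    out: BufWriter<W>,
    events: usize,
}

impl<W: Write> TraceWriter<W> {
    pub fn start(sink: W, format: OutputFormat) -> io::Result<Self> {
        let mut out = BufWriter::new(sink);
        if format == OutputFormat::Json {
            out.write_all(b"[")?;
        }
        Ok(Self { format, out, events: 0 })
    }

    /// Appends one already-serialized event.
    pub fn append(&mut self, event: &str) -> io::Result<()> {
        match self.format {
            OutputFormat::Json => {
                if self.events > 0 {
                    self.out.write_all(b",")?;
                }
                self.out.write_all(event.as_bytes())?;
            }
            OutputFormat::Binary => {
                // Placeholder framing: little-endian u32 length, then payload.
                self.out.write_all(&(event.len() as u32).to_le_bytes())?;
                self.out.write_all(event.as_bytes())?;
            }
        }
        self.events += 1;
        Ok(())
    }

    /// Consumes the writer, flushes everything, and hands back the sink.
    pub fn stop(mut self) -> io::Result<W> {
        if self.format == OutputFormat::Json {
            self.out.write_all(b"]")?;
        }
        self.out.into_inner().map_err(|e| e.into_error())
    }
}
```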

### Frame and Thread Tracking
- Maintain a per-thread stack of frame identifiers to correlate `CALL`, `PY_START`, and returns. Expose it through a scoped closure rather than a `&'static mut` reference, which would let two handlers hold aliasing mutable borrows of the same state.
```rs
pub type FrameId = u64;
pub struct ThreadState { pub stack: Vec<FrameId> }
pub fn with_thread_state<R>(f: impl FnOnce(&mut ThreadState) -> R) -> R;
```
- Map `frame` objects to internal IDs for cross-referencing events.
```rs
pub struct FrameRegistry { next: FrameId, map: HashMap<*mut PyFrameObject, FrameId> }
pub fn intern_frame(reg: &mut FrameRegistry, frame: *mut PyFrameObject) -> FrameId;
```
- Record thread start/end events. Callbacks are registered once per tool for the whole interpreter, not per thread, so a new thread is detected the first time any callback fires on it.
```rs
pub fn on_thread_start(thread_id: u64);
pub fn on_thread_stop(thread_id: u64);
```
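
A std-only sketch of this bookkeeping, with raw frame pointers modeled as `usize` addresses. The names `intern`, `retire`, `with_thread_state`, and `first_event_on_this_thread` are hypothetical. It illustrates two design points: a `thread_local!` stack reached through a closure avoids handing out `&'static mut` references, and retiring an address once its frame finishes keeps a later frame allocated at the same address from inheriting the old ID.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

pub type FrameId = u64;

/// Interns frame addresses; `usize` stands in for `*mut PyFrameObject`.
#[derive(Default)]
pub struct FrameRegistry {
    next: FrameId,
    map: HashMap<usize, FrameId>,
}

impl FrameRegistry {
    /// Returns the existing ID for this address, or assigns the next one.
    pub fn intern(&mut self, frame_addr: usize) -> FrameId {
        let next = &mut self.next;
        *self.map.entry(frame_addr).or_insert_with(|| {
            let id = *next;
            *next += 1;
            id
        })
    }

    /// Forgets a finished frame so a new frame allocated at the same
    /// address gets a fresh ID instead of inheriting this one.
    pub fn retire(&mut self, frame_addr: usize) {
        self.map.remove(&frame_addr);
    }
}

#[derive(Default)]
pub struct ThreadState {
    pub stack: Vec<FrameId>,
}

thread_local! {
    static THREAD_STATE: RefCell<ThreadState> = RefCell::new(ThreadState::default());
    static SEEN_THREAD: Cell<bool> = Cell::new(false);
}

/// Scoped access to this thread's state; a re-entrant call panics on the
/// `RefCell` borrow instead of silently aliasing.
pub fn with_thread_state<R>(f: impl FnOnce(&mut ThreadState) -> R) -> R {
    THREAD_STATE.with(|state| f(&mut *state.borrow_mut()))
}

/// True only for the first callback observed on the current OS thread,
/// which is where a thread-start event would be emitted.
pub fn first_event_on_this_thread() -> bool {
    SEEN_THREAD.with(|seen| !seen.replace(true))
}
```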

## Event Handling

Each bullet below describes how a single `sys.monitoring` event is translated into the `runtime_tracing` stream.

### Control Flow
- **PY_START** – Create a `Function` event for the code object and push a new frame ID onto the thread's stack.
```rs
pub fn on_py_start(frame: *mut PyFrameObject);
```
- **PY_RESUME** – Emit an `Event` log noting resumption and push the frame's existing ID back onto the thread's stack.
```rs
pub fn on_py_resume(frame: *mut PyFrameObject);
```
- **PY_RETURN** – Pop the frame ID, write a `Return` event with the return value (the callback receives it as `retval`), and link to the caller.
```rs
pub struct ReturnRecord { pub frame: FrameId, pub value: Option<ValueRecord> }
pub fn on_py_return(frame: *mut PyFrameObject, value: *mut PyObject);
```
- **PY_YIELD** – Record a `Return` event flagged as a yield and pop the frame, since control goes back to the caller; keep its ID registered so a later `PY_RESUME` pushes the same ID again.
```rs
pub fn on_py_yield(frame: *mut PyFrameObject, value: *mut PyObject);
```
- **STOP_ITERATION** – Emit an `Event` indicating that an artificial `StopIteration` was raised (iteration exhausted) in the current frame.
```rs
pub fn on_stop_iteration(frame: *mut PyFrameObject);
```
- **PY_UNWIND** – Pop the frame ID and emit an `Event` recording that the function exited because an exception propagated out of it. The callback receives the exception but not the handler that will eventually catch it.
```rs
pub fn on_py_unwind(frame: *mut PyFrameObject);
```
- **PY_THROW** – Emit an `Event` describing the thrown value and the target generator/coroutine. A `throw()` resumes the frame, so push its ID back onto the stack as for `PY_RESUME`.
```rs
pub fn on_py_throw(frame: *mut PyFrameObject, value: *mut PyObject);
```
- **RERAISE** – Log a re-raise event (for example at the end of a `finally` block) referencing the original exception.
```rs
pub fn on_reraise(frame: *mut PyFrameObject, exc: *mut PyObject);
```
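
The stack discipline behind these handlers can be sketched in isolation. Frames are plain `FrameId`s and `Record` is a hypothetical simplification of the `runtime_tracing` events. The sketch assumes that a yield pops the generator's frame and a later `PY_RESUME` pushes the same ID back, so events from the caller that runs in between are never attributed to the suspended generator.

```rust
pub type FrameId = u64;

/// Simplified, hypothetical stand-ins for the emitted trace records.
#[derive(Debug, PartialEq)]
pub enum Record {
    Function(FrameId),
    Resume(FrameId),
    Return { frame: FrameId, yielded: bool },
    Unwind(FrameId),
}

/// One thread's view: the active frame stack plus the records emitted so far.
#[derive(Default)]
pub struct ControlFlow {
    pub stack: Vec<FrameId>,
    pub out: Vec<Record>,
}

impl ControlFlow {
    pub fn on_py_start(&mut self, frame: FrameId) {
        self.stack.push(frame);
        self.out.push(Record::Function(frame));
    }

    /// A resumed generator/coroutine goes back on top of the stack
    /// under the same ID it had before it yielded.
    pub fn on_py_resume(&mut self, frame: FrameId) {
        self.stack.push(frame);
        self.out.push(Record::Resume(frame));
    }

    /// Yield hands control back to the caller, so the frame leaves the stack.
    pub fn on_py_yield(&mut self, frame: FrameId) {
        self.pop(frame);
        self.out.push(Record::Return { frame, yielded: true });
    }

    pub fn on_py_return(&mut self, frame: FrameId) {
        self.pop(frame);
        self.out.push(Record::Return { frame, yielded: false });
    }

    /// The function exited because an exception propagated out of it.
    pub fn on_py_unwind(&mut self, frame: FrameId) {
        self.pop(frame);
        self.out.push(Record::Unwind(frame));
    }

    fn pop(&mut self, frame: FrameId) {
        let top = self.stack.pop();
        // Events for one thread must nest; a mismatch indicates a tracer bug.
        debug_assert_eq!(top, Some(frame), "frame events out of order");
    }
}
```

For a caller (frame 0) that advances a generator (frame 1) twice, the stack is `[0, 1]` while the generator runs and `[0]` while it is suspended.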

### Call and Line Tracking
- **CALL** – Record a `Call` event with the callee's `Function` ID. The callback receives only the callable and its first argument (`arg0`, or `sys.monitoring.MISSING` when there are no arguments), so for Python callees the full argument values are captured from the callee's frame at `PY_START`.
```rs
pub fn on_call(callee: *mut PyObject, arg0: *mut PyObject) -> FrameId;
```
- **LINE** – Write a `Step` event with the current path and line number; ensure the path is registered.
```rs
pub fn on_line(frame: *mut PyFrameObject, lineno: u32);
```
- **INSTRUCTION** – Optionally emit a fine-grained `Event` containing the opcode name for detailed traces.
```rs
pub fn on_instruction(frame: *mut PyFrameObject, opcode: u8);
```
- **JUMP** – Append an `Event` describing the jump target offset for control-flow visualization.
```rs
pub fn on_jump(frame: *mut PyFrameObject, target: u32);
```
- **BRANCH** – Record an `Event` with the branch outcome (taken or not) to aid coverage analysis. In CPython 3.12 and 3.13 the callback reports the destination offset, from which the outcome has to be derived.
```rs
pub fn on_branch(frame: *mut PyFrameObject, taken: bool);
```
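
A sketch of the LINE handler's path registration, assuming the filename has already been read from `code.co_filename` (the `LINE` callback receives the code object and line number, not a frame). `PathId`, `Record`, and `LineTracker` are hypothetical names; the idea is that each path is emitted once, the first time a step refers to it, and later steps reuse its ID.

```rust
use std::collections::HashMap;

pub type PathId = u32;

#[derive(Debug, PartialEq)]
pub enum Record {
    Path(String),
    Step { path: PathId, line: u32 },
}

#[derive(Default)]
pub struct LineTracker {
    paths: HashMap<String, PathId>,
    pub out: Vec<Record>,
}

impl LineTracker {
    /// LINE handler: register the path on first sight, then emit the step.
    pub fn on_line(&mut self, filename: &str, lineno: u32) {
        let path = match self.paths.get(filename).copied() {
            Some(id) => id,
            None => {
                // IDs are dense and assigned in first-seen order.
                let id = self.paths.len() as PathId;
                self.paths.insert(filename.to_owned(), id);
                self.out.push(Record::Path(filename.to_owned()));
                id
            }
        };
        self.out.push(Record::Step { path, line: lineno });
    }
}
```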

### Exception Lifecycle
- **RAISE** – Emit an `Event` containing the exception type and message when an exception is raised.
```rs
pub fn on_raise(frame: *mut PyFrameObject, exc: *mut PyObject);
```
- **EXCEPTION_HANDLED** – Log an `Event` marking when an exception is caught; the callback receives the caught exception.
```rs
pub fn on_exception_handled(frame: *mut PyFrameObject, exc: *mut PyObject);
```
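
Because the `RAISE` and `EXCEPTION_HANDLED` callbacks both receive the exception object, the two can be correlated by identity. In this hypothetical sketch `exc_id` stands in for `id(exc)`, and the type name and message are assumed to have been extracted already; the handled record then points back at the frame where the exception was first seen. `RAISE` may be reported again as an exception propagates into callers, so only the first frame is kept.

```rust
use std::collections::HashMap;

pub type FrameId = u64;

#[derive(Debug, PartialEq)]
pub enum ExcRecord {
    Raised { frame: FrameId, summary: String },
    Handled { frame: FrameId, raised_in: Option<FrameId> },
}

#[derive(Default)]
pub struct ExceptionTracker {
    /// `id(exc)` -> frame where the exception was first seen.
    in_flight: HashMap<usize, FrameId>,
    pub out: Vec<ExcRecord>,
}

impl ExceptionTracker {
    pub fn on_raise(&mut self, frame: FrameId, exc_id: usize, type_name: &str, message: &str) {
        // Keep the originating frame if this exception was already seen.
        self.in_flight.entry(exc_id).or_insert(frame);
        self.out.push(ExcRecord::Raised {
            frame,
            summary: format!("{type_name}: {message}"),
        });
    }

    pub fn on_exception_handled(&mut self, frame: FrameId, exc_id: usize) {
        // Removing the entry means a later, unrelated object reusing the id
        // is not mistaken for this exception.
        let raised_in = self.in_flight.remove(&exc_id);
        self.out.push(ExcRecord::Handled { frame, raised_in });
    }
}
```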

### C API Boundary
- **C_RETURN** – On returning from a C function, emit a `Return` event tagged as foreign. The callback receives the callable and its first argument but not the result, so no result summary is available at this point.
```rs
pub fn on_c_return(func: *mut PyObject, arg0: *mut PyObject);
```
- **C_RAISE** – When a C function raises, record an `Event` with the callable and current frame ID. The callback receives the same `(callable, arg0)` pair rather than the exception, so exception details have to come from the `RAISE` event reported in the calling frame.
```rs
pub fn on_c_raise(func: *mut PyObject, arg0: *mut PyObject);
```

### No Events
- **NO_EVENTS** – Special constant used only to disable monitoring; no runtime event is produced.
```rs
pub const NO_EVENTS: u64 = 0;
```

## Metadata and File Capture
- Collect the working directory, program name, and arguments and store them in `trace_metadata.json`.
```rs
pub struct TraceMetadata { pub cwd: PathBuf, pub program: String, pub args: Vec<String> }
pub fn write_metadata(writer: &mut TraceWriter, meta: &TraceMetadata);
```
- Track every file path referenced; copy each into the trace directory under `files/`.
```rs
pub fn track_file(writer: &mut TraceWriter, path: &Path) -> io::Result<()>;
```
- Record `VariableName`, `Type`, and `Value` entries when variables are inspected or logged.
```rs
pub struct VariableRecord { pub name: String, pub ty: TypeId, pub value: ValueRecord }
pub fn record_variable(writer: &mut TraceWriter, rec: VariableRecord);
```
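
For the file-capture step, a std-only sketch of mapping a source path to its copy under `files/` and copying each path only once. `FileTracker` and `destination_for` are hypothetical, and mirroring the absolute source path below `files/` is one possible layout, not necessarily the one `runtime_tracing` expects.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

pub struct FileTracker {
    trace_dir: PathBuf,
    seen: HashSet<PathBuf>,
}

impl FileTracker {
    pub fn new(trace_dir: impl Into<PathBuf>) -> Self {
        Self { trace_dir: trace_dir.into(), seen: HashSet::new() }
    }

    /// Destination under `<trace_dir>/files/`, or `None` if already captured.
    pub fn destination_for(&mut self, src: &Path) -> Option<PathBuf> {
        if !self.seen.insert(src.to_path_buf()) {
            return None;
        }
        let mut dest = self.trace_dir.join("files");
        // Mirror the source path without its root or prefix. Assumes `src`
        // is absolute and canonical, since `..` components are dropped too.
        for part in src.components() {
            if let Component::Normal(p) = part {
                dest.push(p);
            }
        }
        Some(dest)
    }

    /// Copies `src` into the trace directory the first time it is seen.
    pub fn track_file(&mut self, src: &Path) -> io::Result<()> {
        if let Some(dest) = self.destination_for(src) {
            if let Some(parent) = dest.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::copy(src, &dest)?;
        }
        Ok(())
    }
}
```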

## Value Translation and Recording
- Maintain a type registry that maps Python `type` objects to `runtime_tracing` `Type` entries and assigns new `type_id` values on first encounter.
```rs
pub type TypeId = u32;
pub type ValueId = u64;
pub enum ValueRecord { Int(i64), Float(f64), Bool(bool), None, Str(String), Raw(Vec<u8>), Sequence(Vec<ValueRecord>), Tuple(Vec<ValueRecord>), Struct(Vec<(String, ValueRecord)>), Reference(ValueId) }
pub struct TypeRegistry { next: TypeId, map: HashMap<*mut PyTypeObject, TypeId> }
pub fn intern_type(reg: &mut TypeRegistry, ty: *mut PyTypeObject) -> TypeId;
```
- Convert primitives (`int`, `float`, `bool`, `None`, `str`) directly to their corresponding `ValueRecord` variants. Check for `bool` before `int`, because `bool` is a subclass of `int` and would otherwise be recorded as `Int`; Python integers outside the `i64` range need a fallback such as `Raw`.
```rs
pub fn encode_primitive(obj: *mut PyObject) -> Option<ValueRecord>;
```
- Encode `bytes` and `bytearray` as `Raw` records containing base64 text to preserve binary data.
```rs
pub fn encode_bytes(obj: *mut PyObject) -> ValueRecord;
```
- Represent lists and sets as `Sequence` records and tuples as `Tuple` records, converting each element recursively.
```rs
pub fn encode_sequence(iter: &PySequence) -> ValueRecord;
pub fn encode_tuple(tuple: &PyTupleObject) -> ValueRecord;
```
- Serialize dictionaries as a `Sequence` of two-element `Tuple` records for key/value pairs to avoid fixed field layouts.
```rs
pub fn encode_dict(dict: &PyDictObject) -> ValueRecord;
```
- For objects with accessible attributes, emit a `Struct` record with sorted field names; fall back to `Raw` with `repr(obj)` when inspection is unsafe.
```rs
pub fn encode_object(obj: *mut PyObject) -> ValueRecord;
```
- Track object identities to detect cycles and reuse `Reference` records with `id(obj)` for repeated structures.
```rs
pub struct SeenSet { map: HashMap<usize, ValueId> }
pub fn record_reference(seen: &mut SeenSet, obj: *mut PyObject) -> Option<ValueRecord>;
```
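
The recursive encoding and cycle handling can be sketched without an interpreter by modeling the object graph explicitly. `PyObj`, the `heap` map, and `encode` are hypothetical: map keys stand in for `id(obj)`, and only containers are tracked in the `SeenSet` because only they can form cycles in this model. Bytes are base64-encoded into `Raw` (held as text here, whereas the outline's `Raw` holds `Vec<u8>`), and dictionaries become a `Sequence` of key/value `Tuple`s.

```rust
use std::collections::HashMap;

pub type ValueId = u64;

/// Hypothetical stand-in for live Python objects; containers refer to
/// their children by `id`.
pub enum PyObj {
    Int(i64),
    Str(String),
    Bytes(Vec<u8>),
    List(Vec<usize>),
    Tuple(Vec<usize>),
    Dict(Vec<(usize, usize)>),
}

/// Subset of the outline's `ValueRecord`; `Raw` holds base64 text here.
#[derive(Debug, PartialEq)]
pub enum ValueRecord {
    Int(i64),
    Str(String),
    Raw(String),
    Sequence(Vec<ValueRecord>),
    Tuple(Vec<ValueRecord>),
    Reference(ValueId),
}

/// Container ids already encoded during one top-level conversion.
#[derive(Default)]
pub struct SeenSet {
    map: HashMap<usize, ValueId>,
}

const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648 alphabet) with `=` padding.
pub fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        let n = (chunk[0] as u32) << 16
            | (chunk.get(1).copied().unwrap_or(0) as u32) << 8
            | chunk.get(2).copied().unwrap_or(0) as u32;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Encodes object `id`, emitting `Reference` for containers seen before,
/// which also terminates cycles. Use a fresh `SeenSet` per recorded value,
/// because Python may reuse an `id` once the original object is freed.
pub fn encode(heap: &HashMap<usize, PyObj>, id: usize, seen: &mut SeenSet) -> ValueRecord {
    let obj = &heap[&id]; // panics on an unknown id; acceptable for a sketch
    if matches!(obj, PyObj::List(_) | PyObj::Tuple(_) | PyObj::Dict(_)) {
        if let Some(&vid) = seen.map.get(&id) {
            return ValueRecord::Reference(vid);
        }
        let vid = seen.map.len() as ValueId;
        seen.map.insert(id, vid);
    }
    let mut enc = |child: usize| encode(heap, child, seen);
    match obj {
        PyObj::Int(i) => ValueRecord::Int(*i),
        PyObj::Str(s) => ValueRecord::Str(s.clone()),
        PyObj::Bytes(b) => ValueRecord::Raw(base64_encode(b)),
        PyObj::List(items) => ValueRecord::Sequence(items.iter().map(|&c| enc(c)).collect()),
        PyObj::Tuple(items) => ValueRecord::Tuple(items.iter().map(|&c| enc(c)).collect()),
        // Dicts become a Sequence of two-element (key, value) Tuples.
        PyObj::Dict(pairs) => ValueRecord::Sequence(
            pairs.iter().map(|&(k, v)| ValueRecord::Tuple(vec![enc(k), enc(v)])).collect(),
        ),
    }
}
```

For `a = [7]; a.append(a)`, the inner occurrence of `a` is encoded as `Reference(0)` instead of recursing forever.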

## Shutdown
- On `stop_tracing`, call `sys.monitoring.set_events` with `NO_EVENTS` for the tool ID.
```rs
pub fn disable_events(tool: &ToolId);
```
- Unregister callbacks by calling `sys.monitoring.register_callback` with `None` for each hooked event, then free the tool ID with `sys.monitoring.free_tool_id`.
```rs
// Only borrows the ID, so it is still available for `free_tool_id`.
pub fn unregister_callbacks(tool: &ToolId);
pub fn free_tool_id(tool: ToolId);
```
- Close the writer and ensure all buffered events are flushed to disk.
```rs
pub fn finalize(writer: TraceWriter) -> io::Result<()>;
```
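
The shutdown order can be expressed against a small trait so it is testable without Python. `Monitoring`, `register_callback_none`, and `shutdown` are hypothetical; each method corresponds to a real `sys.monitoring` call (`set_events`, `register_callback(tool_id, event, None)`, and `free_tool_id`). The unregister loop only reads the ID, and `shutdown` takes the `ToolId` by value, so callers cannot use it again once it has been freed. Finalizing the writer would follow as the last step.

```rust
/// Hypothetical seam over the `sys.monitoring` calls used during shutdown.
pub trait Monitoring {
    /// `sys.monitoring.set_events(tool, mask)`
    fn set_events(&mut self, tool: u8, mask: u64);
    /// `sys.monitoring.register_callback(tool, event, None)`
    fn register_callback_none(&mut self, tool: u8, event: u64);
    /// `sys.monitoring.free_tool_id(tool)`
    fn free_tool_id(&mut self, tool: u8);
}

pub const NO_EVENTS: u64 = 0;

pub struct ToolId {
    pub id: u8,
}

/// `hooked` lists the event bits that had callbacks registered.
pub fn shutdown<M: Monitoring>(m: &mut M, tool: ToolId, hooked: &[u64]) {
    // 1. Stop delivery first so no callback runs while state is torn down.
    m.set_events(tool.id, NO_EVENTS);
    // 2. Unregister each callback; this only reads the ID.
    for &event in hooked {
        m.register_callback_none(tool.id, event);
    }
    // 3. Release the ID last; `tool` was moved in, so callers cannot reuse it.
    m.free_tool_id(tool.id);
}
```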

## Current Limitations
- **No structured support for threads or async tasks** – the trace format lacks explicit identifiers for concurrent execution.
