Commit 41e0c06

docs: describe global registry for code objects
1 parent 48849aa commit 41e0c06

4 files changed, +305 -14 lines changed

.agents/tasks/2025/08/21-0939-codetype-interface

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -35,4 +35,6 @@ According to the PyO3 documentation it is preferred to use `Bound<'_, T>` instea
3535

3636
Also please add usage examples to the design documentation
3737
--- FOLLOW UP TASK ---
38-
Implement the CodeObjectWrapper as designed. Update the Tracer trait as well as the callback_xxx functions accordingly. Write a comprehensive unit tests for CodeObjectWrapper.
38+
Implement the CodeObjectWrapper as designed. Update the Tracer trait as well as the callback_xxx functions accordingly. Write a comprehensive unit tests for CodeObjectWrapper.
39+
--- FOLLOW UP TASK ---
40+
There is an issue in the current implementation. We don't use caching effectively, since we create a new CodeObjectWrapper at each callback_xxx call. We need a global cache, probably keyed by the code object id. Propose design changes and update the design documents. Don't implement the changes themselves before I approve them.

design-docs/#code-object.md#

Lines changed: 262 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,262 @@
1+
<<<<<<< Conflict 1 of 1
2+
%%%%%%% Changes from base to side #1
3+
+# Code Object Wrapper Design
4+
+
5+
+## Overview
6+
+
7+
+The Python Monitoring API delivers a generic `CodeType` object to every tracing callback. The current `Tracer` trait surfaces this object as `&Bound<'_, PyAny>`, forcing every implementation to perform attribute lookups and type conversions manually. This document proposes a `CodeObjectWrapper` type that exposes a stable, typed interface to the underlying code object while minimizing per-event overhead.
8+
+
9+
+## Goals
10+
+- Provide a strongly typed API for common `CodeType` attributes needed by tracers and recorders.
11+
+- Ensure lookups are cheap by caching values and avoiding repeated Python attribute access.
12+
+- Maintain a stable identity for each code object to correlate events across callbacks.
13+
+- Avoid relying on the unstable `PyCodeObject` layout from the C API.
14+
+
15+
+## Non-Goals
16+
+- Full re‑implementation of every `CodeType` attribute. Only the fields required for tracing and time‑travel debugging are exposed.
17+
+- Direct mutation of `CodeType` objects. The wrapper offers read‑only access.
18+
+
19+
+## Proposed API
20+
+
21+
+```rs
22+
+pub struct CodeObjectWrapper {
23+
+ /// Owned reference to the Python `CodeType` object.
24+
+ /// Stored as `Py<PyCode>` so it can be held outside the GIL.
25+
+ obj: Py<PyCode>,
26+
+ /// Stable identity equivalent to `id(code)`.
27+
+ id: usize,
28+
+ /// Lazily populated cache for expensive lookups.
29+
+ cache: CodeObjectCache,
30+
+}
31+
+
32+
+pub struct CodeObjectCache {
33+
+ filename: OnceCell<String>,
34+
+ qualname: OnceCell<String>,
35+
+ firstlineno: OnceCell<u32>,
36+
+ argcount: OnceCell<u16>,
37+
+ flags: OnceCell<u32>,
38+
+ /// Mapping of instruction offsets to line numbers.
39+
+ lines: OnceCell<Vec<LineEntry>>,
40+
+}
41+
+
42+
+pub struct LineEntry {
43+
+ pub offset: u32,
44+
+ pub line: u32,
45+
+}
46+
+
47+
+impl CodeObjectWrapper {
48+
+ /// Construct from a `CodeType` object. Computes `id` eagerly.
49+
+ pub fn new(py: Python<'_>, obj: &Bound<'_, PyCode>) -> Self;
50+
+
51+
+ /// Borrow the owned `Py<PyCode>` as a `Bound<'py, PyCode>`.
52+
+ /// This follows PyO3's recommendation to prefer `Bound<'_, T>` over `Py<T>`
53+
+ /// for object manipulation.
54+
+ pub fn as_bound<'py>(&'py self, py: Python<'py>) -> Bound<'py, PyCode>;
55+
+
56+
+ /// Accessors fetch from the cache or perform a one‑time lookup under the GIL.
57+
+ pub fn filename<'py>(&'py self, py: Python<'py>) -> PyResult<&'py str>;
58+
+ pub fn qualname<'py>(&'py self, py: Python<'py>) -> PyResult<&'py str>;
59+
+ pub fn first_line(&self, py: Python<'_>) -> PyResult<u32>;
60+
+ pub fn arg_count(&self, py: Python<'_>) -> PyResult<u16>;
61+
+ pub fn flags(&self, py: Python<'_>) -> PyResult<u32>;
62+
+
63+
+ /// Return the source line for a given instruction offset using a binary search on `lines`.
64+
+ pub fn line_for_offset(&self, py: Python<'_>, offset: u32) -> PyResult<Option<u32>>;
65+
+
66+
+ /// Expose the stable identity for cross‑event correlation.
67+
+ pub fn id(&self) -> usize;
68+
+}
69+
+```
70+
+
71+
+### Trait Integration
72+
+
73+
+The `Tracer` trait will be adjusted so every callback receives `&CodeObjectWrapper` instead of a generic `&Bound<'_, PyAny>`:
74+
+
75+
+```rs
76+
+fn on_line(&mut self, py: Python<'_>, code: &CodeObjectWrapper, lineno: u32);
77+
+fn on_py_start(&mut self, py: Python<'_>, code: &CodeObjectWrapper, offset: i32);
78+
+// ...and similarly for the remaining callbacks.
79+
+```
80+
+
81+
+## Usage Examples
82+
+
83+
+### Constructing the wrapper inside a tracer
84+
+
85+
+```rs
86+
+fn on_line(&mut self, py: Python<'_>, code: &Bound<'_, PyCode>, lineno: u32) {
87+
+ let wrapper = CodeObjectWrapper::new(py, code);
88+
+ let filename = wrapper.filename(py).unwrap_or("<unknown>");
89+
+ eprintln!("{}:{}", filename, lineno);
90+
+}
91+
+```
92+
+
93+
+### Reusing a cached wrapper
94+
+
95+
+```rs
96+
+let wrapper = CodeObjectWrapper::new(py, code);
97+
+cache.insert(wrapper.id(), wrapper.clone());
98+
+
99+
+if let Some(saved) = cache.get(&wrapper.id()) {
100+
+ let qualname = saved.qualname(py)?;
101+
+ println!("qualified name: {}", qualname);
102+
+}
103+
+```
104+
+
105+
+## Performance Considerations
106+
+- `Py<PyCode>` allows cloning the wrapper without holding the GIL, enabling cheap event propagation.
107+
+- Methods bind the owned reference to `Bound<'py, PyCode>` on demand, following PyO3's `Bound`‑first guidance and avoiding accidental `Py` clones.
108+
+- Fields are loaded lazily and stored inside `OnceCell` containers to avoid repeated attribute lookups.
109+
+- `line_for_offset` memoizes the full line table the first time it is requested; subsequent calls perform an in‑memory binary search.
110+
+- Storing strings and small integers directly in the cache eliminates conversion cost on hot paths.
111+
+
112+
+## Open Questions
113+
+- Additional attributes such as `co_consts` or `co_varnames` may be required for richer debugging features; these can be added later as new `OnceCell` fields.
114+
+- Thread‑safety requirements may necessitate wrapping the cache in `UnsafeCell` or providing internal mutability strategies compatible with `Send`/`Sync`.
115+
+
116+
+## References
117+
+- [Python `CodeType` objects](https://docs.python.org/3/reference/datamodel.html#code-objects)
118+
+- [Python monitoring API](https://docs.python.org/3/library/sys.monitoring.html)
119+
+++++++ Contents of side #2
120+
# Code Object Wrapper Design
121+
122+
## Overview
123+
124+
The Python Monitoring API delivers a generic `CodeType` object to every tracing callback. The current `Tracer` trait surfaces this object as `&Bound<'_, PyAny>`, forcing every implementation to perform attribute lookups and type conversions manually. This document proposes a `CodeObjectWrapper` type that exposes a stable, typed interface to the underlying code object while minimizing per-event overhead.
125+
126+
## Goals
127+
- Provide a strongly typed API for common `CodeType` attributes needed by tracers and recorders.
128+
- Ensure lookups are cheap by caching values and avoiding repeated Python attribute access.
129+
- Maintain a stable identity for each code object to correlate events across callbacks.
130+
- Avoid relying on the unstable `PyCodeObject` layout from the C API.
131+
132+
## Non-Goals
133+
- Full re‑implementation of every `CodeType` attribute. Only the fields required for tracing and time‑travel debugging are exposed.
134+
- Direct mutation of `CodeType` objects. The wrapper offers read‑only access.
135+
136+
## Proposed API
137+
138+
```rs
139+
pub struct CodeObjectWrapper {
140+
/// Owned reference to the Python `CodeType` object.
141+
/// Stored as `Py<PyCode>` so it can be held outside the GIL.
142+
obj: Py<PyCode>,
143+
/// Stable identity equivalent to `id(code)`.
144+
id: usize,
145+
/// Lazily populated cache for expensive lookups.
146+
cache: CodeObjectCache,
147+
}
148+
149+
pub struct CodeObjectCache {
150+
filename: OnceCell<String>,
151+
qualname: OnceCell<String>,
152+
firstlineno: OnceCell<u32>,
153+
argcount: OnceCell<u16>,
154+
flags: OnceCell<u32>,
155+
/// Mapping of instruction offsets to line numbers.
156+
lines: OnceCell<Vec<LineEntry>>,
157+
}
158+
159+
pub struct LineEntry {
160+
pub offset: u32,
161+
pub line: u32,
162+
}
163+
164+
impl CodeObjectWrapper {
165+
/// Construct from a `CodeType` object. Computes `id` eagerly.
166+
pub fn new(py: Python<'_>, obj: &Bound<'_, PyCode>) -> Self;
167+
168+
/// Borrow the owned `Py<PyCode>` as a `Bound<'py, PyCode>`.
169+
/// This follows PyO3's recommendation to prefer `Bound<'_, T>` over `Py<T>`
170+
/// for object manipulation.
171+
pub fn as_bound<'py>(&'py self, py: Python<'py>) -> Bound<'py, PyCode>;
172+
173+
/// Accessors fetch from the cache or perform a one‑time lookup under the GIL.
174+
pub fn filename<'py>(&'py self, py: Python<'py>) -> PyResult<&'py str>;
175+
pub fn qualname<'py>(&'py self, py: Python<'py>) -> PyResult<&'py str>;
176+
pub fn first_line(&self, py: Python<'_>) -> PyResult<u32>;
177+
pub fn arg_count(&self, py: Python<'_>) -> PyResult<u16>;
178+
pub fn flags(&self, py: Python<'_>) -> PyResult<u32>;
179+
180+
/// Return the source line for a given instruction offset using a binary search on `lines`.
181+
pub fn line_for_offset(&self, py: Python<'_>, offset: u32) -> PyResult<Option<u32>>;
182+
183+
/// Expose the stable identity for cross‑event correlation.
184+
pub fn id(&self) -> usize;
185+
}
186+
```
187+
188+
### Global registry
189+
190+
To avoid constructing a new wrapper for every tracing event, a global cache
191+
stores `CodeObjectWrapper` instances keyed by their stable `id`:
192+
193+
```rs
194+
pub struct CodeObjectRegistry {
195+
map: DashMap<usize, Arc<CodeObjectWrapper>>,
196+
}
197+
198+
impl CodeObjectRegistry {
199+
pub fn get_or_insert(
200+
&self,
201+
py: Python<'_>,
202+
code: &Bound<'_, PyCode>,
203+
) -> Arc<CodeObjectWrapper>;
204+
205+
/// Optional explicit removal for long‑running processes.
206+
pub fn remove(&self, id: usize);
207+
}
208+
```
209+
210+
`CodeObjectWrapper::new` remains available, but production code is expected to
211+
obtain instances via `CodeObjectRegistry::get_or_insert` so each unique code
212+
object is wrapped only once. The registry is designed to be thread‑safe
213+
(`DashMap`) and the wrappers are reference counted (`Arc`) so multiple threads
214+
can hold references without additional locking.
215+
216+
### Trait Integration
217+
218+
The `Tracer` trait will be adjusted so every callback receives `&CodeObjectWrapper` instead of a generic `&Bound<'_, PyAny>`:
219+
220+
```rs
221+
fn on_line(&mut self, py: Python<'_>, code: &CodeObjectWrapper, lineno: u32);
222+
fn on_py_start(&mut self, py: Python<'_>, code: &CodeObjectWrapper, offset: i32);
223+
// ...and similarly for the remaining callbacks.
224+
```
225+
226+
## Usage Examples
227+
228+
### Retrieving wrappers from the global registry
229+
230+
```rs
231+
static CODE_REGISTRY: Lazy<CodeObjectRegistry> = Lazy::new(CodeObjectRegistry::default);
232+
233+
fn on_line(&mut self, py: Python<'_>, code: &Bound<'_, PyCode>, lineno: u32) {
234+
let wrapper = CODE_REGISTRY.get_or_insert(py, code);
235+
let filename = wrapper.filename(py).unwrap_or("<unknown>");
236+
eprintln!("{}:{}", filename, lineno);
237+
}
238+
```
239+
240+
Once cached, subsequent callbacks referencing the same `CodeType` will reuse the
241+
existing wrapper without recomputing any attributes.
242+
243+
## Performance Considerations
244+
- `Py<PyCode>` allows cloning the wrapper without holding the GIL, enabling cheap event propagation.
245+
- Methods bind the owned reference to `Bound<'py, PyCode>` on demand, following PyO3's `Bound`‑first guidance and avoiding accidental `Py` clones.
246+
- Fields are loaded lazily and stored inside `OnceCell` containers to avoid repeated attribute lookups.
247+
- `line_for_offset` memoizes the full line table the first time it is requested; subsequent calls perform an in‑memory binary search.
248+
- Storing strings and small integers directly in the cache eliminates conversion cost on hot paths.
249+
- A global `CodeObjectRegistry` ensures that wrapper construction and attribute
250+
discovery happen at most once per `CodeType`.
251+
252+
## Open Questions
253+
- Additional attributes such as `co_consts` or `co_varnames` may be required for richer debugging features; these can be added later as new `OnceCell` fields.
254+
- Thread‑safety requirements may necessitate wrapping the cache in `UnsafeCell` or providing internal mutability strategies compatible with `Send`/`Sync`.
255+
- The registry currently grows unbounded; strategies for eviction or weak
256+
references may be needed for long‑running processes that compile many
257+
transient code objects.
258+
259+
## References
260+
- [Python `CodeType` objects](https://docs.python.org/3/reference/datamodel.html#code-objects)
261+
- [Python monitoring API](https://docs.python.org/3/library/sys.monitoring.html)
262+
>>>>>>> Conflict 1 of 1 ends

design-docs/.#code-object.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1 @@
1+

design-docs/code-object.md

Lines changed: 39 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -66,6 +66,34 @@ impl CodeObjectWrapper {
6666
}
6767
```
6868

69+
### Global registry
70+
71+
To avoid constructing a new wrapper for every tracing event, a global cache
72+
stores `CodeObjectWrapper` instances keyed by their stable `id`:
73+
74+
```rs
75+
pub struct CodeObjectRegistry {
76+
map: DashMap<usize, Arc<CodeObjectWrapper>>,
77+
}
78+
79+
impl CodeObjectRegistry {
80+
pub fn get_or_insert(
81+
&self,
82+
py: Python<'_>,
83+
code: &Bound<'_, PyCode>,
84+
) -> Arc<CodeObjectWrapper>;
85+
86+
/// Optional explicit removal for long‑running processes.
87+
pub fn remove(&self, id: usize);
88+
}
89+
```
90+
91+
`CodeObjectWrapper::new` remains available, but production code is expected to
92+
obtain instances via `CodeObjectRegistry::get_or_insert` so each unique code
93+
object is wrapped only once. The registry is designed to be thread‑safe
94+
(`DashMap`) and the wrappers are reference counted (`Arc`) so multiple threads
95+
can hold references without additional locking.
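
As an illustration of the registry behaviour described in this hunk, here is a minimal sketch that uses only the Rust standard library. It is not the proposed implementation: `Mutex<HashMap<..>>` stands in for `DashMap`, a plain `usize` key plus a constructor closure stands in for `&Bound<'_, PyCode>`, and the method name `get_or_insert_with`, the `len` helper, and the `qualname` field are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the PyO3-backed `CodeObjectWrapper`.
#[derive(Debug)]
pub struct CodeObjectWrapper {
    id: usize,
    qualname: String,
}

impl CodeObjectWrapper {
    pub fn new(id: usize, qualname: &str) -> Self {
        Self { id, qualname: qualname.to_string() }
    }

    pub fn id(&self) -> usize {
        self.id
    }

    pub fn qualname(&self) -> &str {
        &self.qualname
    }
}

/// Registry keyed by code object id.
/// Assumption: `Mutex<HashMap>` replaces the `DashMap` named in the design.
#[derive(Default)]
pub struct CodeObjectRegistry {
    map: Mutex<HashMap<usize, Arc<CodeObjectWrapper>>>,
}

impl CodeObjectRegistry {
    /// Return the wrapper cached under `id`, calling `make` only when
    /// no entry exists yet.
    pub fn get_or_insert_with(
        &self,
        id: usize,
        make: impl FnOnce() -> CodeObjectWrapper,
    ) -> Arc<CodeObjectWrapper> {
        let mut map = self.map.lock().unwrap();
        map.entry(id).or_insert_with(|| Arc::new(make())).clone()
    }

    /// Explicit removal, mirroring `remove` in the design.
    pub fn remove(&self, id: usize) -> Option<Arc<CodeObjectWrapper>> {
        self.map.lock().unwrap().remove(&id)
    }

    /// Number of cached wrappers (helper added for the example).
    pub fn len(&self) -> usize {
        self.map.lock().unwrap().len()
    }
}
```

Two lookups with the same id hand back clones of the same `Arc`, so the constructor closure runs once. Because each wrapper holds an owned reference to its code object, an entry presumably keeps that object alive until `remove` is called, which is what makes the id stable and also why the map can grow without bound.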
96+
6997
### Trait Integration
7098

7199
The `Tracer` trait will be adjusted so every callback receives `&CodeObjectWrapper` instead of a generic `&Bound<'_, PyAny>`:
@@ -78,38 +106,36 @@ fn on_py_start(&mut self, py: Python<'_>, code: &CodeObjectWrapper, offset: i32)
78106

79107
## Usage Examples
80108

81-
### Constructing the wrapper inside a tracer
109+
### Retrieving wrappers from the global registry
82110

83111
```rs
112+
static CODE_REGISTRY: Lazy<CodeObjectRegistry> = Lazy::new(CodeObjectRegistry::default);
113+
84114
fn on_line(&mut self, py: Python<'_>, code: &Bound<'_, PyCode>, lineno: u32) {
85-
let wrapper = CodeObjectWrapper::new(py, code);
115+
let wrapper = CODE_REGISTRY.get_or_insert(py, code);
86116
let filename = wrapper.filename(py).unwrap_or("<unknown>");
87117
eprintln!("{}:{}", filename, lineno);
88118
}
89119
```
90120

91-
### Reusing a cached wrapper
92-
93-
```rs
94-
let wrapper = CodeObjectWrapper::new(py, code);
95-
cache.insert(wrapper.id(), wrapper.clone());
96-
97-
if let Some(saved) = cache.get(&wrapper.id()) {
98-
let qualname = saved.qualname(py)?;
99-
println!("qualified name: {}", qualname);
100-
}
101-
```
121+
Once cached, subsequent callbacks referencing the same `CodeType` will reuse the
122+
existing wrapper without recomputing any attributes.
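
The usage example above performs the registry lookup inside `on_line`, while the Trait Integration section has callbacks receive `&CodeObjectWrapper`. One way to reconcile the two is to do the lookup in the dispatching callback and pass the wrapper on. The sketch below shows that flow with std-only stand-ins; `callback_line`, `Registry`, `Recorder`, and the `constructed` counter are hypothetical, and a `(code_id, filename)` pair replaces the real Python code object.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the PyO3-backed wrapper.
pub struct CodeObjectWrapper {
    id: usize,
    filename: String,
}

impl CodeObjectWrapper {
    pub fn id(&self) -> usize {
        self.id
    }

    pub fn filename(&self) -> &str {
        &self.filename
    }
}

/// Reduced `Tracer` trait: the callback sees the wrapper, not the raw object.
pub trait Tracer {
    fn on_line(&mut self, code: &CodeObjectWrapper, lineno: u32);
}

/// Single-threaded registry stand-in that counts constructions.
#[derive(Default)]
pub struct Registry {
    map: HashMap<usize, Arc<CodeObjectWrapper>>,
    pub constructed: usize,
}

impl Registry {
    fn get_or_insert(&mut self, id: usize, filename: &str) -> Arc<CodeObjectWrapper> {
        if let Some(existing) = self.map.get(&id) {
            return Arc::clone(existing);
        }
        self.constructed += 1;
        let wrapper = Arc::new(CodeObjectWrapper { id, filename: filename.to_string() });
        self.map.insert(id, Arc::clone(&wrapper));
        wrapper
    }
}

/// Hypothetical dispatcher: resolve the wrapper once, then call the tracer.
pub fn callback_line(
    registry: &mut Registry,
    tracer: &mut dyn Tracer,
    code_id: usize,
    filename: &str,
    lineno: u32,
) {
    let wrapper = registry.get_or_insert(code_id, filename);
    tracer.on_line(&wrapper, lineno);
}

/// Example tracer that records `file:line` strings.
pub struct Recorder {
    pub lines: Vec<String>,
}

impl Tracer for Recorder {
    fn on_line(&mut self, code: &CodeObjectWrapper, lineno: u32) {
        self.lines.push(format!("{}:{}", code.filename(), lineno));
    }
}
```

With this split, tracer implementations never touch the registry, and repeated events for the same code object reuse one wrapper.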
102123

103124
## Performance Considerations
104125
- `Py<PyCode>` allows cloning the wrapper without holding the GIL, enabling cheap event propagation.
105126
- Methods bind the owned reference to `Bound<'py, PyCode>` on demand, following PyO3's `Bound`‑first guidance and avoiding accidental `Py` clones.
106127
- Fields are loaded lazily and stored inside `OnceCell` containers to avoid repeated attribute lookups.
107128
- `line_for_offset` memoizes the full line table the first time it is requested; subsequent calls perform an in‑memory binary search.
108129
- Storing strings and small integers directly in the cache eliminates conversion cost on hot paths.
130+
- A global `CodeObjectRegistry` ensures that wrapper construction and attribute
131+
discovery happen at most once per `CodeType`.
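
The in-memory binary search mentioned for `line_for_offset` could look roughly like the sketch below. It assumes the memoized table is sorted by `offset` and that each `LineEntry` marks the first instruction offset of a line range; the design does not spell out either point, so both are assumptions. The free function takes the table as a slice, whereas the proposed method takes `py` and reads the table from the cache.

```rust
/// Same shape as the `LineEntry` in the design.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LineEntry {
    pub offset: u32,
    pub line: u32,
}

/// Look up the line for `offset` in a table sorted by `offset`.
/// Assumption: an entry covers every offset from its own `offset` up to
/// the next entry's `offset` (or to the end of the code for the last one).
pub fn line_for_offset(entries: &[LineEntry], offset: u32) -> Option<u32> {
    // Index of the first entry that starts after `offset`.
    let idx = entries.partition_point(|e| e.offset <= offset);
    // The entry just before it, if any, is the range containing `offset`.
    idx.checked_sub(1).map(|i| entries[i].line)
}
```

`partition_point` performs the binary search, so a lookup costs O(log n) comparisons once the table has been built. An offset that precedes the first entry yields `None`.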
109132

110133
## Open Questions
111134
- Additional attributes such as `co_consts` or `co_varnames` may be required for richer debugging features; these can be added later as new `OnceCell` fields.
112135
- Thread‑safety requirements may necessitate wrapping the cache in `UnsafeCell` or providing internal mutability strategies compatible with `Send`/`Sync`.
136+
- The registry currently grows unbounded; strategies for eviction or weak
137+
references may be needed for long‑running processes that compile many
138+
transient code objects.
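
On the thread-safety question: a `static` registry holding `Arc<CodeObjectWrapper>` needs the wrapper, and therefore its cache, to be `Send + Sync`. If `OnceCell` in the design refers to a single-threaded cell such as `std::cell::OnceCell`, that bound would not hold. The sketch below shows one possible alternative using `std::sync::OnceLock`, reduced to a single `filename` field. The closure-based `filename` accessor is hypothetical and infallible for brevity, whereas the proposed accessors return `PyResult`.

```rust
use std::sync::OnceLock;

/// Cache variant whose cells can be shared across threads.
/// Assumption: `OnceLock` replaces the `OnceCell` fields from the design.
pub struct CodeObjectCache {
    filename: OnceLock<String>,
}

impl CodeObjectCache {
    pub fn new() -> Self {
        Self { filename: OnceLock::new() }
    }

    /// Return the cached filename, running `lookup` only if the cell is
    /// still empty. Concurrent callers block until the value is set.
    pub fn filename(&self, lookup: impl FnOnce() -> String) -> &str {
        self.filename.get_or_init(lookup)
    }
}

/// Compile-time check that the cache may live behind a shared `Arc`.
fn assert_send_sync<T: Send + Sync>() {}

pub fn cache_is_shareable() {
    assert_send_sync::<CodeObjectCache>();
}
```

`OnceLock::get_or_init` runs the initializer at most once even under contention, so the "looked up once" property carries over to the multi-threaded case without `UnsafeCell` in user code.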
113139

114140
## References
115141
- [Python `CodeType` objects](https://docs.python.org/3/reference/datamodel.html#code-objects)
