Commit 8f5494a

docs: describe global registry for code objects
1 parent 1178d0d commit 8f5494a

2 files changed, +42 -14 lines changed

.agents/tasks/2025/08/21-0939-codetype-interface

Lines changed: 3 additions & 1 deletion
@@ -35,4 +35,6 @@ According to the PyO3 documentation it is preferred to use `Bound<'_, T>` instea
 
 Also please add usage examples to the design documentation
 --- FOLLOW UP TASK ---
-Implement the CodeObjectWrapper as designed. Update the Tracer trait as well as the callback_xxx functions accordingly. Write a comprehensive unit tests for CodeObjectWrapper.
+Implement the CodeObjectWrapper as designed. Update the Tracer trait as well as the callback_xxx functions accordingly. Write a comprehensive unit tests for CodeObjectWrapper.
+--- FOLLOW UP TASK ---
+There is an issue in the current implementation. We don't use caching effectively, since we create a new CodeObjectWrapper at each callback_xxx call. We need a global cache, probably keyed by the code object id. Propose design changes and update the design documents. Don't implement the changes themselves before I approve them.

design-docs/code-object.md

Lines changed: 39 additions & 13 deletions
@@ -66,6 +66,34 @@ impl CodeObjectWrapper {
 }
 ```
 
+### Global registry
+
+To avoid constructing a new wrapper for every tracing event, a global cache
+stores `CodeObjectWrapper` instances keyed by their stable `id`:
+
+```rs
+pub struct CodeObjectRegistry {
+    map: DashMap<usize, Arc<CodeObjectWrapper>>,
+}
+
+impl CodeObjectRegistry {
+    pub fn get_or_insert(
+        &self,
+        py: Python<'_>,
+        code: &Bound<'_, PyCode>,
+    ) -> Arc<CodeObjectWrapper>;
+
+    /// Optional explicit removal for long‑running processes.
+    pub fn remove(&self, id: usize);
+}
+```
+
+`CodeObjectWrapper::new` remains available, but production code is expected to
+obtain instances via `CodeObjectRegistry::get_or_insert` so each unique code
+object is wrapped only once. The registry is designed to be thread‑safe
+(`DashMap`) and the wrappers are reference counted (`Arc`) so multiple threads
+can hold references without additional locking.
+
 ### Trait Integration
 
 The `Tracer` trait will be adjusted so every callback receives `&CodeObjectWrapper` instead of a generic `&Bound<'_, PyAny>`:
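
The `get_or_insert` contract from the "Global registry" section added in this hunk can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the proposed implementation: `Mutex<HashMap<..>>` stands in for `DashMap`, the wrapper is reduced to a bare `id` field, the `py` and `code` parameters are replaced by a plain `usize` key, and the `len` helper is hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the real wrapper (assumption: the actual type also holds
/// the owned code object and its lazily cached attributes).
pub struct CodeObjectWrapper {
    id: usize,
}

impl CodeObjectWrapper {
    pub fn new(id: usize) -> Self {
        Self { id }
    }

    pub fn id(&self) -> usize {
        self.id
    }
}

/// Registry keyed by code object id. `Mutex<HashMap>` replaces `DashMap`
/// here only to keep the sketch free of external crates.
#[derive(Default)]
pub struct CodeObjectRegistry {
    map: Mutex<HashMap<usize, Arc<CodeObjectWrapper>>>,
}

impl CodeObjectRegistry {
    /// Returns the cached wrapper for `id`, constructing it on first use only.
    pub fn get_or_insert(&self, id: usize) -> Arc<CodeObjectWrapper> {
        let mut map = self.map.lock().unwrap();
        let entry = map
            .entry(id)
            .or_insert_with(|| Arc::new(CodeObjectWrapper::new(id)));
        Arc::clone(entry)
    }

    /// Explicit removal; outstanding `Arc`s held by callers stay valid.
    pub fn remove(&self, id: usize) {
        self.map.lock().unwrap().remove(&id);
    }

    /// Hypothetical helper, used only to observe the cache size.
    pub fn len(&self) -> usize {
        self.map.lock().unwrap().len()
    }
}
```

Two lookups with the same key return the same allocation, which `Arc::ptr_eq` can confirm. After `remove`, the next lookup builds a fresh wrapper while previously returned `Arc`s remain usable.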
@@ -78,38 +106,36 @@ fn on_py_start(&mut self, py: Python<'_>, code: &CodeObjectWrapper, offset: i32)
 
 ## Usage Examples
 
-### Constructing the wrapper inside a tracer
+### Retrieving wrappers from the global registry
 
 ```rs
+static CODE_REGISTRY: Lazy<CodeObjectRegistry> = Lazy::new(CodeObjectRegistry::default);
+
 fn on_line(&mut self, py: Python<'_>, code: &Bound<'_, PyCode>, lineno: u32) {
-    let wrapper = CodeObjectWrapper::new(py, code);
+    let wrapper = CODE_REGISTRY.get_or_insert(py, code);
     let filename = wrapper.filename(py).unwrap_or("<unknown>");
     eprintln!("{}:{}", filename, lineno);
 }
 ```
 
-### Reusing a cached wrapper
-
-```rs
-let wrapper = CodeObjectWrapper::new(py, code);
-cache.insert(wrapper.id(), wrapper.clone());
-
-if let Some(saved) = cache.get(&wrapper.id()) {
-    let qualname = saved.qualname(py)?;
-    println!("qualified name: {}", qualname);
-}
-```
+Once cached, subsequent callbacks referencing the same `CodeType` will reuse the
+existing wrapper without recomputing any attributes.
 
 ## Performance Considerations
 - `Py<PyCode>` allows cloning the wrapper without holding the GIL, enabling cheap event propagation.
 - Methods bind the owned reference to `Bound<'py, PyCode>` on demand, following PyO3's `Bound`‑first guidance and avoiding accidental `Py` clones.
 - Fields are loaded lazily and stored inside `OnceCell` containers to avoid repeated attribute lookups.
 - `line_for_offset` memoizes the full line table the first time it is requested; subsequent calls perform an in‑memory binary search.
 - Storing strings and small integers directly in the cache eliminates conversion cost on hot paths.
+- A global `CodeObjectRegistry` ensures that wrapper construction and attribute
+  discovery happen at most once per `CodeType`.
 
 ## Open Questions
 - Additional attributes such as `co_consts` or `co_varnames` may be required for richer debugging features; these can be added later as new `OnceCell` fields.
 - Thread‑safety requirements may necessitate wrapping the cache in `UnsafeCell` or providing internal mutability strategies compatible with `Send`/`Sync`.
+- The registry currently grows unbounded; strategies for eviction or weak
+  references may be needed for long‑running processes that compile many
+  transient code objects.
 
 ## References
 - [Python `CodeType` objects](https://docs.python.org/3/reference/datamodel.html#code-objects)
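
The open question about unbounded growth mentions weak references as one option. A rough sketch of that idea, again using only the standard library, might look like the following. Everything here is an assumption for illustration: the name `WeakCodeObjectRegistry`, the `prune` and `len` helpers, the `usize` key in place of the `py` and `code` parameters, and `Mutex<HashMap<..>>` in place of `DashMap`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

/// Stand-in for the real wrapper, reduced to its id for this sketch.
pub struct CodeObjectWrapper {
    id: usize,
}

impl CodeObjectWrapper {
    pub fn id(&self) -> usize {
        self.id
    }
}

/// Hypothetical registry variant that stores only weak references, so an
/// entry does not keep its wrapper alive on its own.
#[derive(Default)]
pub struct WeakCodeObjectRegistry {
    map: Mutex<HashMap<usize, Weak<CodeObjectWrapper>>>,
}

impl WeakCodeObjectRegistry {
    /// Reuses the wrapper while some caller still holds it; otherwise
    /// builds a new one and records a weak reference to it.
    pub fn get_or_insert(&self, id: usize) -> Arc<CodeObjectWrapper> {
        let mut map = self.map.lock().unwrap();
        if let Some(existing) = map.get(&id).and_then(Weak::upgrade) {
            return existing;
        }
        let fresh = Arc::new(CodeObjectWrapper { id });
        map.insert(id, Arc::downgrade(&fresh));
        fresh
    }

    /// Removes entries whose wrappers have been dropped and returns how
    /// many entries were discarded.
    pub fn prune(&self) -> usize {
        let mut map = self.map.lock().unwrap();
        let before = map.len();
        map.retain(|_, weak| weak.strong_count() > 0);
        before - map.len()
    }

    /// Number of entries currently tracked, including dead ones.
    pub fn len(&self) -> usize {
        self.map.lock().unwrap().len()
    }
}
```

The trade-off in this variant is that caching only helps while some tracer keeps an `Arc` alive. Once the last `Arc` is dropped, the next lookup rebuilds the wrapper, and `prune` has to be called periodically to reclaim the dead map entries.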
