Commit 8400d79

Refactor module name resolution (#63)
This is the last PR involved in module-name resolution. We end up relying on `__name__` to compute the module name.
2 parents 45ec79c + 1dc81d2 commit 8400d79

26 files changed, +667 -497 lines changed

codetracer-python-recorder/CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
 - Balanced call-stack handling for generators, coroutines, and unwinding frames by subscribing to `PY_YIELD`, `PY_UNWIND`, `PY_RESUME`, and `PY_THROW`, mapping resume/throw events to `TraceWriter::register_call`, yield/unwind to `register_return`, and capturing `PY_THROW` arguments as `exception` using the existing value encoder. Added Python + Rust integration tests that drive `.send()`/`.throw()` on coroutines and generators to guarantee the trace stays balanced and that exception payloads are recorded.
 
 ### Changed
-- Module-level call events now use the actual dotted module name (e.g., `<my_pkg.mod>` or `<boto3.session>`) instead of the generic `<module>` label. `RuntimeTracer` derives the name via the shared module-identity helper, caches the result per code object, and falls back to `<module>` only for synthetic or nameless frames. Added Rust + Python tests plus README documentation covering the new semantics.
+- Module-level call events now prefer the frame's `__name__`, fall back to filter hints, `sys.path`, and package markers, and no longer depend on the legacy resolver/cache. The globals-derived naming flag now defaults to enabled so direct scripts record `<__main__>` while package imports emit `<pkg.mod>`, with CLI and environment overrides available for the legacy resolver.
 
 ## [0.2.0] - 2025-10-17
 ### Added

codetracer-python-recorder/README.md

Lines changed: 2 additions & 1 deletion
@@ -84,7 +84,8 @@ action = "drop"
 
 ## Trace naming semantics
 
-- Module-level activations no longer appear as the ambiguous `<module>` label. When the recorder sees `co_qualname == "<module>"`, it derives the actual dotted package name (e.g., `<my_pkg.mod>` or `<boto3.session>`) using project roots, `sys.modules`, and frame metadata.
+- Module-level activations no longer appear as the ambiguous `<module>` label. When the recorder sees `co_qualname == "<module>"`, it first reuses the frame's `__name__`, then falls back to trace-filter hints, `sys.path` roots, and package markers so scripts report `<__main__>` while real modules keep their dotted names (e.g., `<my_pkg.mod>` or `<boto3.session>`).
+- The globals-derived naming flow ships enabled by default; disable it temporarily with `--no-module-name-from-globals`, `codetracer.configure_policy(module_name_from_globals=False)`, or `CODETRACER_MODULE_NAME_FROM_GLOBALS=0` if you need to compare against the legacy resolver.
 - The angle-bracket convention remains for module entries so downstream tooling can distinguish top-level activations at a glance.
 - Traces will still emit `<module>` for synthetic filenames (`<stdin>`, `<string>`), frozen/importlib bootstrap frames, or exotic loaders that omit filenames entirely. This preserves previous behaviour when no reliable name exists.
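The naming order described in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the recorder's implementation (which lives in Rust and is not shown in this diff). The helper name `module_label`, its parameters, and the choice to check synthetic filenames before consulting `__name__` are all assumptions made for the example; the trace-filter-hint step is omitted because the diff does not show how hints are supplied.

```python
from pathlib import PurePosixPath

FALLBACK_LABEL = "<module>"


def module_label(frame_globals, filename, path_roots):
    """Hypothetical helper: pick an angle-bracketed label for a module-level frame."""
    # Synthetic filenames (<stdin>, <string>) or missing filenames have no
    # reliable name, so keep the generic label (ordering is an assumption).
    if not filename or filename.startswith("<"):
        return FALLBACK_LABEL

    # Preferred source: the frame's own __name__ from its globals.
    name = frame_globals.get("__name__")
    if isinstance(name, str) and name:
        return f"<{name}>"

    # Fallback: derive a dotted name relative to a sys.path-style root.
    path = PurePosixPath(filename)
    for root in path_roots:
        try:
            parts = list(path.relative_to(PurePosixPath(root)).with_suffix("").parts)
        except ValueError:
            continue  # filename is not under this root (or has no usable name)
        if parts and parts[-1] == "__init__":
            parts.pop()  # a package's __init__.py is named after the package
        if parts:
            return "<" + ".".join(parts) + ">"

    return FALLBACK_LABEL
```

Under these assumptions, a script run directly yields `<__main__>` because its globals carry `__name__ == "__main__"`, while an imported module keeps its dotted name.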

codetracer-python-recorder/benches/trace_filter.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,7 @@ fn run_workload(engine: &TraceFilterEngine, dataset: &WorkloadDataset) {
         for &index in &dataset.event_indices {
             let code = dataset.codes[index].as_ref();
             let resolution = engine
-                .resolve(py, code)
+                .resolve(py, code, None)
                 .expect("trace filter resolution should succeed during benchmarking");
             let policy = resolution.value_policy();
             for name in dataset.locals.iter() {
@@ -66,7 +66,7 @@ fn prewarm_engine(engine: &TraceFilterEngine, dataset: &WorkloadDataset) {
     Python::with_gil(|py| {
         for code in &dataset.codes {
             let _ = engine
-                .resolve(py, code.as_ref())
+                .resolve(py, code.as_ref(), None)
                 .expect("prewarm resolution failed");
         }
     });

codetracer-python-recorder/codetracer_python_recorder/cli.py

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -120,6 +120,15 @@ def _parse_args(argv: Sequence[str]) -> RecorderCLIConfig:
             "'proxies+fd' also mirrors raw file-descriptor writes."
         ),
     )
+    parser.add_argument(
+        "--module-name-from-globals",
+        action=argparse.BooleanOptionalAction,
+        default=None,
+        help=(
+            "Derive module names from the Python frame's __name__ attribute (default: enabled). "
+            "Use '--no-module-name-from-globals' to fall back to the legacy resolver."
+        ),
+    )
 
     known, remainder = parser.parse_known_args(argv)
     pending: list[str] = list(remainder)
@@ -181,6 +190,8 @@ def _parse_args(argv: Sequence[str]) -> RecorderCLIConfig:
                 policy["io_capture_fd_fallback"] = True
             case other:  # pragma: no cover - argparse choices block this
                 parser.error(f"unsupported io-capture mode '{other}'")
+    if known.module_name_from_globals is not None:
+        policy["module_name_from_globals"] = known.module_name_from_globals
 
     return RecorderCLIConfig(
         trace_dir=trace_dir,
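The CLI hunk above relies on `argparse.BooleanOptionalAction` with `default=None`, which gives the flag three states: explicitly enabled, explicitly disabled, or not given. A standalone sketch of that pattern follows; the function names `build_parser` and `policy_overrides` are hypothetical and stand in for the relevant part of `_parse_args`, whose remaining options are not reproduced here.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the recorder's parser, with only this one flag."""
    parser = argparse.ArgumentParser(prog="recorder-sketch")
    parser.add_argument(
        "--module-name-from-globals",
        action=argparse.BooleanOptionalAction,  # also registers --no-module-name-from-globals
        default=None,  # None means "flag not given on the command line"
    )
    return parser


def policy_overrides(argv):
    """Return only the policy keys the user explicitly set."""
    known, _remainder = build_parser().parse_known_args(argv)
    policy = {}
    # Write the key only when the user chose a side, so an omitted flag
    # leaves whatever default applies elsewhere untouched.
    if known.module_name_from_globals is not None:
        policy["module_name_from_globals"] = known.module_name_from_globals
    return policy
```

Keeping `None` distinct from `False` is what lets an omitted flag defer to other configuration sources instead of overwriting them.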

codetracer-python-recorder/resources/trace_filters/builtin_default.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ exec = "skip"
 reason = "Skip builtins module instrumentation"
 
 [[scope.rules]]
-selector = 'pkg:glob:*_distutils_hack*'
+selector = 'pkg:literal:_distutils_hack'
 exec = "skip"
 reason = "Skip setuptools shim module"
 
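The selector change above narrows the rule from a glob to a literal. The sketch below illustrates the difference in reach, using Python's `fnmatch` as a stand-in for glob matching and plain equality for literal matching. Both are assumptions: the recorder's actual selector engine is not shown in this diff, and its literal matching may treat submodules differently. The package names in `candidates` are made up for the example.

```python
from fnmatch import fnmatchcase


def matches_glob(package, pattern):
    """Approximate a 'pkg:glob:' selector with shell-style matching (assumption)."""
    return fnmatchcase(package, pattern)


def matches_literal(package, literal):
    """Approximate a 'pkg:literal:' selector with exact equality (assumption)."""
    return package == literal


# Hypothetical package names, chosen to show where the two selectors diverge.
candidates = ["_distutils_hack", "_distutils_hack.override", "my_distutils_hack_tools"]

glob_hits = [name for name in candidates if matches_glob(name, "*_distutils_hack*")]
literal_hits = [name for name in candidates if matches_literal(name, "_distutils_hack")]
```

Under these assumptions the glob matches any package whose name merely contains `_distutils_hack`, while the literal matches only the shim module itself.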
4040
