diff --git a/codetracer-python-recorder/CHANGELOG.md b/codetracer-python-recorder/CHANGELOG.md index 15a62ef..000e61a 100644 --- a/codetracer-python-recorder/CHANGELOG.md +++ b/codetracer-python-recorder/CHANGELOG.md @@ -8,6 +8,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) ### Added - Balanced call-stack handling for generators, coroutines, and unwinding frames by subscribing to `PY_YIELD`, `PY_UNWIND`, `PY_RESUME`, and `PY_THROW`, mapping resume/throw events to `TraceWriter::register_call`, yield/unwind to `register_return`, and capturing `PY_THROW` arguments as `exception` using the existing value encoder. Added Python + Rust integration tests that drive `.send()`/`.throw()` on coroutines and generators to guarantee the trace stays balanced and that exception payloads are recorded. +### Changed +- Module-level call events now use the actual dotted module name (e.g., `` or ``) instead of the generic `` label. `RuntimeTracer` derives the name via the shared module-identity helper, caches the result per code object, and falls back to `` only for synthetic or nameless frames. Added Rust + Python tests plus README documentation covering the new semantics. + ## [0.2.0] - 2025-10-17 ### Added - Added configurable trace filters backed by layered TOML files with glob/regex/literal selectors for packages, files, objects, and value domains, strict schema validation via `TraceFilterConfig::from_paths`, and explicit `allow`/`redact`/`drop` value policies summarised with SHA-256 digests. diff --git a/codetracer-python-recorder/README.md b/codetracer-python-recorder/README.md index 683433b..c471fa5 100644 --- a/codetracer-python-recorder/README.md +++ b/codetracer-python-recorder/README.md @@ -82,6 +82,12 @@ selector = "arg:literal:debug_payload" action = "drop" ``` +## Trace naming semantics + +- Module-level activations no longer appear as the ambiguous `` label. When the recorder sees `co_qualname == ""`, it derives the actual dotted package name (e.g., `` or ``) using project roots, `sys.modules`, and frame metadata. +- The angle-bracket convention remains for module entries so downstream tooling can distinguish top-level activations at a glance. +- Traces will still emit `` for synthetic filenames (``, ``), frozen/importlib bootstrap frames, or exotic loaders that omit filenames entirely. This preserves previous behaviour when no reliable name exists. + ## Packaging expectations Desktop installers add the wheel to `PYTHONPATH` before invoking the user’s diff --git a/codetracer-python-recorder/src/lib.rs b/codetracer-python-recorder/src/lib.rs index 8ec5818..6a48c33 100644 --- a/codetracer-python-recorder/src/lib.rs +++ b/codetracer-python-recorder/src/lib.rs @@ -8,6 +8,7 @@ pub mod code_object; mod errors; mod ffi; mod logging; +mod module_identity; pub mod monitoring; mod policy; mod runtime; diff --git a/codetracer-python-recorder/src/module_identity.rs b/codetracer-python-recorder/src/module_identity.rs new file mode 100644 index 0000000..84c382a --- /dev/null +++ b/codetracer-python-recorder/src/module_identity.rs @@ -0,0 +1,489 @@ +//! Shared helpers for deriving Python module names from filenames and module metadata. + +use std::borrow::Cow; +use std::path::{Component, Path}; +use std::sync::Arc; + +use dashmap::DashMap; +use pyo3::prelude::*; +use pyo3::types::{PyAny, PyDict, PyList}; + +use crate::code_object::CodeObjectWrapper; + +/// Resolver that infers module names for absolute file paths using sys.path roots and +/// sys.modules fallbacks. Results are cached per path to avoid repeated scans. +#[derive(Debug, Clone)] +pub struct ModuleIdentityResolver { + module_roots: Arc<[String]>, + cache: DashMap>, +} + +impl ModuleIdentityResolver { + /// Construct a resolver using the current `sys.path`. + pub fn new() -> Self { + let roots = Python::with_gil(|py| collect_module_roots(py)); + Self::from_roots(roots) + } + + /// Construct a resolver from an explicit list of module roots. Visible for tests. + pub fn from_roots(roots: Vec) -> Self { + Self { + module_roots: Arc::from(roots), + cache: DashMap::new(), + } + } + + /// Resolve the module name for an absolute POSIX path (if any). + pub fn resolve_absolute(&self, py: Python<'_>, absolute: &str) -> Option { + if let Some(entry) = self.cache.get(absolute) { + return entry.clone(); + } + let resolved = module_name_from_roots(self.module_roots(), absolute) + .or_else(|| lookup_module_name(py, absolute)); + self.cache.insert(absolute.to_string(), resolved.clone()); + resolved + } + + /// Expose the sys.path roots used by this resolver (primarily for tests). + pub fn module_roots(&self) -> &[String] { + &self.module_roots + } +} + +/// Caches module names per code object id, reusing a shared resolver for filesystem lookups. +#[allow(dead_code)] +#[derive(Debug)] +pub struct ModuleIdentityCache { + resolver: ModuleIdentityResolver, + code_cache: DashMap>, +} + +#[allow(dead_code)] +impl ModuleIdentityCache { + /// Construct with a fresh resolver seeded from `sys.path`. + pub fn new() -> Self { + Self::with_resolver(ModuleIdentityResolver::new()) + } + + /// Construct using an explicit resolver (primarily for tests). + pub fn with_resolver(resolver: ModuleIdentityResolver) -> Self { + Self { + resolver, + code_cache: DashMap::new(), + } + } + + /// Resolve the dotted module name for a Python code object, caching the result by code id. + pub fn resolve_for_code<'py>( + &self, + py: Python<'py>, + code: &CodeObjectWrapper, + hints: ModuleNameHints<'_>, + ) -> Option { + if let Some(entry) = self.code_cache.get(&code.id()) { + return entry.clone(); + } + + let mut resolved = hints + .preferred + .and_then(sanitise_module_name) + .or_else(|| { + hints + .relative_path + .and_then(|relative| module_from_relative(relative)) + }) + .or_else(|| { + hints + .absolute_path + .and_then(|absolute| self.resolver.resolve_absolute(py, absolute)) + }) + .or_else(|| hints.globals_name.and_then(sanitise_module_name)); + + if resolved.is_none() && hints.absolute_path.is_none() { + if let Ok(filename) = code.filename(py) { + let path = Path::new(filename); + if path.is_absolute() { + if let Some(normalized) = normalise_to_posix(path) { + resolved = self.resolver.resolve_absolute(py, normalized.as_str()); + } + } + } + } + + self.code_cache.insert(code.id(), resolved.clone()); + resolved + } + + /// Remove a cached module name for a code id. + pub fn invalidate(&self, code_id: usize) { + self.code_cache.remove(&code_id); + } + + /// Clear all cached code-object mappings. + pub fn clear(&self) { + self.code_cache.clear(); + } + + /// Access the underlying resolver (primarily for tests/runtime wiring). + pub fn resolver(&self) -> &ModuleIdentityResolver { + &self.resolver + } +} + +/// Optional hints supplied when resolving module names. +#[allow(dead_code)] +#[derive(Debug, Default, Clone, Copy)] +pub struct ModuleNameHints<'a> { + /// Module name provided by another subsystem (e.g., trace filters). + pub preferred: Option<&'a str>, + /// Normalised project-relative path (used for deterministic names within a project). + pub relative_path: Option<&'a str>, + /// Absolute POSIX path to the source file. + pub absolute_path: Option<&'a str>, + /// `__name__` extracted from frame globals during runtime tracing. + pub globals_name: Option<&'a str>, +} + +#[allow(dead_code)] +impl<'a> ModuleNameHints<'a> { + pub fn new() -> Self { + Self::default() + } +} + +fn collect_module_roots(py: Python<'_>) -> Vec { + let mut roots = Vec::new(); + if let Ok(sys) = py.import("sys") { + if let Ok(path_obj) = sys.getattr("path") { + if let Ok(path_list) = path_obj.downcast_into::() { + for entry in path_list.iter() { + if let Ok(raw) = entry.extract::() { + if let Some(normalized) = normalise_to_posix(Path::new(&raw)) { + roots.push(normalized); + } + } + } + } + } + } + roots +} + +pub(crate) fn module_name_from_roots(roots: &[String], absolute: &str) -> Option { + for base in roots { + if let Some(relative) = strip_posix_prefix(absolute, base) { + if let Some(name) = relative_str_to_module(relative) { + return Some(name); + } + } + } + None +} + +fn lookup_module_name(py: Python<'_>, absolute: &str) -> Option { + let sys = py.import("sys").ok()?; + let modules_obj = sys.getattr("modules").ok()?; + let modules: Bound<'_, PyDict> = modules_obj.downcast_into::().ok()?; + + let mut best: Option<(usize, String)> = None; + 'modules: for (name_obj, module_obj) in modules.iter() { + let module_name: String = name_obj.extract().ok()?; + if module_obj.is_none() { + continue; + } + for candidate in module_candidate_paths(&module_obj) { + if equivalent_posix_paths(&candidate, absolute) { + let preferred = preferred_module_name(&module_name, &module_obj); + let score = module_name_score(&preferred); + let update = match best { + Some((best_score, _)) => score < best_score, + None => true, + }; + if update { + best = Some((score, preferred)); + if score == 0 { + break 'modules; + } + } + } + } + } + + best.map(|(_, name)| name) +} + +fn module_candidate_paths(module: &Bound<'_, PyAny>) -> Vec { + let mut candidates = Vec::new(); + if let Ok(spec) = module.getattr("__spec__") { + if let Some(origin) = extract_normalised_spec_origin(&spec) { + candidates.push(origin); + } + } + if let Some(file) = extract_normalised_attr(module, "__file__") { + candidates.push(file); + } + if let Some(cached) = extract_normalised_attr(module, "__cached__") { + candidates.push(cached); + } + candidates +} + +fn extract_normalised_attr(module: &Bound<'_, PyAny>, attr: &str) -> Option { + let value = module.getattr(attr).ok()?; + extract_normalised_path(&value) +} + +fn extract_normalised_spec_origin(spec: &Bound<'_, PyAny>) -> Option { + if spec.is_none() { + return None; + } + let origin = spec.getattr("origin").ok()?; + extract_normalised_path(&origin) +} + +fn extract_normalised_path(value: &Bound<'_, PyAny>) -> Option { + if value.is_none() { + return None; + } + let raw: String = value.extract().ok()?; + normalise_to_posix(Path::new(raw.as_str())) +} + +fn equivalent_posix_paths(candidate: &str, target: &str) -> bool { + if candidate == target { + return true; + } + if candidate.ends_with(".pyc") && target.ends_with(".py") { + return candidate.trim_end_matches('c') == target; + } + false +} + +fn preferred_module_name(default: &str, module: &Bound<'_, PyAny>) -> String { + if let Ok(spec) = module.getattr("__spec__") { + if let Ok(name) = spec.getattr("name") { + if let Ok(raw) = name.extract::() { + if !raw.is_empty() { + return raw; + } + } + } + } + if let Ok(name_attr) = module.getattr("__name__") { + if let Ok(raw) = name_attr.extract::() { + if !raw.is_empty() { + return raw; + } + } + } + default.to_string() +} + +fn module_name_score(name: &str) -> usize { + if name + .split('.') + .all(|segment| !segment.is_empty() && segment.chars().all(is_identifier_char)) + { + 0 + } else { + 1 + } +} + +fn is_identifier_char(ch: char) -> bool { + ch == '_' || ch.is_ascii_alphanumeric() +} + +/// Convert a normalised relative path (e.g., `pkg/foo.py`) into a dotted module name. +pub fn module_from_relative(relative: &str) -> Option { + relative_str_to_module(relative) +} + +#[allow(dead_code)] +fn sanitise_module_name(candidate: &str) -> Option { + let trimmed = candidate.trim(); + if trimmed.is_empty() { + return None; + } + if is_valid_module_name(trimmed) { + Some(trimmed.to_string()) + } else { + None + } +} + +/// Return true when the supplied module name is a dotted identifier. +pub fn is_valid_module_name(name: &str) -> bool { + !name.is_empty() + && name + .split('.') + .all(|segment| !segment.is_empty() && segment.chars().all(is_identifier_char)) +} + +fn relative_str_to_module(relative: &str) -> Option { + let mut parts: Vec<&str> = relative + .split('/') + .filter(|segment| !segment.is_empty()) + .collect(); + if parts.is_empty() { + return None; + } + let last = parts.pop().expect("non-empty"); + if let Some(stem) = last.strip_suffix(".py") { + if stem != "__init__" { + parts.push(stem); + } + } else { + parts.push(last); + } + if parts.is_empty() { + return None; + } + Some(parts.join(".")) +} + +fn strip_posix_prefix<'a>(path: &'a str, base: &str) -> Option<&'a str> { + if base.is_empty() { + return None; + } + if base == "/" { + return path.strip_prefix('/'); + } + if path.starts_with(base) { + let mut remainder = &path[base.len()..]; + if remainder.starts_with('/') { + remainder = &remainder[1..]; + } + if remainder.is_empty() { + None + } else { + Some(remainder) + } + } else { + None + } +} + +/// Normalise a filesystem path to a POSIX-style string used by trace filters. +pub fn normalise_to_posix(path: &Path) -> Option { + if path.as_os_str().is_empty() { + return None; + } + let mut parts = Vec::new(); + for component in path.components() { + match component { + Component::Normal(part) => parts.push(part.to_string_lossy()), + Component::Prefix(prefix) => parts.push(prefix.as_os_str().to_string_lossy()), + Component::RootDir => parts.push(Cow::Borrowed("")), + Component::CurDir => continue, + Component::ParentDir => { + parts.push(Cow::Borrowed("..")); + } + } + } + if parts.is_empty() { + None + } else { + Some(parts.join("/")) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::code_object::CodeObjectWrapper; + use pyo3::types::{PyAny, PyCode, PyModule}; + use std::ffi::CString; + + fn load_module<'py>( + py: Python<'py>, + module_name: &str, + file_path: &str, + source: &str, + ) -> PyResult> { + let code_c = CString::new(source).expect("source without NUL"); + let file_c = CString::new(file_path).expect("path without NUL"); + let module_c = CString::new(module_name).expect("module without NUL"); + PyModule::from_code( + py, + code_c.as_c_str(), + file_c.as_c_str(), + module_c.as_c_str(), + ) + } + + fn get_code<'py>(module: &Bound<'py, PyModule>, func_name: &str) -> Bound<'py, PyCode> { + let func: Bound<'py, PyAny> = module.getattr(func_name).expect("function"); + func.getattr("__code__") + .expect("__code__ attr") + .downcast_into::() + .expect("PyCode") + } + + #[test] + fn normalise_to_posix_handles_common_paths() { + let path = Path::new("src/lib/foo.py"); + assert_eq!(normalise_to_posix(path).as_deref(), Some("src/lib/foo.py")); + } + + #[test] + fn module_identity_cache_prefers_preferred_hint() { + Python::with_gil(|py| { + let module = + load_module(py, "tmp_mod", "tmp_mod.py", "def foo():\n return 1\n").unwrap(); + let code = get_code(&module, "foo"); + let wrapper = CodeObjectWrapper::new(py, &code); + let cache = ModuleIdentityCache::new(); + let hints = ModuleNameHints { + preferred: Some("pkg.actual"), + ..ModuleNameHints::default() + }; + let resolved = cache.resolve_for_code(py, &wrapper, hints); + assert_eq!(resolved.as_deref(), Some("pkg.actual")); + }); + } + + #[test] + fn module_identity_cache_uses_resolver_for_absolute_paths() { + Python::with_gil(|py| { + let tmp = tempfile::tempdir().expect("tempdir"); + let module_path = tmp.path().join("pkg").join("mod.py"); + std::fs::create_dir_all(module_path.parent().unwrap()).expect("mkdir"); + std::fs::write(&module_path, "def foo():\n return 1\n").expect("write source"); + + let module_path_str = module_path.to_string_lossy().to_string(); + let module = load_module( + py, + "pkg.mod", + module_path_str.as_str(), + "def foo():\n return 1\n", + ) + .unwrap(); + let code = get_code(&module, "foo"); + let wrapper = CodeObjectWrapper::new(py, &code); + let root = normalise_to_posix(tmp.path()).expect("normalize root"); + let resolver = ModuleIdentityResolver::from_roots(vec![root]); + let cache = ModuleIdentityCache::with_resolver(resolver); + let absolute_norm = normalise_to_posix(module_path.as_path()).expect("normalize abs"); + let hints = ModuleNameHints { + absolute_path: Some(absolute_norm.as_str()), + ..ModuleNameHints::default() + }; + + let resolved = cache.resolve_for_code(py, &wrapper, hints); + assert_eq!(resolved.as_deref(), Some("pkg.mod")); + }); + } + + #[test] + fn module_from_relative_strips_init() { + assert_eq!( + module_from_relative("pkg/module/__init__.py").as_deref(), + Some("pkg.module") + ); + assert_eq!( + module_from_relative("pkg/module/sub.py").as_deref(), + Some("pkg.module.sub") + ); + } +} diff --git a/codetracer-python-recorder/src/runtime/activation.rs b/codetracer-python-recorder/src/runtime/activation.rs index 6d9d51f..06e5924 100644 --- a/codetracer-python-recorder/src/runtime/activation.rs +++ b/codetracer-python-recorder/src/runtime/activation.rs @@ -103,10 +103,7 @@ impl ActivationController { } fn resume_if_needed(&mut self, code: &CodeObjectWrapper) { - if self.started - && self.suspended - && self.activation_code_id == Some(code.id()) - { + if self.started && self.suspended && self.activation_code_id == Some(code.id()) { self.suspended = false; } } diff --git a/codetracer-python-recorder/src/runtime/tracer/events.rs b/codetracer-python-recorder/src/runtime/tracer/events.rs index 8fbd910..c96524e 100644 --- a/codetracer-python-recorder/src/runtime/tracer/events.rs +++ b/codetracer-python-recorder/src/runtime/tracer/events.rs @@ -226,8 +226,11 @@ impl Tracer for RuntimeTracer { log::error!("on_py_start: failed to capture args: {details}"); }); return Err(ffi::map_recorder_error( - enverr!(ErrorCode::FrameIntrospectionFailed, "failed to capture call arguments") - .with_context("details", details), + enverr!( + ErrorCode::FrameIntrospectionFailed, + "failed to capture call arguments" + ) + .with_context("details", details), )); } } @@ -505,6 +508,7 @@ impl Tracer for RuntimeTracer { .map_err(ffi::map_recorder_error)?; } self.function_ids.clear(); + self.module_names.clear(); self.io.clear_snapshots(); self.filter.reset(); self.lifecycle.reset_event_state(); @@ -518,6 +522,7 @@ impl Tracer for RuntimeTracer { .finalise(&mut self.writer, &self.filter) .map_err(ffi::map_recorder_error)?; self.function_ids.clear(); + self.module_names.clear(); self.filter.reset(); self.io.clear_snapshots(); self.lifecycle.reset_event_state(); @@ -582,9 +587,7 @@ impl RuntimeTracer { }; let telemetry = telemetry_holder.as_deref_mut(); - let candidate_name = capture_label - .map(|label| label as &str) - .or(object_name); + let candidate_name = capture_label.map(|label| label as &str).or(object_name); record_return_value( py, @@ -597,11 +600,7 @@ impl RuntimeTracer { self.mark_event(); if let Some(kind) = exit_kind { - if self - .lifecycle - .activation_mut() - .handle_exit(code.id(), kind) - { + if self.lifecycle.activation_mut().handle_exit(code.id(), kind) { let _mute = ScopedMuteIoCapture::new(); log::debug!("[RuntimeTracer] deactivated on activation return"); } diff --git a/codetracer-python-recorder/src/runtime/tracer/io.rs b/codetracer-python-recorder/src/runtime/tracer/io.rs index 7e28ce2..2e93954 100644 --- a/codetracer-python-recorder/src/runtime/tracer/io.rs +++ b/codetracer-python-recorder/src/runtime/tracer/io.rs @@ -35,11 +35,7 @@ impl IoCoordinator { } /// Install the IO capture pipeline using the provided settings. - pub(crate) fn install( - &mut self, - py: Python<'_>, - settings: IoCaptureSettings, - ) -> PyResult<()> { + pub(crate) fn install(&mut self, py: Python<'_>, settings: IoCaptureSettings) -> PyResult<()> { self.pipeline = IoCapturePipeline::install(py, Arc::clone(&self.snapshots), settings)?; Ok(()) } diff --git a/codetracer-python-recorder/src/runtime/tracer/runtime_tracer.rs b/codetracer-python-recorder/src/runtime/tracer/runtime_tracer.rs index a21f611..d8c0f53 100644 --- a/codetracer-python-recorder/src/runtime/tracer/runtime_tracer.rs +++ b/codetracer-python-recorder/src/runtime/tracer/runtime_tracer.rs @@ -4,6 +4,7 @@ use super::io::IoCoordinator; use super::lifecycle::LifecycleController; use crate::code_object::CodeObjectWrapper; use crate::ffi; +use crate::module_identity::{ModuleIdentityCache, ModuleNameHints}; use crate::policy::RecorderPolicy; use crate::runtime::io_capture::{IoCaptureSettings, ScopedMuteIoCapture}; use crate::runtime::line_snapshots::LineSnapshotStore; @@ -12,7 +13,7 @@ use crate::trace_filter::engine::TraceFilterEngine; use pyo3::prelude::*; use runtime_tracing::NonStreamingTraceWriter; use runtime_tracing::{Line, TraceEventsFileFormat, TraceWriter}; -use std::collections::{hash_map::Entry, HashMap}; +use std::collections::HashMap; use std::path::Path; use std::sync::Arc; use std::thread::ThreadId; @@ -26,6 +27,7 @@ pub struct RuntimeTracer { pub(super) function_ids: HashMap, pub(super) io: IoCoordinator, pub(super) filter: FilterCoordinator, + pub(super) module_names: ModuleIdentityCache, } impl RuntimeTracer { @@ -46,6 +48,7 @@ impl RuntimeTracer { function_ids: HashMap::new(), io: IoCoordinator::new(), filter: FilterCoordinator::new(trace_filter), + module_names: ModuleIdentityCache::new(), } } @@ -101,21 +104,20 @@ impl RuntimeTracer { py: Python<'_>, code: &CodeObjectWrapper, ) -> PyResult { - match self.function_ids.entry(code.id()) { - Entry::Occupied(entry) => Ok(*entry.get()), - Entry::Vacant(slot) => { - let name = code.qualname(py)?; - let filename = code.filename(py)?; - let first_line = code.first_line(py)?; - let function_id = TraceWriter::ensure_function_id( - &mut self.writer, - name, - Path::new(filename), - Line(first_line as i64), - ); - Ok(*slot.insert(function_id)) - } + if let Some(fid) = self.function_ids.get(&code.id()) { + return Ok(*fid); } + let name = self.function_name(py, code)?; + let filename = code.filename(py)?; + let first_line = code.first_line(py)?; + let function_id = TraceWriter::ensure_function_id( + &mut self.writer, + name.as_str(), + Path::new(filename), + Line(first_line as i64), + ); + self.function_ids.insert(code.id(), function_id); + Ok(function_id) } pub(super) fn should_trace_code( @@ -125,6 +127,41 @@ impl RuntimeTracer { ) -> TraceDecision { self.filter.decide(py, code) } + + fn function_name(&self, py: Python<'_>, code: &CodeObjectWrapper) -> PyResult { + let qualname = code.qualname(py)?; + if qualname == "" { + Ok(self + .derive_module_name(py, code) + .map(|module| format!("<{module}>")) + .unwrap_or_else(|| qualname.to_string())) + } else { + Ok(qualname.to_string()) + } + } + + fn derive_module_name(&self, py: Python<'_>, code: &CodeObjectWrapper) -> Option { + let resolution = self.filter.cached_resolution(code.id()); + if let Some(resolution) = resolution.as_ref() { + let hints = ModuleNameHints { + preferred: resolution.module_name(), + relative_path: resolution.relative_path(), + absolute_path: resolution.absolute_path(), + globals_name: None, + }; + self.module_names.resolve_for_code(py, code, hints) + } else { + self.module_names + .resolve_for_code(py, code, ModuleNameHints::default()) + } + } +} + +#[cfg(test)] +impl RuntimeTracer { + fn function_name_for_test(&self, py: Python<'_>, code: &CodeObjectWrapper) -> PyResult { + self.function_name(py, code) + } } #[cfg(test)] @@ -1039,6 +1076,48 @@ sensitive("s3cr3t") }); } + #[test] + fn module_import_records_module_name() { + Python::with_gil(|py| { + let project = tempfile::tempdir().expect("project dir"); + let pkg_root = project.path().join("lib"); + let pkg_dir = pkg_root.join("my_pkg"); + fs::create_dir_all(&pkg_dir).expect("create package dir"); + let module_path = pkg_dir.join("mod.py"); + fs::write(&module_path, "value = 1\n").expect("write module file"); + + let sys = py.import("sys").expect("import sys"); + let sys_path = sys.getattr("path").expect("sys.path"); + sys_path + .call_method1("insert", (0, pkg_root.to_string_lossy().as_ref())) + .expect("insert temp root"); + + let tracer = + RuntimeTracer::new("runner.py", &[], TraceEventsFileFormat::Json, None, None); + + let builtins = py.import("builtins").expect("builtins"); + let compile = builtins.getattr("compile").expect("compile builtin"); + let code_obj: Bound<'_, PyCode> = compile + .call1(( + "value = 1\n", + module_path.to_string_lossy().as_ref(), + "exec", + )) + .expect("compile module code") + .downcast_into() + .expect("PyCode"); + + let wrapper = CodeObjectWrapper::new(py, &code_obj); + let resolved = tracer + .function_name_for_test(py, &wrapper) + .expect("derive function name"); + + assert_eq!(resolved, ""); + + sys_path.call_method1("pop", (0,)).expect("pop temp root"); + }); + } + #[test] fn user_drop_default_overrides_builtin_allowance() { Python::with_gil(|py| { diff --git a/codetracer-python-recorder/src/trace_filter/engine.rs b/codetracer-python-recorder/src/trace_filter/engine.rs index a31a610..2230bd6 100644 --- a/codetracer-python-recorder/src/trace_filter/engine.rs +++ b/codetracer-python-recorder/src/trace_filter/engine.rs @@ -4,6 +4,7 @@ //! and caches per-code-object resolutions so the hot tracing callbacks only pay a fast lookup. use crate::code_object::CodeObjectWrapper; +use crate::module_identity::{is_valid_module_name, normalise_to_posix, ModuleIdentityResolver}; use crate::trace_filter::config::{ ExecDirective, FilterSource, FilterSummary, ScopeRule, TraceFilterConfig, ValueAction, ValuePattern, @@ -11,9 +12,7 @@ use crate::trace_filter::config::{ use crate::trace_filter::selector::{Selector, SelectorKind}; use dashmap::DashMap; use pyo3::{prelude::*, PyErr}; -use pyo3::types::{PyDict, PyList}; use recorder_errors::{target, ErrorCode, RecorderResult}; -use std::borrow::Cow; use std::path::{Component, Path, PathBuf}; use std::sync::Arc; @@ -189,8 +188,7 @@ pub struct TraceFilterEngine { default_value_source: usize, rules: Arc<[CompiledScopeRule]>, cache: DashMap>, - module_cache: DashMap>, - module_roots: Arc<[String]>, + module_resolver: ModuleIdentityResolver, } impl TraceFilterEngine { @@ -200,7 +198,7 @@ impl TraceFilterEngine { let default_value_action = config.default_value_action(); let default_value_source = config.default_value_source(); let rules = compile_rules(config.rules()); - let module_roots = Python::with_gil(|py| collect_module_roots(py)); + let module_resolver = ModuleIdentityResolver::new(); TraceFilterEngine { config: Arc::new(config), @@ -209,8 +207,7 @@ impl TraceFilterEngine { default_value_source, rules, cache: DashMap::new(), - module_cache: DashMap::new(), - module_roots: module_roots.into(), + module_resolver, } } @@ -327,14 +324,7 @@ impl TraceFilterEngine { } fn resolve_module_name(&self, py: Python<'_>, absolute: &str) -> Option { - if let Some(entry) = self.module_cache.get(absolute) { - return entry.value().clone(); - } - let resolved = module_name_from_roots(&self.module_roots, absolute) - .or_else(|| lookup_module_name(py, absolute)); - self.module_cache - .insert(absolute.to_string(), resolved.clone()); - resolved + self.module_resolver.resolve_absolute(py, absolute) } } @@ -457,7 +447,7 @@ impl ScopeContext { let module_name = relative_path .as_deref() - .and_then(|rel| module_from_relative(rel).map(|cow| cow.into_owned())); + .and_then(|rel| crate::module_identity::module_from_relative(rel)); ScopeContext { module_name, @@ -483,230 +473,6 @@ impl ScopeContext { } } -fn lookup_module_name(py: Python<'_>, absolute: &str) -> Option { - let sys = py.import("sys").ok()?; - let modules_obj = match sys.getattr("modules") { - Ok(value) => value, - Err(_) => return None, - }; - let modules: Bound<'_, PyDict> = match modules_obj.downcast_into::() { - Ok(dict) => dict, - Err(_) => return None, - }; - let mut best: Option<(usize, String)> = None; - 'modules: for (name_obj, module_obj) in modules.iter() { - let module_name: String = match name_obj.extract() { - Ok(value) => value, - Err(_) => continue, - }; - if module_obj.is_none() { - continue; - } - for candidate in module_candidate_paths(&module_obj) { - if equivalent_posix_paths(&candidate, absolute) { - let preferred = preferred_module_name(&module_name, &module_obj); - let score = module_name_score(&preferred); - let update = match best { - Some((best_score, _)) => score < best_score, - None => true, - }; - if update { - best = Some((score, preferred)); - if score == 0 { - break 'modules; - } - } - } - } - } - best.map(|(_, name)| name) -} - -fn collect_module_roots(py: Python<'_>) -> Vec { - let mut roots = Vec::new(); - if let Ok(sys) = py.import("sys") { - if let Ok(path_obj) = sys.getattr("path") { - if let Ok(path_list) = path_obj.downcast_into::() { - for entry in path_list.iter() { - if let Ok(raw) = entry.extract::() { - if let Some(normalized) = normalise_to_posix(Path::new(&raw)) { - roots.push(normalized); - } - } - } - } - } - } - roots -} - -fn module_name_from_roots(roots: &[String], absolute: &str) -> Option { - for base in roots { - if let Some(relative) = strip_posix_prefix(absolute, base) { - if let Some(name) = relative_str_to_module(relative) { - return Some(name); - } - } - } - None -} - -fn module_candidate_paths(module: &Bound<'_, PyAny>) -> Vec { - let mut candidates = Vec::new(); - if let Ok(spec) = module.getattr("__spec__") { - if let Some(origin) = extract_normalised_spec_origin(&spec) { - candidates.push(origin); - } - } - if let Some(file) = extract_normalised_attr(module, "__file__") { - candidates.push(file); - } - if let Some(cached) = extract_normalised_attr(module, "__cached__") { - candidates.push(cached); - } - candidates -} - -fn extract_normalised_attr(module: &Bound<'_, PyAny>, attr: &str) -> Option { - let value = module.getattr(attr).ok()?; - extract_normalised_path(&value) -} - -fn extract_normalised_spec_origin(spec: &Bound<'_, PyAny>) -> Option { - if spec.is_none() { - return None; - } - let origin = spec.getattr("origin").ok()?; - extract_normalised_path(&origin) -} - -fn extract_normalised_path(value: &Bound<'_, PyAny>) -> Option { - if value.is_none() { - return None; - } - let raw: String = value.extract().ok()?; - normalise_to_posix(Path::new(raw.as_str())) -} - -fn equivalent_posix_paths(candidate: &str, target: &str) -> bool { - if candidate == target { - return true; - } - if candidate.ends_with(".pyc") && target.ends_with(".py") { - return candidate.trim_end_matches('c') == target; - } - false -} - -fn preferred_module_name(default: &str, module: &Bound<'_, PyAny>) -> String { - if let Ok(spec) = module.getattr("__spec__") { - if let Ok(name) = spec.getattr("name") { - if let Ok(raw) = name.extract::() { - if !raw.is_empty() { - return raw; - } - } - } - } - if let Ok(name_attr) = module.getattr("__name__") { - if let Ok(raw) = name_attr.extract::() { - if !raw.is_empty() { - return raw; - } - } - } - default.to_string() -} - -fn module_name_score(name: &str) -> usize { - if name - .split('.') - .all(|segment| !segment.is_empty() && segment.chars().all(is_identifier_char)) - { - 0 - } else { - 1 - } -} - -fn is_identifier_char(ch: char) -> bool { - ch == '_' || ch.is_ascii_alphanumeric() -} - -fn is_valid_module_name(name: &str) -> bool { - !name.is_empty() - && name - .split('.') - .all(|segment| !segment.is_empty() && segment.chars().all(is_identifier_char)) -} - -fn strip_posix_prefix<'a>(path: &'a str, base: &str) -> Option<&'a str> { - if base.is_empty() { - return None; - } - if base == "/" { - return path.strip_prefix('/'); - } - if path.starts_with(base) { - let mut remainder = &path[base.len()..]; - if remainder.starts_with('/') { - remainder = &remainder[1..]; - } - if remainder.is_empty() { - None - } else { - Some(remainder) - } - } else { - None - } -} - -fn relative_str_to_module(relative: &str) -> Option { - let mut parts: Vec<&str> = relative - .split('/') - .filter(|segment| !segment.is_empty()) - .collect(); - if parts.is_empty() { - return None; - } - let last = parts.pop().expect("non-empty"); - if let Some(stem) = last.strip_suffix(".py") { - if stem != "__init__" { - parts.push(stem); - } - } else { - parts.push(last); - } - if parts.is_empty() { - return None; - } - Some(parts.join(".")) -} - -fn normalise_to_posix(path: &Path) -> Option { - if path.as_os_str().is_empty() { - return None; - } - let mut parts = Vec::new(); - for component in path.components() { - match component { - Component::Normal(part) => parts.push(part.to_string_lossy()), - Component::Prefix(prefix) => parts.push(prefix.as_os_str().to_string_lossy()), - Component::RootDir => parts.push(Cow::Borrowed("")), - Component::CurDir => continue, - Component::ParentDir => { - parts.push(Cow::Borrowed("..")); - } - } - } - if parts.is_empty() { - None - } else { - Some(parts.join("/")) - } -} - fn normalise_relative(relative: PathBuf) -> String { let mut components = Vec::new(); for component in relative.components() { @@ -724,27 +490,6 @@ fn normalise_relative(relative: PathBuf) -> String { components.join("/") } -fn module_from_relative(relative: &str) -> Option> { - if relative.is_empty() { - return None; - } - let trimmed = relative.trim_start_matches("./"); - let without_suffix = trimmed.strip_suffix(".py").unwrap_or(trimmed); - if without_suffix.is_empty() { - return None; - } - let mut parts: Vec<&str> = without_suffix.split('/').collect(); - if let Some(last) = parts.last().copied() { - if last == "__init__" { - parts.pop(); - } - } - if parts.is_empty() { - return None; - } - Some(Cow::Owned(parts.join("."))) -} - fn py_attr_error(attr: &str, err: PyErr) -> recorder_errors::RecorderError { target!( ErrorCode::FrameIntrospectionFailed, @@ -757,8 +502,9 @@ fn py_attr_error(attr: &str, err: PyErr) -> recorder_errors::RecorderError { #[cfg(test)] mod tests { use super::*; + use crate::module_identity::module_name_from_roots; use crate::trace_filter::config::TraceFilterConfig; - use pyo3::types::{PyAny, PyCode, PyDict, PyModule}; + use pyo3::types::{PyAny, PyCode, PyDict, PyList, PyModule}; use std::ffi::CString; use std::fs; use std::io::Write; @@ -901,9 +647,8 @@ mod tests { fs::write(app_dir.join("__init__.py"), "\n").expect("write __init__"); let sys = py.import("sys").expect("import sys"); let sys_path_any = sys.getattr("path").expect("sys.path"); - let sys_path: Bound<'_, PyList> = sys_path_any - .downcast_into::() - .expect("path list"); + let sys_path: Bound<'_, PyList> = + sys_path_any.downcast_into::().expect("path list"); sys_path .insert(0, project_root.to_string_lossy().to_string()) .expect("insert project root"); @@ -912,9 +657,8 @@ mod tests { let module = py.import("app.foo").expect("import app.foo"); let modules_any = sys.getattr("modules").expect("sys.modules"); - let modules: Bound<'_, PyDict> = modules_any - .downcast_into::() - .expect("modules dict"); + let modules: Bound<'_, PyDict> = + modules_any.downcast_into::().expect("modules dict"); modules .set_item("app.foo", module.as_any()) .expect("register module"); @@ -932,15 +676,12 @@ mod tests { let engine = TraceFilterEngine::new(config.clone()); let expected_root = normalise_to_posix(Path::new(project_root.to_string_lossy().as_ref())).unwrap(); - assert!(engine.module_roots.iter().any(|root| root == &expected_root)); - let derived_from_roots = - module_name_from_roots(&engine.module_roots, absolute_path.as_str()); + let resolver_roots = engine.module_resolver.module_roots(); + assert!(resolver_roots.iter().any(|root| root == &expected_root)); + let derived_from_roots = module_name_from_roots(resolver_roots, absolute_path.as_str()); assert_eq!(derived_from_roots, Some("app.foo".to_string())); let resolution = engine.resolve(py, &wrapper)?; - assert_eq!( - resolution.absolute_path(), - Some(absolute_path.as_str()) - ); + assert_eq!(resolution.absolute_path(), Some(absolute_path.as_str())); assert_eq!(resolution.module_name(), Some("app.foo")); assert_eq!(resolution.exec(), ExecDecision::Skip); diff --git a/codetracer-python-recorder/tests/python/test_monitoring_events.py b/codetracer-python-recorder/tests/python/test_monitoring_events.py index c0927ac..da11e0d 100644 --- a/codetracer-python-recorder/tests/python/test_monitoring_events.py +++ b/codetracer-python-recorder/tests/python/test_monitoring_events.py @@ -196,6 +196,32 @@ def arg_value(i: int) -> Dict[str, Any]: assert v1.get("text") == "x" +def test_module_imports_record_package_names(tmp_path: Path) -> None: + pkg_root = tmp_path / "lib" + pkg_dir = pkg_root / "my_pkg" + pkg_dir.mkdir(parents=True) + (pkg_dir / "__init__.py").write_text("", encoding="utf-8") + (pkg_dir / "mod.py").write_text("VALUE = 1\n", encoding="utf-8") + + runner = tmp_path / "runner.py" + runner.write_text( + f"import sys\nsys.path.insert(0, r\"{pkg_root}\")\nimport my_pkg.mod\n", + encoding="utf-8", + ) + + out_dir = ensure_trace_dir(tmp_path) + session = codetracer.start(out_dir, format=codetracer.TRACE_JSON, start_on_enter=runner) + try: + runpy.run_path(str(runner), run_name="__main__") + finally: + codetracer.flush() + codetracer.stop() + + parsed = _parse_trace(out_dir) + names = [f["name"] for f in parsed.functions] + assert any(name == "" for name in names), f"missing module name in {names}" + + def test_all_argument_kinds_recorded_on_py_start(tmp_path: Path) -> None: # Arrange: write a script with a function using all Python argument kinds code = ( diff --git a/design-docs/adr/0014-module-call-event-naming.md b/design-docs/adr/0014-module-call-event-naming.md new file mode 100644 index 0000000..ed86a3b --- /dev/null +++ b/design-docs/adr/0014-module-call-event-naming.md @@ -0,0 +1,55 @@ +# ADR 0014: Module Call Events Report Actual Module Names + +- **Status:** Proposed +- **Date:** 2025-10-26 +- **Deciders:** codetracer recorder maintainers +- **Consulted:** DX/observability stakeholders, Python runtime SMEs +- **Informed:** Support engineers, product analytics consumers + +## Context +- Every Python import executes the target module object under the hood. The recorder hooks `PY_START` for those executions and emits a synthetic “function call” whose name currently comes from `code.co_qualname`. +- For module-level code objects CPython hardcodes `co_name == co_qualname == ""`, so our trace file shows dozens (or hundreds) of `` activations with no indication of which package was imported. +- Trace consumers (CLI visualisations, query tools, and data-engineering pipelines) rely on the recorded function name to surface hot paths, build flame graphs, and attribute time/cost to particular packages. Without module names, users cannot tell whether a slow import belongs to `boto3`, `_distutils_hack`, or their own modules. +- The trace filter engine already resolves module names from filenames / `sys.modules` to power `pkg:*` selectors, but the runtime tracer never reuses that information when assigning `FunctionId`s. + +## Problem +- Module import events are indistinguishable in the trace log, making it impossible to attribute import costs, filter specific packages after the fact, or answer “which module executed here?” without cross-referencing filenames manually. +- Because traces only show ``, downstream tools collapse all module-level activations into a single node, hiding per-package behaviour and producing misleading metrics. +- Purely using filenames as a proxy would leak physical layouts (e.g., `/usr/lib/python3.12/site-packages/boto3/__init__.py`) into the user-facing name and fails for zip apps or namespace packages. + +## Decision +1. **Introduce a shared module identity helper.** + - Extract the existing module-derivation logic (`module_name_from_roots`, `lookup_module_name`) into a reusable service that can run independent of the filter engine. + - Accept a `CodeObjectWrapper` + `Python<'_>` handle and return a cached module name keyed by (code id, canonical filename). The helper first consults the filter resolution (if available), then repeats the relative-path inference, and finally falls back to `sys.modules` + frame globals (`__name__`) before conceding. +2. **Rewrite `` names as `<{resolved_name}>`.** + - When `RuntimeTracer::ensure_function_id` observes `co_qualname == ""`, it queries the helper for the semantic module name. + - If a valid dotted identifier comes back (e.g., `boto3`, `foo.bar`), the tracer registers the function as `` / `` so downstream tooling still recognises the syntactic angle-bracket convention while learning the package involved. + - If resolution fails or yields garbage (non-identifier characters), the tracer keeps the legacy `` label to avoid fabricating bad data. +3. **Cache aggressively and keep the hot path cheap.** + - Store successful and failed lookups in `HashMap>` keyed by code id so we only touch `sys.modules` or frame globals once per module. + - Reuse the helper from both the runtime tracer and (later) the pure-Python recorder to ensure consistent naming semantics across products. +4. **Document the behaviour.** + - Update the recorder docs to state that module-level call events now show `` and call out the limited cases (synthetic filenames, frozen modules, namespace packages) where the fallback remains ``. + +## Consequences +- **Pros** + - Import-heavy traces become readable: users immediately know which modules executed without digging through file paths. + - Post-processing / analytics pipelines gain a stable key (the dotted module name) to aggregate import costs or identify slow third-party packages. + - Reusing the existing derivation logic keeps behaviour aligned with trace filters and avoids duplicating heuristics. +- **Cons** + - The tracer must occasionally scan `sys.modules` or inspect frame globals, which would add a small amount of work the first time we see each module. Caching mitigates the steady-state cost. + - Changing the function label alters the textual output of existing traces, so snapshot tests or downstream expectations that explicitly compare against `` need updates. +- **Risks** + - Mis-resolving a module (e.g., due to namespace packages exposing multiple files) could misattribute work to the wrong name. We guard this by preferring explicit `__spec__.name` / `__name__` over path guesses and falling back to `` when ambiguous. + - Frames without real filenames (``, ``) still cannot produce a meaningful module name. We explicitly document this to prevent support churn. + - Accessing `sys.modules` must hold the GIL and avoid long-lived Python references; the helper API enforces that discipline. + +## Alternatives +- **Keep `` and rely on filenames elsewhere.** Rejected because the filename is not present on every trace consumer surface and is cumbersome for humans. +- **Rewrite module names from filenames only.** Rejected due to incorrect results for namespace packages, zip imports, site-packages bytecode caches, and `.pyc` vs `.py` mismatches. +- **Add a new event field for module name.** Rejected to avoid changing the trace file schema; reusing the existing `function_name` field keeps compat with all tooling. + +## References +- `codetracer-python-recorder/src/runtime/tracer/runtime_tracer.rs` (`ensure_function_id`). +- `codetracer-python-recorder/src/runtime/tracer/events.rs` (`register_call_record`). +- `codetracer-python-recorder/src/trace_filter/engine.rs` (current module name derivation & caching). diff --git a/design-docs/module-call-event-naming-implementation-plan.md b/design-docs/module-call-event-naming-implementation-plan.md new file mode 100644 index 0000000..6acd934 --- /dev/null +++ b/design-docs/module-call-event-naming-implementation-plan.md @@ -0,0 +1,70 @@ +# Module Call Event Naming – Implementation Plan + +Plan owners: codetracer recorder maintainers +Target ADR: 0014 – Module Call Events Report Actual Module Names +Impacted components: `codetracer-python-recorder/src/runtime/tracer`, `src/runtime/value_capture`, `src/trace_filter`, design docs & user docs + +## Goals +- Replace the `` placeholder emitted during module imports with the actual dotted module name (e.g., ``, ``). +- Share a single, cached module identity resolver between trace filtering and runtime event recording so both systems agree on module names. +- Keep the hot path fast by caching module resolutions per code object and falling back to `` only when we genuinely cannot determine the name. +- Provide regression tests proving the trace writer records the new labels for in-project modules, site-packages imports, and namespace packages with real filenames. + +## Non-Goals +- Changing the trace file schema or emitting additional fields beyond the function name. +- Reworking how module names feed into configurable trace filters (that behaviour is covered by ADR 0013). +- Retrofitting the pure-Python recorder in this iteration, though the helper should make that future work straightforward. + +## Current Gaps +- `RuntimeTracer::ensure_function_id` always uses `code.co_qualname`, which is `` for every top-level module execution. +- Module name derivation logic lives inside `trace_filter::engine` and cannot be reused without duplicating heuristics (relative path stripping, `sys.modules` scan, `.pyc` awareness). +- There is no cache tying `code_id` → module name outside the filter cache, so even if we bolted on ad-hoc lookups we would keep rescanning `sys.modules`. +- No tests assert the textual function name for import events, so regressions would go unnoticed. + +## Workstreams + +### WS1 – Shared Module Identity Helper +**Scope:** Factor reusable module name derivation ahead of tracer integration. +- Extract `module_name_from_roots`, `lookup_module_name`, normalization helpers, and identifier validation from `trace_filter::engine` into a new module (e.g., `module_identity.rs`) under `src/common` or similar. +- Add APIs: + - `ModuleIdentityResolver::from_sys_path(py) -> Self` to snapshot `sys.path` roots. + - `fn resolve_for_code(&self, py, code, frame_globals_name: Option<&str>) -> Option` that performs (1) cached lookup by code id, (2) relative path inference, (3) `sys.modules` scan, and (4) optional `__name__` override when provided. +- Allow the resolver to accept already-known module names (from `ScopeResolution.module_name`) so we do not recompute them. +- Write focused Rust unit tests that stub out `sys.modules` entries to cover `.py` vs `.pyc` matches, namespace packages (`package/__init__.py`), and failure cases. +- Update `trace_filter::engine` to depend on the new helper rather than its private copy, keeping behaviour identical. + +### WS2 – Runtime Tracer Integration +**Scope:** Use the helper to rename `` call events. +- Extend `RuntimeTracer` with a `module_name_cache: HashMap>` (or wrap the helper inside the tracer) and ensure it is cleared when the tracer resets. +- When registering a call: + - Check whether the filter resolution already cached a module name; pass it into the resolver to avoid recomputation. + - Detect module-level code by checking `qualname == ""` (and guard against variants like `""` with whitespace). + - Ask the helper for a module name; when successful, format the function label as `format!("<{name}>")` before calling `TraceWriter::ensure_function_id`. + - If the helper cannot decide, continue emitting `` to preserve backwards compatibility. +- Consider inspecting the captured frame snapshot (already obtained for argument capture) to pull `globals["__name__"]` as an extra hint when `sys.modules` fails; plumb that optional string into the helper to keep the `ensure_function_id` signature narrow. +- Emit debug logs when we fall back to `` despite having a real filename so troubleshooting remains possible. + +### WS3 – Testing, Tooling, and Docs +**Scope:** Prove the behaviour change and explain it to users. +- Add Python integration tests (e.g., `tests/python/test_monitoring_events.py`) that: + - Import a first-party module (`tests/python/support/sample_pkg/__init__.py`) and assert the trace JSON contains ``. + - Import a third-party-like package placed under a temporary directory inserted into `sys.path` to ensure `sys.modules` + relative path fallback works. + - Cover namespace packages or packages that only expose `__spec__.name` to ensure we trust metadata over filesystem guesses. +- Add Rust tests for the resolver cache (verify we only touch `sys.modules` once per module) and for the formatting logic in `ensure_function_id`. +- Update `docs/README.md` (or recorder-specific docs) to mention that module names now appear in angle brackets, along with any troubleshooting guidance for cases where `` persists (synthetic filenames, frozen imports). +- Refresh changelog entries and any CLI snapshot tests that checked for ``. + +## Testing Strategy +- `cargo test -p codetracer-python-recorder` to run the new resolver unit tests. +- `just test` to exercise Python integration tests and ensure traces serialize as expected. +- Manual verification script that traces `python - <<'PY' ...` importing `boto3` (or a stub) and inspects the `.trace` file with `runtime_tracing::TraceReader`. + +## Risks & Mitigations +- **Performance regressions:** Taking the GIL and scanning `sys.modules` per module could add overhead. Mitigate via per-code caches and by reusing filter-derived names when available. +- **Incorrect attribution:** Namespace packages or reloaded modules might map multiple files to one name. Mitigate by preferring `__spec__.name`/`__name__` and logging whenever multiple candidates compete. +- **Tight coupling with filters:** If the helper accidentally references filter-specific types, it becomes impossible to reuse elsewhere. Keep the helper independent and inject filter results as plain `Option`. +- **Test brittleness:** Snapshot tests referencing `` need refreshing; add helper assertions that look for `<...>` pattern rather than hard-coded ``. + +## Open Questions +- Should we expose the resolved module name anywhere else (e.g., as metadata on trace records) to aid scripting? For now we only change the function label, but we might want to surface the raw dotted name later. +- How should we treat modules executed via `runpy.run_module` with `__name__ == "__main__"` but living under a package path? The helper will return `"__main__"`; confirm that is acceptable or consider using the package dotted path derived from filename when `__name__ == "__main__"`. diff --git a/design-docs/module-call-event-naming-implementation-plan.status.md b/design-docs/module-call-event-naming-implementation-plan.status.md new file mode 100644 index 0000000..264f6b1 --- /dev/null +++ b/design-docs/module-call-event-naming-implementation-plan.status.md @@ -0,0 +1,37 @@ +# Module Call Event Naming – Status + +## Relevant Design Docs +- `design-docs/adr/0014-module-call-event-naming.md` +- `design-docs/module-call-event-naming-implementation-plan.md` +- `design-docs/adr/0013-reliable-module-name-derivation.md` + +## Key Source Files +- `codetracer-python-recorder/src/runtime/tracer/runtime_tracer.rs` +- `codetracer-python-recorder/src/runtime/tracer/events.rs` +- `codetracer-python-recorder/src/runtime/value_capture.rs` +- `codetracer-python-recorder/src/runtime/frame_inspector.rs` +- `codetracer-python-recorder/src/trace_filter/engine.rs` +- `codetracer-python-recorder/tests/python/test_monitoring_events.py` +- `codetracer-python-recorder/tests/rust` + +## Workstream Progress + +### WS1 – Shared Module Identity Helper +- **Scope recap:** Extract and centralise module-name derivation (relative path stripping + `sys.modules` lookup) so both filters and the runtime tracer can reuse it with caching. +- **Status:** _Completed_ +- **Notes:** `src/module_identity.rs` now owns `ModuleIdentityResolver`, `ModuleIdentityCache`, sanitisation helpers, and unit tests covering `.py` vs `.pyc`, package roots, and hint precedence. `TraceFilterEngine` uses the shared resolver for all module lookups, keeping behaviour aligned between filtering and runtime components. + +### WS2 – Runtime Tracer Integration +- **Scope recap:** Detect module-level code (`co_qualname == ""`) and rename call events to `` using the shared resolver; plumb filter-derived names to avoid duplicate work. +- **Status:** _Completed_ +- **Notes:** `RuntimeTracer` now rewrites module-level call events via `ModuleIdentityCache`, clears the cache alongside `function_ids`, and exposes a test hook to verify naming logic. Added a unit test (`module_import_records_module_name`) and a Python integration test (`test_module_imports_record_package_names`) that traces a real import to confirm `` shows up in `trace.json`. + +### WS3 – Testing, Tooling, and Docs +- **Scope recap:** Add regression tests (Python + Rust) validating the new naming, update documentation/changelog, and refresh any snapshot expectations. +- **Status:** _Completed_ +- **Notes:** Added a Rust unit test plus an integration test in `tests/python/test_monitoring_events.py`, documented the behaviour in the README, and recorded the change in `CHANGELOG.md`. Snapshot consumers now rely on the `` naming convention. + +## Next Checkpoints +1. Implement shared resolver scaffolding (WS1). +2. Wire runtime tracer and verify traces emit `` names (WS2). +3. Land regression tests and docs, then update this status file accordingly (WS3).