Skip to content

Commit efea1b5

Browse files
authored
[Refactor] Single responsibility principle (#42)
The purpose of the refactor was to split current code into files and functions where each file/function is responsible for a single concern. See the design docs in this commit for details. No tests were touched ## File-SRP - ✅ Step 2 complete: introduced `src/logging.rs` for one-time logger initialisation and migrated tracing session lifecycle (`start_tracing`, `stop_tracing`, `is_tracing`, `flush_tracing`, `ACTIVE` flag) into `src/session.rs`, with `src/lib.rs` now limited to PyO3 wiring and re-exports. - ✅ Step 3 complete: added `src/runtime/mod.rs` with focused `activation`, `value_encoder`, and `output_paths` submodules; `RuntimeTracer` now delegates activation gating, value encoding, and writer initialisation through the façade consumed by `session.rs`. - ✅ Step 4 complete: introduced `src/monitoring/mod.rs` for sys.monitoring types/caches and `src/monitoring/tracer.rs` for the tracer trait plus callback dispatch; rewired `lib.rs`, `session.rs`, and `runtime/mod.rs`, and kept a top-level `tracer` re-export for API stability. - ✅ Step 5 complete: split the Python package into dedicated `formats.py`, `session.py`, and `auto_start.py` modules, trimmed `api.py` to a thin façade, and moved the environment auto-start hook into `__init__.py`. - ✅ Step 6 complete: resolved outstanding Rust TODOs (format validation, argv handling, function id stability), expanded module documentation so `cargo doc` reflects the architecture, and re-ran `just test` to confirm the refactor remains green. - ✅ Test baseline: `just test` (nextest + pytest) passes with the UV cache scoped to the workspace; direct `cargo test` still requires CPython development symbols. ## Function-SRP ## Stage 0 – Baseline & Guardrails - ✅ `just test` (Rust + Python suites) passes; captured run via the top-level recipe. - ✅ Generated JSON and binary reference traces from `examples/value_capture_all.py`; outputs stored in `artifacts/stage0/value-capture-json/` and `artifacts/stage0/value-capture-binary/`. - ⏳ Summarise current `ActivationController` behaviour and frame traversal notes for reviewer context. ## Stage 1 – Session Start-Up Decomposition - ✅ Step 1 (Rust): Introduced `session::bootstrap` helpers and refactored `start_tracing` to delegate directory validation, format resolution, and program metadata collection. Tests remain green. - ✅ Step 2 (Python): Extracted `_coerce_format`, `_validate_trace_path`, and `_normalize_activation_path` helpers; added tests covering invalid formats and conflicting paths. ## Stage 2 – Frame Inspection & Activation Separation - ✅ Step 1: Added `runtime::frame_inspector::capture_frame` to encapsulate frame lookup, locals/globals materialisation, and reference counting; `on_line` now delegates to the helper while preserving behaviour. - ✅ Step 2: Extended `ActivationController` with `should_process_event`/`handle_return_event`, updated callbacks to rely on them, and removed direct state juggling from `RuntimeTracer`. ## Stage 3 – Value Capture Layer - ✅ Step 1: Introduced `runtime::value_capture::capture_call_arguments`; `on_py_start` now delegates to it, keeping the function focused on orchestration while reusing frame inspectors. - ✅ Step 2: Added `record_visible_scope` helper and refactored `on_line` to delegate locals/globals registration through it. ## Stage 4 – Return Handling & Logging Harmonisation - ✅ Added `runtime::logging::log_event` to consolidate callback logging across start, line, and return handlers. - ✅ Exposed `record_return_value` in `runtime::value_capture` and refactored `RuntimeTracer::on_py_return` to orchestrate activation checks, logging, and value recording. - ✅ Extended runtime tests with explicit return capture coverage and activation deactivation assertions. ## Stage 5 – Cleanup & Regression Sweep - ✅ Audited runtime modules for obsolete inline comments or TODOs introduced pre-refactor; none remained after helper extraction. - ✅ Documented the helper module map in `design-docs/function-level-srp-refactor-plan.md` for contributor onboarding. - ✅ Re-ran `just test` (Rust `cargo nextest` + Python `pytest`) to confirm post-cleanup parity.
2 parents 2c7bcef + 5e2d54e commit efea1b5

27 files changed

+1736
-844
lines changed

.gitignore

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,4 +6,6 @@
66
build
77
*~
88
.idea/
9-
.cargo/
9+
.cargo/
10+
11+
**/*.egg-info/

codetracer-python-recorder/codetracer_python_recorder/__init__.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,5 +9,8 @@
99

1010
from . import api as _api
1111
from .api import * # re-export public API symbols
12+
from .auto_start import auto_start_from_env
13+
14+
auto_start_from_env()
1215

1316
__all__ = _api.__all__
Lines changed: 6 additions & 136 deletions
Original file line numberDiff line numberDiff line change
@@ -1,142 +1,12 @@
1-
"""High-level tracing API built on a Rust backend.
2-
3-
This module exposes a minimal interface for starting and stopping
4-
runtime traces. The heavy lifting is delegated to the
5-
`codetracer_python_recorder` Rust extension which hooks
6-
into `runtime_tracing` and `sys.monitoring`.
7-
"""
1+
"""High-level tracing API built on a Rust backend."""
82
from __future__ import annotations
93

10-
import contextlib
11-
import os
12-
from pathlib import Path
13-
from typing import Iterator, Optional
14-
15-
from .codetracer_python_recorder import (
16-
flush_tracing as _flush_backend,
17-
is_tracing as _is_tracing_backend,
18-
start_tracing as _start_backend,
19-
stop_tracing as _stop_backend,
20-
)
21-
22-
TRACE_BINARY: str = "binary"
23-
TRACE_JSON: str = "json"
24-
DEFAULT_FORMAT: str = TRACE_BINARY
25-
26-
_active_session: Optional["TraceSession"] = None
27-
28-
29-
def start(
30-
path: os.PathLike | str,
31-
*,
32-
format: str = DEFAULT_FORMAT,
33-
start_on_enter: os.PathLike | str | None = None,
34-
) -> "TraceSession":
35-
"""Start a global trace session.
36-
37-
- ``path``: Target directory where trace files will be written.
38-
Files created: ``trace.json``/``trace.bin``, ``trace_metadata.json``, ``trace_paths.json``.
39-
- ``format``: Either ``binary`` or ``json`` (controls events file name/format).
40-
- ``start_on_enter``: Optional file path; when provided, tracing remains
41-
paused until the tracer observes execution entering this file. Useful to
42-
avoid recording interpreter and import startup noise when launching a
43-
script via the CLI.
44-
45-
The current implementation records trace data through a Rust backend.
46-
"""
47-
global _active_session
48-
if _is_tracing_backend():
49-
raise RuntimeError("tracing already active")
50-
51-
trace_path = Path(path)
52-
_start_backend(
53-
str(trace_path),
54-
format,
55-
str(Path(start_on_enter)) if start_on_enter is not None else None,
56-
)
57-
session = TraceSession(path=trace_path, format=format)
58-
_active_session = session
59-
return session
60-
61-
62-
def stop() -> None:
63-
"""Stop the active trace session if one is running."""
64-
global _active_session
65-
if not _is_tracing_backend():
66-
return
67-
_stop_backend()
68-
_active_session = None
69-
70-
71-
def is_tracing() -> bool:
72-
"""Return ``True`` when a trace session is active."""
73-
return _is_tracing_backend()
74-
75-
76-
def flush() -> None:
77-
"""Flush buffered trace data.
4+
from typing import Iterable
785

79-
With the current placeholder implementation this is a no-op but the
80-
function is provided to match the planned public API.
81-
"""
82-
if _is_tracing_backend():
83-
_flush_backend()
6+
from .formats import DEFAULT_FORMAT, TRACE_BINARY, TRACE_JSON
7+
from .session import TraceSession, flush, is_tracing, start, stop, trace
848

85-
86-
@contextlib.contextmanager
87-
def trace(
88-
path: os.PathLike | str,
89-
*,
90-
format: str = DEFAULT_FORMAT,
91-
) -> Iterator["TraceSession"]:
92-
"""Context manager helper for scoped tracing."""
93-
session = start(
94-
path,
95-
format=format,
96-
)
97-
try:
98-
yield session
99-
finally:
100-
session.stop()
101-
102-
103-
class TraceSession:
104-
"""Handle representing a live tracing session."""
105-
106-
path: Path
107-
format: str
108-
109-
def __init__(self, path: Path, format: str) -> None:
110-
self.path = path
111-
self.format = format
112-
113-
def stop(self) -> None:
114-
"""Stop this trace session."""
115-
if _active_session is self:
116-
stop()
117-
118-
def flush(self) -> None:
119-
"""Flush buffered trace data for this session."""
120-
flush()
121-
122-
def __enter__(self) -> "TraceSession":
123-
return self
124-
125-
def __exit__(self, exc_type, exc, tb) -> None: # pragma: no cover - thin wrapper
126-
self.stop()
127-
128-
129-
def _auto_start_from_env() -> None:
130-
path = os.getenv("CODETRACER_TRACE")
131-
if not path:
132-
return
133-
fmt = os.getenv("CODETRACER_FORMAT", DEFAULT_FORMAT)
134-
start(path, format=fmt)
135-
136-
137-
_auto_start_from_env()
138-
139-
__all__ = [
9+
__all__: Iterable[str] = (
14010
"TraceSession",
14111
"DEFAULT_FORMAT",
14212
"TRACE_BINARY",
@@ -146,4 +16,4 @@ def _auto_start_from_env() -> None:
14616
"is_tracing",
14717
"trace",
14818
"flush",
149-
]
19+
)
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
"""Environment-driven trace auto-start helper."""
2+
from __future__ import annotations
3+
4+
import logging
5+
import os
6+
from typing import Iterable
7+
8+
from .formats import DEFAULT_FORMAT
9+
10+
ENV_TRACE_PATH = "CODETRACER_TRACE"
11+
ENV_TRACE_FORMAT = "CODETRACER_FORMAT"
12+
13+
log = logging.getLogger(__name__)
14+
15+
16+
def auto_start_from_env() -> None:
17+
"""Start tracing automatically when the relevant environment variables are set."""
18+
path = os.getenv(ENV_TRACE_PATH)
19+
if not path:
20+
return
21+
22+
# Delay import to avoid boot-time circular dependencies.
23+
from . import session
24+
25+
if session.is_tracing():
26+
log.debug("codetracer auto-start skipped: tracing already active")
27+
return
28+
29+
fmt = os.getenv(ENV_TRACE_FORMAT, DEFAULT_FORMAT)
30+
log.debug(
31+
"codetracer auto-start triggered", extra={"trace_path": path, "format": fmt}
32+
)
33+
session.start(path, format=fmt)
34+
35+
36+
__all__: Iterable[str] = (
37+
"ENV_TRACE_FORMAT",
38+
"ENV_TRACE_PATH",
39+
"auto_start_from_env",
40+
)
Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
"""Trace format constants and helpers."""
2+
from __future__ import annotations
3+
4+
from typing import Iterable
5+
6+
TRACE_BINARY: str = "binary"
7+
TRACE_JSON: str = "json"
8+
DEFAULT_FORMAT: str = TRACE_BINARY
9+
SUPPORTED_FORMATS: frozenset[str] = frozenset({TRACE_BINARY, TRACE_JSON})
10+
11+
12+
def normalize_format(value: str | None) -> str:
13+
"""Normalise user-provided strings to the format names recognised by the backend.
14+
15+
The runtime currently accepts ``"binary"`` (plus legacy aliases handled
16+
on the Rust side) and ``"json"``. Unknown formats fall back to the
17+
lower-cased input so the backend can decide how to react; callers can
18+
choose to guard against unsupported values by checking ``SUPPORTED_FORMATS``.
19+
"""
20+
if value is None:
21+
return DEFAULT_FORMAT
22+
return value.lower()
23+
24+
25+
def is_supported(value: str) -> bool:
26+
"""Return ``True`` if *value* is one of the officially supported formats."""
27+
return value.lower() in SUPPORTED_FORMATS
28+
29+
30+
__all__: Iterable[str] = (
31+
"DEFAULT_FORMAT",
32+
"TRACE_BINARY",
33+
"TRACE_JSON",
34+
"SUPPORTED_FORMATS",
35+
"is_supported",
36+
"normalize_format",
37+
)
Lines changed: 130 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,130 @@
1+
"""Tracing session management helpers."""
2+
from __future__ import annotations
3+
4+
import contextlib
5+
from pathlib import Path
6+
from typing import Iterator, Optional
7+
8+
from .codetracer_python_recorder import (
9+
flush_tracing as _flush_backend,
10+
is_tracing as _is_tracing_backend,
11+
start_tracing as _start_backend,
12+
stop_tracing as _stop_backend,
13+
)
14+
from .formats import DEFAULT_FORMAT, SUPPORTED_FORMATS, is_supported, normalize_format
15+
16+
_active_session: Optional["TraceSession"] = None
17+
18+
19+
class TraceSession:
20+
"""Handle representing a live tracing session."""
21+
22+
path: Path
23+
format: str
24+
25+
def __init__(self, path: Path, format: str) -> None:
26+
self.path = path
27+
self.format = format
28+
29+
def stop(self) -> None:
30+
"""Stop this trace session."""
31+
if _active_session is self:
32+
stop()
33+
34+
def flush(self) -> None:
35+
"""Flush buffered trace data for this session."""
36+
flush()
37+
38+
def __enter__(self) -> "TraceSession":
39+
return self
40+
41+
def __exit__(self, exc_type, exc, tb) -> None: # pragma: no cover - thin wrapper
42+
self.stop()
43+
44+
45+
def start(
46+
path: str | Path,
47+
*,
48+
format: str = DEFAULT_FORMAT,
49+
start_on_enter: str | Path | None = None,
50+
) -> TraceSession:
51+
"""Start a new global trace session."""
52+
global _active_session
53+
if _is_tracing_backend():
54+
raise RuntimeError("tracing already active")
55+
56+
trace_path = _validate_trace_path(Path(path))
57+
normalized_format = _coerce_format(format)
58+
activation_path = _normalize_activation_path(start_on_enter)
59+
60+
_start_backend(str(trace_path), normalized_format, activation_path)
61+
session = TraceSession(path=trace_path, format=normalized_format)
62+
_active_session = session
63+
return session
64+
65+
66+
def stop() -> None:
67+
"""Stop the active trace session if one is running."""
68+
global _active_session
69+
if not _is_tracing_backend():
70+
return
71+
_stop_backend()
72+
_active_session = None
73+
74+
75+
def is_tracing() -> bool:
76+
"""Return ``True`` when a trace session is active."""
77+
return _is_tracing_backend()
78+
79+
80+
def flush() -> None:
81+
"""Flush buffered trace data."""
82+
if _is_tracing_backend():
83+
_flush_backend()
84+
85+
86+
@contextlib.contextmanager
87+
def trace(
88+
path: str | Path,
89+
*,
90+
format: str = DEFAULT_FORMAT,
91+
) -> Iterator[TraceSession]:
92+
"""Context manager helper for scoped tracing."""
93+
session = start(path, format=format)
94+
try:
95+
yield session
96+
finally:
97+
session.stop()
98+
99+
100+
def _coerce_format(value: str) -> str:
101+
normalized = normalize_format(value)
102+
if not is_supported(normalized):
103+
supported = ", ".join(sorted(SUPPORTED_FORMATS))
104+
raise ValueError(
105+
f"unsupported trace format '{value}'. Expected one of: {supported}"
106+
)
107+
return normalized
108+
109+
110+
def _validate_trace_path(path: Path) -> Path:
111+
path = path.expanduser()
112+
if path.exists() and not path.is_dir():
113+
raise ValueError("trace path exists and is not a directory")
114+
return path
115+
116+
117+
def _normalize_activation_path(value: str | Path | None) -> str | None:
118+
if value is None:
119+
return None
120+
return str(Path(value).expanduser())
121+
122+
123+
__all__ = (
124+
"TraceSession",
125+
"flush",
126+
"is_tracing",
127+
"start",
128+
"stop",
129+
"trace",
130+
)

codetracer-python-recorder/src/code_object.rs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
//! Shared code-object caching utilities for sys.monitoring callbacks.
2+
13
use dashmap::DashMap;
24
use once_cell::sync::OnceCell;
35
use pyo3::prelude::*;
@@ -148,7 +150,8 @@ impl CodeObjectRegistry {
148150
self.map
149151
.entry(id)
150152
.or_insert_with(|| Arc::new(CodeObjectWrapper::new(py, code)))
151-
.clone() //AI? Why do we need to clone here?
153+
// Clone the `Arc` so each caller receives its own reference-counted handle.
154+
.clone()
152155
}
153156

154157
/// Remove the wrapper for a given code id, if present.

0 commit comments

Comments
 (0)