Commit ddf3653

Initial Python tracer based on runtime_tracing

Summary of changes
- Implemented a Rust-backed tracer using runtime_tracing.
- Added a concrete implementation of the codetracer-python-recorder/src/tracer.rs trait.
- Wired the runtime tracer into the public Python API (start/stop/flush).

Key details
- New tracer implementation: src/runtime_tracer.rs
  - Struct RuntimeTracer backed by runtime_tracing::NonStreamingTraceWriter.
  - Maps selected sys.monitoring events (CALL, LINE, PY_RETURN) to runtime_tracing:
    - CALL: registers function and call (without capturing full arg lists yet).
    - LINE: registers a Step.
    - PY_RETURN: registers return value (optionally captured as ValueRecord).
  - Minimal value encoder for None, bool, int, str; falls back to Raw string for others.
  - Begins writing metadata, paths, and events on start and flushes them on finish.
  - Exposes helper derive_sidecar_paths(events_path) -> (metadata.json, paths.json).
- Extended Tracer trait: src/tracer.rs
  - Now Tracer: Send + Any (to safely store in static Mutex).
  - Added downcasting support: fn as_any(&mut self) for optional future use.
  - Added default lifecycle hooks:
    - fn flush(&mut self, _py) -> PyResult<()> { Ok(()) }
    - fn finish(&mut self, _py) -> PyResult<()> { Ok(()) }
  - Added flush_installed_tracer(py) to flush current tracer without uninstalling.
  - uninstall_tracer(py) now calls tracer.finish(py) before unhooking callbacks.
- Python API integration: src/lib.rs
  - start_tracing(path, format, capture_values, source_roots)
    - Prevents double-start via ACTIVE flag.
    - Creates RuntimeTracer, derives output file names:
      - events: path (user-provided)
      - metadata: path with extension metadata.json
      - paths: path with extension paths.json
    - For simplicity and to remain object-safe in this environment, “binary” maps to the non-streaming BinaryV0 writer. JSON is supported as json.
    - Installs tracer via sys.monitoring and flips ACTIVE to true.
  - stop_tracing()
    - Uninstalls the tracer (which calls finish on the tracer) and sets ACTIVE false.
  - flush_tracing()
    - Calls flush_installed_tracer(py). For non-streaming formats this writes events to the file; for streaming formats (not used here) this would be a no-op by design.
  - is_tracing(): returns ACTIVE.

Behavioral notes
- Formats:
  - json: uses JSON non-streaming writer.
  - binary: mapped to BinaryV0 non-streaming writer in this implementation to avoid relying on the private streaming writer module and to keep the tracer object Send-safe.
- Sidecar files:
  - metadata: .metadata.json
  - paths: .paths.json
  - events:
- Values capturing:
  - Optional via capture_values flag. Basic types (None, bool, int, str) are handled; all others fall back to Raw.

What I didn’t change
- Existing tests and their tracer implementations (PrintTracer, CountingTracer) remain compatible because all new Tracer methods are default no-ops.

Next steps (optional)
- Expand event interest set (e.g., exceptions, C_RETURN/C_RAISE) to record richer traces.
- Enhance value capture and variable bindings for arguments and locals (requires more Python-level context).
- Consider supporting streaming binary output once a public API for the streaming writer is exposed (or if constraints allow depending on runtime_tracing crate updates).

Commands to run
- Build and tests rely on your environment’s Python/PyO3 toolchain. The repo’s recommended way:
  - just venv 3.13 dev
  - just test

This implementation hooks into sys.monitoring, records with runtime_tracing, and exposes start/stop/flush in the Python module, keeping the code defensive, testable, and focused on the requested trait implementation.

Signed-off-by: Tzanko Matev <[email protected]>

Activate tracing on script entry

codetracer-python-recorder/codetracer_python_recorder/__main__.py:
codetracer-python-recorder/codetracer_python_recorder/api.py:
codetracer-python-recorder/src/lib.rs:
codetracer-python-recorder/src/runtime_tracer.rs:
trace.json:
trace.paths.json:
Signed-off-by: Tzanko Matev <[email protected]>

Only trace files in a whitelist (experiment)

codetracer-python-recorder/codetracer_python_recorder/__main__.py:
codetracer-python-recorder/src/lib.rs:
codetracer-python-recorder/src/runtime_tracer.rs:
Signed-off-by: Tzanko Matev <[email protected]>
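
The commit message mentions a minimal value encoder that handles None, bool, int and str and falls back to a raw string for everything else. The diff for src/runtime_tracer.rs is not shown on this page, so the following is only a rough Python sketch of that dispatch order. The function name `encode_value` and the dictionary layout are assumptions made for illustration; they are not the runtime_tracing ValueRecord format.

```python
from typing import Any, Dict


def encode_value(value: Any) -> Dict[str, Any]:
    """Encode a Python value as a small tagged record (hypothetical layout)."""
    if value is None:
        return {"kind": "None"}
    # bool must be tested before int, because isinstance(True, int) is True.
    if isinstance(value, bool):
        return {"kind": "Bool", "b": value}
    if isinstance(value, int):
        return {"kind": "Int", "i": value}
    if isinstance(value, str):
        return {"kind": "String", "text": value}
    # Any other type falls back to a raw textual representation.
    return {"kind": "Raw", "r": repr(value)}
```

The ordering matters in any encoder of this shape: checking int first would silently record True and False as integers.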
1 parent 1cc5a71 commit ddf3653

9 files changed, +658 -60 lines changed

codetracer-python-recorder/Cargo.lock

Lines changed: 259 additions & 9 deletions
Generated file; diff not rendered.

codetracer-python-recorder/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,8 @@ runtime_tracing = "0.14.0"
 bitflags = "2.4"
 once_cell = "1.19"
 dashmap = "5.5"
+log = "0.4"
+env_logger = "0.11"
 
 [dev-dependencies]
 pyo3 = { version = "0.25.1", features = ["auto-initialize"] }

codetracer-python-recorder/codetracer_python_recorder/api.py

Lines changed: 19 additions & 25 deletions
@@ -2,16 +2,15 @@
 
 This module exposes a minimal interface for starting and stopping
 runtime traces. The heavy lifting is delegated to the
-`codetracer_python_recorder` Rust extension which will eventually hook
-into `runtime_tracing` and `sys.monitoring`. For now the Rust side only
-maintains placeholder state and performs no actual tracing.
+`codetracer_python_recorder` Rust extension which hooks
+into `runtime_tracing` and `sys.monitoring`.
 """
 from __future__ import annotations
 
 import contextlib
 import os
 from pathlib import Path
-from typing import Iterable, Iterator, Optional
+from typing import Iterator, Optional
 
 from .codetracer_python_recorder import (
     flush_tracing as _flush_backend,
@@ -27,31 +26,34 @@
 _active_session: Optional["TraceSession"] = None
 
 
-def _normalize_source_roots(source_roots: Iterable[os.PathLike | str] | None) -> Optional[list[str]]:
-    if source_roots is None:
-        return None
-    return [str(Path(p)) for p in source_roots]
-
-
 def start(
     path: os.PathLike | str,
     *,
     format: str = DEFAULT_FORMAT,
-    capture_values: bool = True,
-    source_roots: Iterable[os.PathLike | str] | None = None,
+    start_on_enter: os.PathLike | str | None = None,
 ) -> "TraceSession":
     """Start a global trace session.
 
-    Parameters mirror the design document. The current implementation
-    merely records the active state on the Rust side and performs no
-    tracing.
+    - ``path``: Target directory where trace files will be written.
+      Files created: ``trace.json``/``trace.bin``, ``trace_metadata.json``, ``trace_paths.json``.
+    - ``format``: Either ``binary`` or ``json`` (controls events file name/format).
+    - ``start_on_enter``: Optional file path; when provided, tracing remains
+      paused until the tracer observes execution entering this file. Useful to
+      avoid recording interpreter and import startup noise when launching a
+      script via the CLI.
+
+    The current implementation records trace data through a Rust backend.
     """
     global _active_session
     if _is_tracing_backend():
         raise RuntimeError("tracing already active")
 
     trace_path = Path(path)
-    _start_backend(str(trace_path), format, capture_values, _normalize_source_roots(source_roots))
+    _start_backend(
+        str(trace_path),
+        format,
+        str(Path(start_on_enter)) if start_on_enter is not None else None,
+    )
     session = TraceSession(path=trace_path, format=format)
     _active_session = session
     return session
@@ -86,15 +88,11 @@ def trace(
     path: os.PathLike | str,
     *,
     format: str = DEFAULT_FORMAT,
-    capture_values: bool = True,
-    source_roots: Iterable[os.PathLike | str] | None = None,
 ) -> Iterator["TraceSession"]:
     """Context manager helper for scoped tracing."""
     session = start(
         path,
         format=format,
-        capture_values=capture_values,
-        source_roots=source_roots,
     )
     try:
         yield session
@@ -133,11 +131,7 @@ def _auto_start_from_env() -> None:
     if not path:
         return
     fmt = os.getenv("CODETRACER_FORMAT", DEFAULT_FORMAT)
-    capture_env = os.getenv("CODETRACER_CAPTURE_VALUES")
-    capture = True
-    if capture_env is not None:
-        capture = capture_env.lower() not in {"0", "false", "no"}
-    start(path, format=fmt, capture_values=capture)
+    start(path, format=fmt)
 
 
 _auto_start_from_env()
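
The new `start_on_enter` parameter keeps tracing paused until execution enters a given file. The real gating lives in the Rust tracer, which is not shown on this page, so the following is only a minimal Python sketch of the idea. The class `ActivationGate` and its `observe` method are hypothetical names, and the sketch assumes that tracing stays enabled once the target file has been entered.

```python
import os


class ActivationGate:
    """Hypothetical helper: stays closed until a target file is entered."""

    def __init__(self, start_on_enter=None):
        # With no activation path, recording is enabled from the start.
        if start_on_enter is None:
            self._target = None
        else:
            self._target = os.path.abspath(os.fspath(start_on_enter))
        self.active = self._target is None

    def observe(self, filename):
        """Report the file being entered; return True if events should be recorded."""
        if not self.active and os.path.abspath(filename) == self._target:
            self.active = True
        return self.active
```

A tracer callback could call `gate.observe(code.co_filename)` on every function entry and drop the event while the result is False, which is what keeps interpreter and import startup out of the trace.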

codetracer-python-recorder/src/lib.rs

Lines changed: 106 additions & 12 deletions
@@ -1,36 +1,127 @@
+use std::fs;
+use std::path::{PathBuf, Path};
 use std::sync::atomic::{AtomicBool, Ordering};
+use std::sync::Once;
 
 use pyo3::exceptions::PyRuntimeError;
 use pyo3::prelude::*;
+use std::fmt;
 
 pub mod code_object;
 pub mod tracer;
+mod runtime_tracer;
 pub use crate::code_object::{CodeObjectRegistry, CodeObjectWrapper};
 pub use crate::tracer::{install_tracer, uninstall_tracer, EventSet, Tracer};
 
 /// Global flag tracking whether tracing is active.
 static ACTIVE: AtomicBool = AtomicBool::new(false);
 
-/// Start tracing. Placeholder implementation that simply flips the
-/// global active flag and ignores all parameters.
+// Initialize Rust logging once per process. Defaults to debug for this crate
+// unless overridden by RUST_LOG. This helps surface debug! output during dev.
+static INIT_LOGGER: Once = Once::new();
+
+fn init_rust_logging_with_default(default_filter: &str) {
+    INIT_LOGGER.call_once(|| {
+        let env = env_logger::Env::default().default_filter_or(default_filter);
+        // Use a compact format with timestamps and targets to aid debugging.
+        let mut builder = env_logger::Builder::from_env(env);
+        builder
+            .format_timestamp_micros()
+            .format_target(true);
+        let _ = builder.try_init();
+    });
+}
+
+/// Start tracing using sys.monitoring and runtime_tracing writer.
 #[pyfunction]
 fn start_tracing(
-    _path: &str,
-    _format: &str,
-    _capture_values: bool,
-    _source_roots: Option<Vec<String>>,
+    path: &str,
+    format: &str,
+    activation_path: Option<&str>,
 ) -> PyResult<()> {
-    if ACTIVE.swap(true, Ordering::SeqCst) {
+    // Ensure logging is ready before any tracer logs might be emitted.
+    // Default only our crate to debug to avoid excessive verbosity from deps.
+    init_rust_logging_with_default("codetracer_python_recorder=debug");
+    if ACTIVE.load(Ordering::SeqCst) {
         return Err(PyRuntimeError::new_err("tracing already active"));
     }
-    Ok(())
+
+    // Interpret `path` as a directory where trace files will be written.
+    let out_dir = Path::new(path);
+    if out_dir.exists() && !out_dir.is_dir() {
+        return Err(PyRuntimeError::new_err("trace path exists and is not a directory"));
+    }
+    if !out_dir.exists() {
+        // Best-effort create the directory tree
+        fs::create_dir_all(&out_dir)
+            .map_err(|e| PyRuntimeError::new_err(format!("failed to create trace directory: {}", e)))?;
+    }
+
+    // Map format string to enum
+    let fmt = match format.to_lowercase().as_str() {
+        "json" => runtime_tracing::TraceEventsFileFormat::Json,
+        // Use BinaryV0 for "binary" to avoid streaming writer here.
+        "binary" | "binaryv0" | "binary_v0" | "b0" => runtime_tracing::TraceEventsFileFormat::BinaryV0,
+        //TODO AI! We need to assert! that the format is among the known values.
+        other => {
+            eprintln!("Unknown format '{}', defaulting to binary (v0)", other);
+            runtime_tracing::TraceEventsFileFormat::BinaryV0
+        }
+    };
+
+    // Build output file paths inside the directory.
+    let (events_path, meta_path, paths_path) = match fmt {
+        runtime_tracing::TraceEventsFileFormat::Json => (
+            out_dir.join("trace.json"),
+            out_dir.join("trace_metadata.json"),
+            out_dir.join("trace_paths.json"),
+        ),
+        _ => (
+            out_dir.join("trace.bin"),
+            out_dir.join("trace_metadata.json"),
+            out_dir.join("trace_paths.json"),
+        ),
+    };
+
+    // Activation path: when set, tracing starts only after entering it.
+    let activation_path = activation_path.map(|s| Path::new(s));
+
+    Python::with_gil(|py| {
+        // Program and args: keep minimal; Python-side API stores full session info if needed
+        let sys = py.import("sys")?;
+        let argv = sys.getattr("argv")?;
+        let program: String = argv
+            .get_item(0)?
+            .extract::<String>()?;
+        //TODO: Error-handling. What to do if argv is empty? Does this ever happen?
+
+        let mut tracer = runtime_tracer::RuntimeTracer::new(
+            &program,
+            &[],
+            fmt,
+            activation_path,
+        );
+
+        // Start location: prefer activation path, otherwise best-effort argv[0]
+        let start_path: &Path = activation_path.unwrap_or(Path::new(&program));
+        tracer.begin(&meta_path, &paths_path, &events_path, start_path, 1)?;
+
+        // Install callbacks
+        install_tracer(py, Box::new(tracer))?;
+        ACTIVE.store(true, Ordering::SeqCst);
+        Ok(())
+    })
 }
 
 /// Stop tracing by resetting the global flag.
 #[pyfunction]
 fn stop_tracing() -> PyResult<()> {
-    ACTIVE.store(false, Ordering::SeqCst);
-    Ok(())
+    Python::with_gil(|py| {
+        // Uninstall triggers finish() on tracer implementation.
+        uninstall_tracer(py)?;
+        ACTIVE.store(false, Ordering::SeqCst);
+        Ok(())
+    })
 }
 
 /// Query whether tracing is currently active.
@@ -39,15 +130,18 @@ fn is_tracing() -> PyResult<bool> {
     Ok(ACTIVE.load(Ordering::SeqCst))
 }
 
-/// Flush buffered trace data. No-op placeholder for now.
+/// Flush buffered trace data (best-effort, non-streaming formats only).
 #[pyfunction]
 fn flush_tracing() -> PyResult<()> {
-    Ok(())
+    Python::with_gil(|py| crate::tracer::flush_installed_tracer(py))
 }
 
 /// Python module definition.
 #[pymodule]
 fn codetracer_python_recorder(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
+    // Initialize logging on import so users see logs without extra setup.
+    // Respect RUST_LOG if present; otherwise default to debug for this crate.
+    init_rust_logging_with_default("codetracer_python_recorder=debug");
     m.add_function(wrap_pyfunction!(start_tracing, m)?)?;
     m.add_function(wrap_pyfunction!(stop_tracing, m)?)?;
     m.add_function(wrap_pyfunction!(is_tracing, m)?)?;
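
In the diff above, `start_tracing` turns the format string into an events file name and places the two sidecar files in the same directory. The Python sketch below mirrors that mapping so the resulting layout is easy to see. The helper name `resolve_trace_files` is hypothetical and is not part of the recorder's API; the alias list and file names are copied from the Rust match arms.

```python
from pathlib import Path

# Format strings that the Rust code maps to the BinaryV0 writer.
_BINARY_ALIASES = {"binary", "binaryv0", "binary_v0", "b0"}


def resolve_trace_files(out_dir, fmt):
    """Return (events, metadata, paths) file paths for a trace directory."""
    out_dir = Path(out_dir)
    name = fmt.lower()
    if name != "json" and name not in _BINARY_ALIASES:
        # The Rust code prints a warning to stderr and defaults to binary here.
        name = "binary"
    events = out_dir / ("trace.json" if name == "json" else "trace.bin")
    return events, out_dir / "trace_metadata.json", out_dir / "trace_paths.json"
```

For example, `resolve_trace_files("out", "json")` yields `out/trace.json`, `out/trace_metadata.json` and `out/trace_paths.json`, while any binary alias or unknown format yields `out/trace.bin` with the same two sidecar files.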
