|
| 1 | +# ADR: Non-invasive stdout/stderr/stdin capture and line-level mapping for the Python tracer (PyO3 + `sys.monitoring`) |
| 2 | + |
| 3 | +**Status**: Accepted |
| 4 | +**Date**: 2025-10-01 |
| 5 | +**Owners**: Tracing/Recorder team |
| 6 | +**Scope**: Recorder runtime (Rust/PyO3) and Python instrumentation glue |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Context |
| 11 | + |
| 12 | +We are building a Python tracing recorder in Rust (PyO3) that: |
| 13 | + |
| 14 | +* runs a target Python script (3.12+), |
| 15 | +* captures **stdout/stderr** without modifying user code, |
| 16 | +* feeds/captures **stdin**, |
| 17 | +* maps emitted output to the **source line** that produced it, |
| 18 | +* records events for **post-mortem replay** (not live UI), |
| 19 | +* works with **threads** and **asyncio**, |
| 20 | +* allows **Rust-side logging** that does not contaminate captured stdout/stderr. |
| 21 | + |
| 22 | +Current runner uses `runpy.run_path` from Python space; we now need the same behavior when launching from Rust to enable FD-level capture. |
| 23 | + |
| 24 | +--- |
| 25 | + |
| 26 | +## Decision |
| 27 | + |
| 28 | +1. **Output capture method**: |
| 29 | + Use **file-descriptor (FD) redirection** with **pipes** (or PTY when TTY semantics required): |
| 30 | + |
| 31 | + * Redirect **stdout(1)** and **stderr(2)** to pipes via `dup2` (Unix) / `SetStdHandle` (Windows). |
| 32 | + * Drain both ends concurrently in background threads; timestamp each chunk. |
| 33 | + |
| 34 | +2. **Input capture/feeding**: |
| 35 | + Redirect **stdin(0)** to a pipe. The controller writes scripted input (or forwards from the original stdin) and **closes** the write end to signal EOF. Optionally tee input to a log for replay. |
| 36 | + |
| 37 | +3. **Execution model** (same semantics as `runpy.run_path`): |
| 38 | + From Rust/PyO3, call **`runpy.run_path(path, run_name="__main__")`** after: |
| 39 | + |
| 40 | + * setting `sys.argv`, |
| 41 | + * setting `sys.path[0]` to the script directory, |
| 42 | + * `chdir` to the script directory. |
| 43 | + |
| 44 | +4. **Tracing API**: |
| 45 | + Use Python 3.12+ **`sys.monitoring`**: |
| 46 | + |
| 47 | + * Allocate a dedicated **tool_id**. |
| 48 | + * Enable at minimum **LINE** events globally (`set_events`), optionally **CALL/C_RETURN** for finer disambiguation. |
| 49 | + * Register a **lightweight callback** that records `(ts, thread_ident, filename, func_name, line_no)`. |
| 50 | + |
| 51 | +5. **Mapping outputs → lines**: |
| 52 | + Merge the two time-ordered streams: |
| 53 | + |
| 54 | + * For each output chunk, associate with the **most recent preceding** LINE event (per-thread if thread identifiable; otherwise global order). |
| 55 | + * Store `(line_ref, stream, bytes, ts_start, ts_end)` records for replay. |
| 56 | + |
| 57 | +6. **Rust logging isolation**: |
| 58 | + Initialize a logger that writes to **the saved original stderr FD** (pre-redirection) or to a dedicated file/syslog. **Never** use `println!/eprintln!` after redirection. |
| 59 | + |
| 60 | +7. **Buffering & latency**: |
| 61 | + For timeliness (optional), set `sys.stdout.reconfigure(line_buffering=True)` / same for stderr. Not strictly required for correctness. |
| 62 | + |
| 63 | +--- |
| 64 | + |
| 65 | +## Alternatives Considered |
| 66 | + |
| 67 | +* **Monkey-patch `sys.stdout`/`sys.stderr`**: rejected (misses C-level writes, invasive, behavior-changing). |
| 68 | +* **Only `sys.settrace`/`sys.monitoring` without FD capture**: rejected (no output capture from C/native extensions). |
| 69 | +* **Use C-API `PyRun_*` instead of `runpy`**: viable but more edge-cases; `runpy.run_path` already matches CLI semantics reliably. |
| 70 | +* **Single combined PTY**: good for interactive TTY behavior, but merges stdout/err; choose per use-case. |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +## Consequences |
| 75 | + |
| 76 | +* **Pros**: |
| 77 | + |
| 78 | + * Captures all output (Python & C extensions) non-invasively. |
| 79 | + * Accurate line mapping via `sys.monitoring`. |
| 80 | + * Works with threads/async; deterministic post-mortem merging via timestamps. |
| 81 | + * Debug logging remains out-of-band. |
| 82 | + |
| 83 | +* **Cons / Risks**: |
| 84 | + |
| 85 | + * FD redirection is process-wide; be careful with concurrent embeddings. |
| 86 | + * Pipes can block if not drained—must have dedicated reader threads. |
| 87 | + * Interleaved writes from multiple threads may produce mixed chunks; per-thread mapping is best-effort. |
| 88 | + * Slight runtime overhead from LINE callbacks; CALL/C_RETURN add more. |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +## Detailed Design |
| 93 | + |
| 94 | +### A. Runner lifecycle (Rust) |
| 95 | + |
| 96 | +1. **Prepare capture** |
| 97 | + |
| 98 | + * `pipe()` for out/err/in. |
| 99 | + * `dup()` and save originals for 0/1/2. |
| 100 | + * `dup2` to redirect: `1→out_w`, `2→err_w`, `0→in_r`. Close extra ends. |
| 101 | + |
| 102 | +2. **Start drainers** |
| 103 | + |
| 104 | + * Thread A: read `out_r` (blocking), timestamp, append to `stdout` buffer/queue. |
| 105 | + * Thread B: read `err_r`, same for `stderr`. |
| 106 | + |
| 107 | +3. **(Optional) stdin strategy** |
| 108 | + |
| 109 | + * Scripted: write bytes to `in_w`, then close to signal EOF. |
| 110 | + * Passthrough: thread C copies from **saved** original stdin to `in_w` and also logs for replay. |
| 111 | + |
| 112 | +4. **Initialize Python** |
| 113 | + |
| 114 | + * Acquire GIL. |
| 115 | + * Enable monitoring: `use_tool_id`, `register_callback(LINE)`, `set_events(LINE | optional CALL/C_RETURN)`. |
| 116 | + * Set `sys.argv`, adjust `sys.path[0]`, `chdir`. |
| 117 | + |
| 118 | +5. **Run target** |
| 119 | + |
| 120 | + * `runpy.run_path(path, run_name="__main__")`. |
| 121 | + * On exception, format traceback and record as a structured event. |
| 122 | + |
| 123 | +6. **Teardown** |
| 124 | + |
| 125 | + * `set_events(tool_id, 0)`, `register_callback(tool_id, LINE, None)`. |
| 126 | + * Close `in_w` if not already. |
| 127 | + * Drain remaining output; restore original FDs via `dup2(saved, fd)`; close all pipe FDs. |
| 128 | + |
| 129 | +7. **Merge & persist** |
| 130 | + |
| 131 | + * Merge LINE events and output chunks by timestamp (per-thread if available). |
| 132 | + * Persist a session artifact: |
| 133 | + |
| 134 | + ```json |
| 135 | + { |
| 136 | + "env": {...}, |
| 137 | + "events": [ { "ts":..., "thread":..., "file":..., "func":..., "line":... } ], |
| 138 | + "io": [ |
| 139 | + { "ts0":..., "ts1":..., "stream":"stdout", "thread":..., "bytes":"base64..." }, |
| 140 | + { "ts0":..., "ts1":..., "stream":"stderr", ... } |
| 141 | + ], |
| 142 | + "stdin": [ ... ] |
| 143 | + } |
| 144 | + ``` |
| 145 | + |
| 146 | +### B. Monitoring callback (Python or Rust) |
| 147 | + |
| 148 | +* **Python shim** (simple): callback appends tuples to a Python list; Rust extracts at end. |
| 149 | +* **Rust callback** (preferred for perf): expose `#[pyfunction]` that pushes to a lock-free ring buffer while holding the GIL briefly. |
| 150 | + |
| 151 | +**Event fields**: `ts = perf_counter() (or Rust monotonic)`, `thread_ident = threading.get_ident()`, `code.co_filename`, `code.co_name`, `line_no`. |
| 152 | + |
| 153 | +### C. Mapping algorithm |
| 154 | + |
| 155 | +* Maintain `last_line_by_thread` updated on each LINE event. |
| 156 | +* When an output chunk arrives: |
| 157 | + |
| 158 | + * Assign `thread = current “owner”` if known; else leave `null` and fall back to global last line. |
| 159 | + * Attribute to `last_line_by_thread[thread]` (or global last line). |
| 160 | +* For fine disambiguation on the same source line: |
| 161 | + |
| 162 | + * Optionally enable **CALL/C_RETURN** and attribute chunks occurring between them to the active call frame (e.g., `print` invocations). |
| 163 | + |
| 164 | +--- |
| 165 | + |
| 166 | +## Implementation Notes (Unix/Windows) |
| 167 | + |
| 168 | +* **Unix**: use `nix` crate for `pipe/dup/dup2/read/write/close`. |
| 169 | +* **Windows**: `CreatePipe`, `SetStdHandle`, `GetStdHandle` to save originals; use `ReadFile`/`WriteFile` in reader/feeder threads. |
| 170 | +* **TTY needs**: use a **PTY** (`openpty`/`forkpty` or `winpty/ConPTY`) to provide terminal behavior (echo, line editing). |
| 171 | + |
| 172 | +--- |
| 173 | + |
| 174 | +## Telemetry & Logging |
| 175 | + |
| 176 | +* Initialize a custom logger writing to the **saved original stderr FD** or to a **file/syslog** before enabling redirection. |
| 177 | +* Include a **session header** (versions, timestamps, script path, argv) in the persisted artifact. |
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +## Testing Plan |
| 182 | + |
| 183 | +1. **Unit** |
| 184 | + |
| 185 | + * Pure-Python prints (single/multi-line, with/without flush). |
| 186 | + * C-level prints (e.g., `ctypes` calling `puts`, NumPy warning to stderr). |
| 187 | + * Exceptions (traceback captured and not lost in pipes). |
| 188 | + |
| 189 | +2. **Concurrency** |
| 190 | + |
| 191 | + * Multiple threads printing interleaved; verify mapping is stable and no deadlocks (drainers running). |
| 192 | + * `asyncio` tasks printing; verify sequence is coherent. |
| 193 | + |
| 194 | +3. **Input** |
| 195 | + |
| 196 | + * `input()` / `sys.stdin.read()` with scripted stdin; ensure EOF ends read. |
| 197 | + * Passthrough mode; ensure tee log matches bytes fed. |
| 198 | + |
| 199 | +4. **Behavioral** |
| 200 | + |
| 201 | + * Relative imports from script dir work (sys.path[0]/cwd). |
| 202 | + * Large outputs (≥ pipe buffer) do not deadlock; throughput OK. |
| 203 | + |
| 204 | +5. **Windows** |
| 205 | + |
| 206 | + * Equivalent redirection and restore; CRLF handling; code page sanity. |
| 207 | + |
| 208 | +--- |
| 209 | + |
| 210 | +## Rollout |
| 211 | + |
| 212 | +* Implement behind a feature flag (`runner_fd_capture`). |
| 213 | +* Ship a CLI subcommand to run a script with capture for manual validation. |
| 214 | +* Gate by runtime check: Python ≥ 3.12. |
| 215 | +* Add an integration test matrix (Linux/macOS/Windows). |
| 216 | + |
| 217 | +--- |
| 218 | + |
| 219 | +## Open Questions / Future Work |
| 220 | + |
| 221 | +* Add optional **INSTRUCTION** events for ultra-fine mapping when needed. |
| 222 | +* Detect and label **subprocess** outputs (inherit our FDs? PTY? wrappers). |
| 223 | +* Expose a **live tee** to developer console while still recording (mirror to saved original fds). |
| 224 | +* Structured replay API (seek by time/line/thread; fold/expand calls). |
| 225 | +* Consider **no-GIL** Python in future: ensure event buffers are thread-safe without relying on GIL serialization. |
| 226 | + |
| 227 | +--- |
| 228 | + |
| 229 | +## Acceptance Criteria |
| 230 | + |
| 231 | +* Captures stdout/stderr/stdin non-invasively; no contamination from recorder logs. |
| 232 | +* Produces a stable mapping from output chunks to source lines for typical code, threads, and asyncio. |
| 233 | +* Equivalent semantics to `runpy.run_path(..., run_name="__main__")`. |
| 234 | +* Clean startup/teardown with restored FDs; no deadlocks/leaks on large I/O. |
| 235 | +* Cross-platform (Linux/macOS; Windows parity planned or implemented per milestone). |
0 commit comments