| 1 | +# Capture Output Implementation Plan |
| 2 | + |
| 3 | +## Goal |
| 4 | +- Ship lossless stdout, stderr, and stdin capture in the Rust recorder without breaking the current CLI flow or error policy. |
| 5 | + |
| 6 | +## Guiding Notes |
| 7 | +- Follow ADR 0005. |
| 8 | +- Keep sentences short for readers; prefer bullets. |
| 9 | +- Run `just test` after every stage. |
| 10 | + |
| 11 | +## Stage 0 – Refactor for IO capture (must land first) |
| 12 | +- Split writer ownership out of `RuntimeTracer` into a helper (`TraceWriterHost`) that exposes a thread-safe event API. |
| 13 | +- Add a small `ThreadSnapshotStore` that records the latest `{path_id, line, frame_id}` per Python thread inside the runtime module. |
| 14 | +- Confirm that `RuntimeTracer::finish` waits on background work hooks; add a stub `IoDrain` trait with a no-op implementation so later stages can slot in real drains. |
| 15 | +- Update `session::start_tracing` and `stop_tracing` to accept optional "extra lifecycle" handles so we can pair start/stop work without more globals. |
| 16 | +- Tests: extend existing runtime unit tests to cover the new snapshot store and confirm start/stop paths still finalise trace files. |
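The two new Stage 0 pieces can be sketched as follows. This is a minimal illustration, not the recorder's actual code: the field widths, the `u64` thread key, the `record`/`latest`/`drain` method names, and the `String` error type (standing in for `recorder-errors`) are all assumptions.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Latest known location of one Python thread.
/// Field names follow the plan; the integer widths are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ThreadSnapshot {
    pub path_id: u32,
    pub line: u32,
    pub frame_id: u64,
}

/// Keeps only the most recent snapshot per thread id.
#[derive(Default)]
pub struct ThreadSnapshotStore {
    inner: RwLock<HashMap<u64, ThreadSnapshot>>,
}

impl ThreadSnapshotStore {
    /// Overwrites any previous snapshot for `thread_id`.
    pub fn record(&self, thread_id: u64, snapshot: ThreadSnapshot) {
        self.inner
            .write()
            .expect("snapshot lock poisoned")
            .insert(thread_id, snapshot);
    }

    /// Returns a copy of the latest snapshot, if the thread has one.
    pub fn latest(&self, thread_id: u64) -> Option<ThreadSnapshot> {
        self.inner
            .read()
            .expect("snapshot lock poisoned")
            .get(&thread_id)
            .copied()
    }
}

/// Hook that `finish` could call to flush pending IO (hypothetical shape).
pub trait IoDrain {
    fn drain(&mut self) -> Result<(), String>;
}

/// Stub used until a real capture back end exists.
pub struct NoopIoDrain;

impl IoDrain for NoopIoDrain {
    fn drain(&mut self) -> Result<(), String> {
        Ok(())
    }
}
```

An `RwLock` fits here if snapshot reads from IO reader threads outnumber writes; a plain `Mutex` would work equally well for a first version.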
| 17 | + |
| 18 | +## Stage 1 – Build the IO capture core |
| 19 | +- Create `runtime::io_capture` with platform-specific back ends (`unix.rs`, `windows.rs`) hidden behind a common trait. |
| 20 | +- Implement descriptor/handle duplication, pipe install, and reader thread startup. Use blocking reads and thread-safe queues (use `crossbeam-channel` if it is already in the workspace; add it if missing). |
| 21 | +- Ensure mirror writes go back to the saved descriptors so console output stays live. |
| 22 | +- Tests: add Rust unit tests that fake pipes (use `os_pipe` on Unix and `tempfile` handles on Windows, exercised in CI) to confirm duplication and restoration. |
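The reader-thread and mirror-write behaviour described in this stage might look roughly like the sketch below. It deliberately leaves out the platform-specific descriptor duplication: a generic `Read` stands in for the pipe's read end and a generic `Write` for the saved descriptor. It also uses the standard library's `sync_channel` instead of `crossbeam-channel` so the example has no dependencies. The function name `spawn_mirroring_reader` is hypothetical.

```rust
use std::io::{ErrorKind, Read, Write};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread::{self, JoinHandle};

/// Spawns a thread that blocks on `source`, writes every chunk to `mirror`
/// so console output stays live, and forwards a copy over a bounded channel.
/// The thread returns `mirror` on EOF so the caller can restore or inspect it.
pub fn spawn_mirroring_reader<R, W>(
    mut source: R,
    mut mirror: W,
    capacity: usize,
) -> (Receiver<Vec<u8>>, JoinHandle<std::io::Result<W>>)
where
    R: Read + Send + 'static,
    W: Write + Send + 'static,
{
    let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
    let handle = thread::spawn(move || -> std::io::Result<W> {
        let mut buf = [0u8; 4096];
        loop {
            let n = match source.read(&mut buf) {
                Ok(0) => break, // EOF: the write end was closed
                Ok(n) => n,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
            // Mirror first so passthrough does not depend on the consumer.
            mirror.write_all(&buf[..n])?;
            mirror.flush()?;
            // If the receiver is gone, keep mirroring and drop the copy.
            let _ = tx.send(buf[..n].to_vec());
        }
        Ok(mirror)
    });
    (rx, handle)
}
```

Because the channel is bounded, a slow consumer applies backpressure to the reader thread, which in turn can stall the traced program once the pipe buffer fills. That trade-off is one reason the plan calls for stress tests with large outputs.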
| 23 | + |
| 24 | +## Stage 2 – Connect capture to the tracer |
| 25 | +- Add an `IoEventSink` struct that owns `Arc<Mutex<TraceWriterHost>>` plus a snapshot reader. |
| 26 | +- Reader threads push `IoChunk` structs (`stream`, `timestamp`, `bytes`, `producer_thread`) into the sink. The sink converts them into runtime tracing events and records them. |
| 27 | +- Use `recorder-errors` for all failures (`usage!` for bad config, `enverr!` for IO problems). Log through the existing logging module; never `println!`. |
| 28 | +- Update `RuntimeTracer::begin` to start the sink when policy allows. Store the `IoCapture` handle and drain it in `finish`. |
| 29 | +- Tests: add integration tests in `tests/` that run a sample script writing to stdout/stderr and reading from stdin, then assert trace files contain the matching events. Verify passthrough stays intact. |
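As a rough illustration of the chunk-to-event flow in this stage, the sketch below pairs each `IoChunk` with the producer thread's latest line and appends it to a writer. Several parts are assumptions made for the example: `TraceWriterHost` is reduced to a stand-in that collects events in a `Vec`, the snapshot reader is modelled as a closure, `IoEvent` and `Stream` are hypothetical types, and `String` stands in for the `recorder-errors` types.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stream {
    Stdin,
    Stdout,
    Stderr,
}

/// Unit of captured IO pushed by a reader thread (fields from the plan).
pub struct IoChunk {
    pub stream: Stream,
    pub timestamp: Duration, // assumed: time since trace start
    pub bytes: Vec<u8>,
    pub producer_thread: u64,
}

/// Hypothetical trace event produced from a chunk.
#[derive(Debug, PartialEq, Eq)]
pub struct IoEvent {
    pub stream: Stream,
    pub timestamp: Duration,
    pub line: Option<u32>,
    pub bytes: Vec<u8>,
}

/// Stand-in for the real writer host; it only collects events.
#[derive(Default)]
pub struct TraceWriterHost {
    pub events: Vec<IoEvent>,
}

/// Converts chunks into events, attaching the producer thread's latest line.
pub struct IoEventSink<F>
where
    F: Fn(u64) -> Option<u32>,
{
    writer: Arc<Mutex<TraceWriterHost>>,
    line_for_thread: F,
}

impl<F> IoEventSink<F>
where
    F: Fn(u64) -> Option<u32>,
{
    pub fn new(writer: Arc<Mutex<TraceWriterHost>>, line_for_thread: F) -> Self {
        Self { writer, line_for_thread }
    }

    pub fn record(&self, chunk: IoChunk) -> Result<(), String> {
        // Look up the location before taking the writer lock.
        let line = (self.line_for_thread)(chunk.producer_thread);
        let mut writer = self
            .writer
            .lock()
            .map_err(|_| "trace writer lock poisoned".to_string())?;
        writer.events.push(IoEvent {
            stream: chunk.stream,
            timestamp: chunk.timestamp,
            line,
            bytes: chunk.bytes,
        });
        Ok(())
    }
}
```

Resolving the snapshot before locking the writer keeps the critical section short, which matters when several reader threads feed the same sink.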
| 30 | + |
| 31 | +## Stage 3 – Policy flag, CLI wiring, and guards |
| 32 | +- Extend `RecorderPolicy` with `io_capture_enabled` plus env var `CODETRACER_CAPTURE_IO`. |
| 33 | +- Make the Python CLI surface a `--capture-io` flag (defaults to policy). Document the flag in help text. |
| 34 | +- Emit a single log line when capture is disabled by policy so users understand why their trace lacks IO events. |
| 35 | +- Tests: Python integration test toggling the policy and checking presence/absence of IO records. |
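One possible way to resolve the effective setting is sketched below. The plan does not specify which values `CODETRACER_CAPTURE_IO` accepts or how the CLI flag and env var rank against each other, so the accepted spellings and the precedence order (CLI flag, then env var, then policy default) are assumptions. The helper names are hypothetical, and `String` stands in for a `usage!` error.

```rust
/// Parses a boolean-like env value. The accepted spellings are an assumption.
pub fn parse_bool_flag(raw: &str) -> Result<bool, String> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Ok(true),
        "0" | "false" | "no" | "off" => Ok(false),
        other => Err(format!("invalid CODETRACER_CAPTURE_IO value: {:?}", other)),
    }
}

/// Resolves `io_capture_enabled` from the three possible sources.
/// Assumed precedence: explicit CLI flag, then env var, then policy default.
pub fn resolve_io_capture(
    cli_flag: Option<bool>,
    env_value: Option<&str>,
    policy_default: bool,
) -> Result<bool, String> {
    if let Some(flag) = cli_flag {
        return Ok(flag);
    }
    match env_value {
        Some(raw) => parse_bool_flag(raw),
        None => Ok(policy_default),
    }
}
```

A caller would pass `std::env::var("CODETRACER_CAPTURE_IO").ok().as_deref()` as `env_value`. Keeping the resolution a pure function makes the toggle easy to unit test without touching the process environment.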
| 36 | + |
| 37 | +## Stage 4 – Hardening and docs |
| 38 | +- Stress test with large outputs (beyond pipe buffer) and interleaved writes from multiple threads. |
| 39 | +- Run Windows CI to verify handle restore logic and CRLF behaviour. |
| 40 | +- Document the feature in README + design docs. Update ADR status once accepted. |
| 41 | +- Add metrics for dropped IO chunks using the existing logging counters. |
| 42 | +- Tests: extend stress tests plus regression tests for start/stop loops to ensure descriptors always restore. |
| 43 | + |
| 44 | +## Milestones |
| 45 | +1. Stage 0 merged with green CI. Serves as base branch for feature work. |
| 46 | +2. Stages 1–2 merged together behind a feature flag. Feature hidden by default. |
| 47 | +3. Stage 3 flips the flag for opted-in users. Gather feedback. |
| 48 | +4. Stage 4 finishes docs, flips default to on, and promotes ADR 0005 to Accepted. |
| 49 | + |
| 50 | +## Verification Checklist |
| 51 | +- `just test` passes after every stage. |
| 52 | +- New unit tests cover writer host, snapshot store, and IO capture workers. |
| 53 | +- Integration tests assert trace events and passthrough behaviour on Linux and Windows. |
| 54 | +- Manual smoke: run `python -m codetracer_python_recorder examples/stdout_script.py` and confirm console output plus IO trace entries. |
| 55 | + |
| 56 | +## Risks & Mitigations |
| 57 | +- **Deadlocks:** Keep reader threads simple, use bounded channels, and add shutdown timeouts tested in CI. |
| 58 | +- **Performance hit:** Benchmark before and after Stage 2 with large stdout workloads; document results. |
| 59 | +- **Platform drift:** Share the Unix/Windows API contract in a `README` inside the module and guard behaviour with tests. |
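The shutdown-timeout mitigation for deadlocks could be sketched as below. The standard library has no timed `join`, so this version waits on a completion signal with `recv_timeout` instead. The function name and the `String` errors are hypothetical, and a real implementation would also need to unblock the stuck reader (for example by closing the pipe's write end) rather than only reporting the timeout.

```rust
use std::sync::mpsc::{sync_channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Runs `work` on a background thread and waits at most `timeout` for it.
/// On timeout the thread is left detached; the caller decides how to recover.
pub fn run_with_shutdown_timeout<F>(work: F, timeout: Duration) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    // Capacity 1 lets the worker signal completion without blocking,
    // even if the waiter has already given up.
    let (done_tx, done_rx) = sync_channel::<()>(1);
    thread::spawn(move || {
        work();
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(timeout) {
        Ok(()) => Ok(()),
        Err(RecvTimeoutError::Timeout) => {
            Err(format!("worker did not stop within {:?}", timeout))
        }
        // The sender was dropped without signalling, so the worker panicked.
        Err(RecvTimeoutError::Disconnected) => {
            Err("worker ended without signalling completion".to_string())
        }
    }
}
```

Returning an error instead of blocking forever lets `finish` report the problem and still restore descriptors, which fits the exit criterion that console output stays unchanged.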
| 60 | + |
| 61 | +## Exit Criteria |
| 62 | +- IO events present in trace files when the policy flag is on. |
| 63 | +- Console output unchanged for users. |
| 64 | +- No file descriptor leaks (checked via stress tests and `lsof` in CI scripts). |
| 65 | +- Documentation published and linked from ADR 0005. |