WS8 - Docs

tzanko-matev · tzanko-matev · commit aa81052a09ab · 2025-10-03T15:35:05.000+03:00
diff --git a/README.md b/README.md
@@ -2,11 +2,68 @@
 
 This repository now hosts two related projects:
 
-- codetracer-pure-python-recorder — the existing pure-Python prototype that records [CodeTracer](https://github.com/metacraft-labs/CodeTracer) traces using sys.settrace.
-- codetracer-python-recorder — a new, Rust-backed Python extension module (PyO3) intended to provide a faster and more featureful recorder.
+- codetracer-pure-python-recorder — a pure-Python tracer that still mirrors the early prototype.
+- codetracer-python-recorder — a Rust-backed Python extension (PyO3 + maturin) with structured errors and tighter tooling.
 
-> [!WARNING]
-> Both projects are early-stage prototypes. Contributions and discussion are welcome!
+Both projects are still in motion. Expect breaking changes while we finish the error-handling rollout.
+
+### Structured errors (Rust-backed recorder)
+
+The Rust module wraps every failure in a `RecorderError` hierarchy that reaches Python with a stable `code`, a readable `kind`, and a `context` dict.
+
+- `UsageError` → bad input or calling pattern. Codes like `ERR_ALREADY_TRACING`.
+- `EnvironmentError` → IO or OS problems. Codes like `ERR_IO`.
+- `TargetError` → the traced program raised or refused inspection. Codes like `ERR_TRACE_INCOMPLETE`.
+- `InternalError` → a recorder bug or panic. Codes default to `ERR_UNKNOWN` unless classified.
+
+Quick catch example:
+
+```python
+from codetracer_python_recorder import RecorderError, start, stop
+
+try:
+    session = start("/tmp/trace", format="json")
+except RecorderError as err:
+    print(f"Recorder failed: {err.code}")
+    for key, value in err.context.items():
+        print(f"  {key}: {value}")
+else:
+    try:
+        ...  # run work here
+    finally:
+        session.flush()
+        stop()
+```
+
+All subclasses carry the same attributes, so existing handlers can migrate by catching `RecorderError` once and branching on `err.code` if needed.
+
+### CLI exit behaviour and JSON trailers
+
+`python -m codetracer_python_recorder` returns:
+
+- `0` when tracing and the target script succeed.
+- The script's own exit code when it calls `sys.exit()`.
+- `1` when a `RecorderError` bubbles out of startup or shutdown.
+- `2` when the CLI arguments are incomplete.
+
+Pass `--codetracer-json-errors` (or set the policy via `configure_policy(json_errors=True)`) to stream a one-line JSON trailer on stderr. The payload includes `run_id`, an optional `trace_id`, `error_code`, `error_kind`, `message`, and the `context` map so downstream tooling can log failures without scraping text.
+
+### Migration checklist for downstream tools
+
+- Catch `RecorderError` (or a subclass) instead of `RuntimeError`.
+- Switch any string matching over to `err.code` values like `ERR_TRACE_DIR_CONFLICT`.
+- Expect structured log lines (JSON) on stderr. Use the `error_code` field instead of parsing text.
+- Opt in to JSON trailers for machine parsing and keep human output short.
+- Update policy wiring to use `configure_policy` / `policy_snapshot()` rather than hand-rolled env parsing.
+- Read `docs/onboarding/error-handling.md` for detailed migration steps and assertion rules.
+
+### Logging defaults
+
+The recorder now installs a JSON logger on first import. Logs include `run_id`, optional `trace_id`, and an `error_code` field when set.
+
+- Control the log filter with `RUST_LOG=target=level` (standard `env_logger` filter syntax).
+- Override from Python with `configure_policy(log_level="info")` or `log_file=...` for file output.
+- Metrics counters record dropped events, detach reasons, and caught panics; plug your own sink via the Rust API when embedding.
 
 ### codetracer-pure-python-recorder
 
@@ -64,7 +121,7 @@ RUST_LOG=codetracer_python_recorder=debug pytest \
   codetracer-python-recorder/tests/python/unit/test_backend_exceptions.py -q
 ```
 
-Any filter accepted by `env_logger` works, so you can switch to
+Any filter accepted by `env_logger` still works, so you can switch to
 `RUST_LOG=codetracer_python_recorder=info` or silence everything with
 `RUST_LOG=off`.
 
diff --git a/codetracer-python-recorder/CHANGELOG.md b/codetracer-python-recorder/CHANGELOG.md
@@ -0,0 +1,6 @@
+# codetracer-python-recorder — Change Log
+
+## Unreleased
+- Documented the error-handling policy. README now lists the `RecorderError` hierarchy, policy hooks (`configure_policy`, JSON trailers), exit codes, and sample handlers so Python callers can consume structured failures.
+- Added an onboarding guide under `docs/onboarding/error-handling.md` with migration steps for downstream tools.
+- Recorded assertion guidance for contributors: prefer `bug!`/`ensure_internal!` over raw `panic!`/`.unwrap()` and keep `debug_assert!` paired with classified errors.
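Review note: the README hunk above describes the CLI exit codes and the one-line JSON trailer, but not how a downstream tool might consume them. The sketch below is one possible reading, not the recorder's own code. It assumes the trailer is the last stderr line that parses as a JSON object carrying both `error_code` and `error_kind`; the helper names `find_json_trailer` and `summarize_run` are hypothetical.

```python
import json
from typing import Optional


def find_json_trailer(stderr_text: str) -> Optional[dict]:
    """Return the last stderr line that looks like a JSON error trailer.

    Assumption: a trailer is a JSON object with both ``error_code`` and
    ``error_kind``; other JSON log lines and plain text are skipped.
    """
    for line in reversed(stderr_text.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # human-readable output
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        if isinstance(payload, dict) and "error_code" in payload and "error_kind" in payload:
            return payload
    return None


def summarize_run(returncode: int, stderr_text: str) -> str:
    """Combine the documented exit codes with the optional trailer."""
    if returncode == 0:
        return "ok"
    trailer = find_json_trailer(stderr_text)
    if trailer is not None:
        return f"recorder error {trailer['error_code']}: {trailer.get('message', '')}"
    if returncode == 2:
        # Ambiguous without a trailer: the script may also exit with 2.
        return "CLI arguments incomplete (or the script exited with 2)"
    return f"exit code {returncode} with no trailer; likely the script's own exit code"
```

A caller would feed it the `returncode` and captured `stderr` of a `python -m codetracer_python_recorder --codetracer-json-errors ...` run. Exit codes alone cannot distinguish a recorder failure from a script that calls `sys.exit(1)`, which is why the sketch checks the trailer first.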
diff --git a/codetracer-python-recorder/codetracer_python_recorder/__init__.py b/codetracer-python-recorder/codetracer_python_recorder/__init__.py
@@ -1,10 +1,10 @@
-"""High-level tracing API built on a Rust backend.
+"""Public tracing surface with structured recorder errors.
 
-This module exposes a minimal interface for starting and stopping
-runtime traces. The heavy lifting is delegated to the
-`codetracer_python_recorder` Rust extension which will eventually hook
-into `runtime_tracing` and `sys.monitoring`.  For now the Rust side only
-maintains placeholder state and performs no actual tracing.
+Importing this package installs policy defaults, wires the Rust backend,
+and exposes helpers to start and stop tracing. Every failure travels
+through :class:`RecorderError` or one of its subclasses. Each exception
+carries a stable ``code`` string (``ERR_*``), a ``kind`` label, and a
+``context`` dict for tooling.
 """
 
 from . import api as _api
diff --git a/codetracer-python-recorder/codetracer_python_recorder/api.py b/codetracer-python-recorder/codetracer_python_recorder/api.py
@@ -1,4 +1,9 @@
-"""High-level tracing API built on a Rust backend."""
+"""High-level tracing helpers with structured error propagation.
+
+Expose the core session helpers (:func:`start`, :func:`stop`,
+:func:`trace`, etc.). These wrappers bubble up :class:`RecorderError`
+instances from the Rust layer so callers see stable ``ERR_*`` codes.
+"""
 from __future__ import annotations
 
 from typing import Iterable
diff --git a/codetracer-python-recorder/codetracer_python_recorder/session.py b/codetracer-python-recorder/codetracer_python_recorder/session.py
@@ -1,4 +1,8 @@
-"""Tracing session management helpers."""
+"""Tracing session management helpers with policy integration.
+
+These wrappers load policy from env vars, call into the Rust backend,
+and surface structured :class:`RecorderError` instances on failure.
+"""
 from __future__ import annotations
 
 import contextlib
@@ -20,7 +24,11 @@
 
 
 class TraceSession:
-    """Handle representing a live tracing session."""
+    """Handle representing a live tracing session.
+
+    The object keeps the resolved trace path and format. Use
+    :meth:`flush` and :meth:`stop` to interact with the global session.
+    """
 
     path: Path
     format: str
@@ -72,6 +80,20 @@ def start(
         When ``True`` (default), refresh policy settings from environment
         variables via :func:`configure_policy_from_env` prior to applying
         explicit overrides.
+
+    Returns
+    -------
+    TraceSession
+        Handle for the active recorder session.
+
+    Raises
+    ------
+    RecorderError
+        Raised by the Rust backend when configuration, IO, or the target
+        script fails.
+    RuntimeError
+        Raised when ``start`` is called while another session is still
+        active. The guard lives in Python so the error stays synchronous.
     """
     global _active_session
     if _is_tracing_backend():
diff --git a/design-docs/error-handling-implementation-plan.status.md b/design-docs/error-handling-implementation-plan.status.md
@@ -1,6 +1,6 @@
 # Error Handling Implementation Plan — Status
 
-_Last updated: 2025-10-04_
+_Last updated: 2025-10-05_
 
 ## WS1 – Foundations & Inventory
 State: In progress
@@ -62,5 +62,15 @@ Highlights:
 - Implemented `just lint` orchestration running `cargo clippy -D clippy::panic` and a repository script that blocks unchecked `.unwrap(` usage outside the legacy allowlist.
 Next moves: Monitor unwrap allowlist shrinkage once WS1 follow-ups land; evaluate extending the lint to `.expect(` once monitoring refactor closes.
 
+## WS8 – Documentation & Rollout
+State: Done (2025-10-05)
+Highlights:
+- README now covers the recorder error policy, JSON trailers, exit codes, and a short Python `RecorderError` catch example.
+- Added `docs/onboarding/error-handling.md` with migration steps, policy wiring tips, and assertion rules for contributors.
+- Started `codetracer-python-recorder/CHANGELOG.md` to brief downstream tools on consuming structured errors.
+Next moves:
+- Share the onboarding doc with downstream maintainers and collect gaps before promoting ADR 0004 to **Accepted**.
+- Fold feedback into the change log before the next release tag.
+
 ## Upcoming Workstreams
-WS8 – Documentation & Rollout: Not started. Pending guidance from Docs WG and ADR promotion once downstream consumers validate the new error interfaces.
+- None. Hold for ADR 0004 promotion once downstream validation wraps up.
diff --git a/docs/onboarding/error-handling.md b/docs/onboarding/error-handling.md
@@ -0,0 +1,58 @@
+# Recorder Error Handling Onboarding
+
+This note aligns new contributors and downstream consumers on the structured error work. Keep it close when you wire the recorder into tools or review patches that touch failure paths.
+
+## Error classes at a glance
+- `RecorderError` is the base class. Subclasses are `UsageError`, `EnvironmentError`, `TargetError`, and `InternalError`.
+- Every instance exposes `code` (an `ERR_*` string), `kind` (matches the class), and a `context` dict with string keys.
+- Codes stay stable. Add new codes instead of recycling strings.
+- The Rust layer also attaches a source error when possible; Python reprs show it as `caused by ...`.
+
+## Python API quick start
+```python
+from codetracer_python_recorder import RecorderError, start, stop
+
+try:
+    session = start("/tmp/trace", format="json")
+except RecorderError as err:
+    print(f"Recorder failed: {err.code}")
+    for key, value in err.context.items():
+        print(f"  {key}: {value}")
+else:
+    try:
+        ...  # run traced work here
+    finally:
+        session.flush()
+        stop()
+```
+- Catch `RecorderError` when you want a single guard. Catch subclasses when you care about `UsageError` vs `TargetError`.
+- Calling `start` while another session is still active raises `RuntimeError` from a thin Python guard. Every failure past the guard surfaces as a `RecorderError`.
+
+## CLI workflow and JSON trailers
+- Run `python -m codetracer_python_recorder --codetracer-format=json app.py` to trace a script.
+- Exit codes: `0` for success, script exit code when the script stops itself, `1` when a `RecorderError` escapes startup/shutdown, `2` on CLI misuse.
+- Pass `--codetracer-json-errors` (or `configure_policy(json_errors=True)`) to mirror each failure as a one-line JSON object on stderr.
+- JSON fields: `run_id`, optional `trace_id`, `error_code`, `error_kind`, `message`, `context`.
+
+## Migration checklist for existing clients
+1. Replace `RuntimeError` / string matching with `RecorderError` + `err.code` checks.
+2. Forward policy options through `configure_policy` (or `policy_snapshot`) instead of reinventing env parsing.
+3. Expect structured log lines on stderr. Parse JSON and read the `error_code` field.
+4. Opt in to JSON trailers when you need machine-readable failure signals.
+5. Keep CLI wrappers short. Avoid reformatting the recorder message; attach extra context alongside it.
+
+## Assertion rules for recorder code
+- Use `ensure_usage!`, `ensure_env!`, or `ensure_internal!` when translating invariants into classified failures.
+- Reach for `bug!` when you hit a state that should never happen in production.
+- Reserve `assert!` and `debug_assert!` for tests or temporary invariants. If you need a dev-only guard, combine `debug_assert!` with the matching `ensure_*` call so production still fails cleanly.
+- Never reintroduce `.unwrap()` inside the recorder crate without extending the allowlist. Use the macros instead.
+
+## Tooling guardrails
+- Run `just lint` before sending a patch. It runs Clippy with `-D clippy::panic` and our unwrap scanner.
+- Run `just test` to exercise Rust (nextest) and Python suites. Failure injections cover permission errors, target crashes, and panic paths.
+- Enable the `integration-test` cargo feature when you add new Python surface tests so the Rust hooks are active.
+- When in doubt, add a regression test alongside the docs. The plan treats docs plus tests as the definition of done.
+
+## Need help?
+- Check `design-docs/error-handling-implementation-plan.md` for context and open questions.
+- Ping the error-handling working thread if a new code or policy toggle seems missing. The goal is to keep `RecorderError` exhaustive, not to fork ad hoc enums in downstream tools.
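
Review note: the migration checklist asks clients to branch on `err.code` rather than message text, and the docstring for `start` keeps a separate `RuntimeError` guard. A minimal sketch of a caller that handles both channels follows. The `RecorderError` class here is a stand-in so the snippet runs without the extension installed (real code would import it from `codetracer_python_recorder`), and the `ACTIONS` table and `start_with_policy` helper are hypothetical; only the `ERR_*` strings come from the docs above.

```python
class RecorderError(Exception):
    """Stand-in for codetracer_python_recorder.RecorderError (illustration only)."""

    def __init__(self, message, code="ERR_UNKNOWN", kind="InternalError", context=None):
        super().__init__(message)
        self.code = code
        self.kind = kind
        self.context = dict(context or {})


# Hypothetical mapping from documented codes to a caller-side reaction.
ACTIONS = {
    "ERR_ALREADY_TRACING": "reuse-session",
    "ERR_TRACE_DIR_CONFLICT": "pick-new-dir",
    "ERR_IO": "check-filesystem",
    "ERR_TRACE_INCOMPLETE": "keep-partial-trace",
}


def start_with_policy(start_fn, path):
    """Call start_fn(path) and return (session, action)."""
    try:
        return start_fn(path), None
    except RecorderError as err:
        # Branch on the stable code; unlisted codes fall back to a bug report.
        return None, ACTIONS.get(err.code, "report-bug")
    except RuntimeError:
        # The Python-side guard fires when a session is already active.
        return None, "reuse-session"
```

`RecorderError` is caught before `RuntimeError` so the code-based branch still wins if the real base class ever derives from `RuntimeError`. Passing `start_fn` in as a parameter keeps the helper easy to exercise with fakes in tests.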