Commit 0c5bcc2

docs: capture error-handling review stories
1 parent 16e2f63 commit 0c5bcc2

2 files changed: +66 -0 lines changed

.agents/tasks/2025/08/21-0939-codetype-interface

Lines changed: 3 additions & 0 deletions
@@ -40,3 +40,6 @@ Implement the CodeObjectWrapper as designed. Update the Tracer trait as well as
There is an issue in the current implementation. We don't use caching effectively, since we create a new CodeObjectWrapper at each callback_xxx call. We need a global cache, probably keyed by the code object id. Propose design changes and update the design documents. Don't implement the changes themselves before I approve them.
--- FOLLOW UP TASK ---
Implement the global code object registry.

--- FOLLOW UP TASK ---
In this branch we have implemented standardized error handling and logging, but we didn't get those features approved by the PM. Look at the changes and at the error-handling design documents, then write down a set of user stories to give to the PM.
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Recorder Error Handling & Logging — User Stories

## Context
The current branch introduces the structured error-handling stack and logging upgrades outlined in the error-handling implementation plan. The stories below package those capabilities for product review.

## User Stories

### 1. Python clients receive structured recorder failures
**As** a Python integrator embedding the recorder
**I want** every failure to surface as a `RecorderError` (or subclass) with stable `ERR_*` codes and context metadata
**So that** my tooling can branch on error kind without parsing ad-hoc strings

**Acceptance criteria**
- Exceptions raised by the Rust extension map to `RecorderError`, `UsageError`, `EnvironmentError`, `TargetError`, or `InternalError`, each exposing `code`, `kind`, and `context` attributes populated from the underlying `RecorderError` fields.
- Panics that reach the FFI boundary are caught and reclassified as `InternalError` with a distinct error code.
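
To make the first criterion concrete, here is a minimal pure-Python sketch of the exception hierarchy as a client might see it. The class names and the `code`, `kind`, and `context` attributes come from the story; the constructor signature, the `kind` string values, the `describe` helper, and the `ERR_EXAMPLE` code are assumptions made for illustration, not the extension's actual definitions.

```python
from typing import Any, Dict, Optional


class RecorderError(Exception):
    """Base class exposing the attributes named in the story."""

    kind = "recorder"  # assumed label; the real kind values are not specified here

    def __init__(self, message: str, code: str,
                 context: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.code = code                    # stable ERR_* identifier
        self.context = dict(context or {})  # free-form metadata


class UsageError(RecorderError):
    kind = "usage"


class EnvironmentError(RecorderError):  # shadows the builtin OSError alias in this module only
    kind = "environment"


class TargetError(RecorderError):
    kind = "target"


class InternalError(RecorderError):
    kind = "internal"


def describe(exc: RecorderError) -> str:
    """Hypothetical helper: branch on kind and code, never on the message text."""
    return f"{exc.kind}:{exc.code}"


try:
    raise UsageError("tracing is already active", code="ERR_EXAMPLE",
                     context={"trace_path": "trace/"})
except RecorderError as exc:
    summary = describe(exc)
```

Catching the base class and dispatching on `code` or `kind` is the pattern the story is asking for; tooling never needs to inspect `str(exc)`.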

### 2. Embedders configure recorder policy centrally
**As** an application embedding CodeTracer
**I want** to configure runtime policy (abort vs disable, require traces, partial-trace retention, JSON errors) via Python APIs or environment variables before starting a session
**So that** the recorder’s shutdown behaviour matches my product’s UX and operational constraints

**Acceptance criteria**
- `configure_policy` accepts keyword arguments (e.g., `on_recorder_error`, `require_trace`, `keep_partial_trace`, `log_level`, `log_file`, `json_errors`) and updates the global policy snapshot consumed by the Rust runtime.
- `configure_policy_from_env` reads the `CODETRACER_*` environment variables and applies the same policy wiring automatically when packages import the module or the CLI starts.
- `TraceSession.start()` (and the CLI wrapper) refreshes policy from the environment, then forwards explicit overrides before activating tracing.
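
The layering described above (environment first, explicit overrides second) can be sketched in pure Python as follows. The function names and keyword arguments come from the story. The `RecorderPolicy` field defaults, the rule that each variable is `CODETRACER_` plus the upper-cased field name, and the set of truthy strings are assumptions, since the story only says `CODETRACER_*`.

```python
import os
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class RecorderPolicy:
    # Defaults are illustrative assumptions, not the recorder's real defaults.
    on_recorder_error: str = "abort"
    require_trace: bool = False
    keep_partial_trace: bool = False
    log_level: Optional[str] = None
    log_file: Optional[str] = None
    json_errors: bool = False


_POLICY = RecorderPolicy()
_TRUTHY = {"1", "true", "yes", "on"}  # assumed boolean spellings


def configure_policy(**overrides: Any) -> RecorderPolicy:
    """Replace the global snapshot; unknown keywords raise TypeError."""
    global _POLICY
    _POLICY = replace(_POLICY, **overrides)
    return _POLICY


def configure_policy_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> RecorderPolicy:
    """Apply overrides found in CODETRACER_<FIELD> variables (assumed naming)."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for field in fields(RecorderPolicy):
        raw = environ.get("CODETRACER_" + field.name.upper())
        if raw is None:
            continue
        if isinstance(field.default, bool):
            overrides[field.name] = raw.strip().lower() in _TRUTHY
        else:
            overrides[field.name] = raw
    return configure_policy(**overrides)
```

A session start in this sketch would call `configure_policy_from_env()` and then `configure_policy(**explicit_overrides)`, so explicit arguments win over the environment.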

### 3. CLI operators can steer policy from the command line
**As** a CLI user running `python -m codetracer_python_recorder`
**I want** recorder policy toggles exposed as command-line flags alongside trace path/format options
**So that** I can experiment with abort vs disable flows, JSON trailers, and log destinations without writing glue code

**Acceptance criteria**
- The CLI accepts `--codetracer-on-recorder-error`, `--codetracer-require-trace`, `--codetracer-keep-partial-trace`, `--codetracer-log-level`, `--codetracer-log-file`, and `--codetracer-json-errors`, wiring them through `configure_policy`.
- When no explicit flags are provided, the CLI still honours policy derived from environment variables via `configure_policy_from_env`.
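
One way the flag wiring could look with `argparse` is sketched below. The flag names are taken from the criteria above. Whether each flag takes a value or acts as a switch, the `abort`/`disable` choices, and the `policy_overrides` helper are assumptions. Unset flags are left as `None` so that they do not mask policy that came from the environment.

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codetracer_python_recorder")
    # Value-taking flags (assumed); choices mirror the "abort vs disable" wording.
    parser.add_argument("--codetracer-on-recorder-error", choices=["abort", "disable"])
    parser.add_argument("--codetracer-log-level")
    parser.add_argument("--codetracer-log-file")
    # Switches (assumed); default=None distinguishes "not given" from False.
    for flag in ("--codetracer-require-trace",
                 "--codetracer-keep-partial-trace",
                 "--codetracer-json-errors"):
        parser.add_argument(flag, action="store_true", default=None)
    return parser


def policy_overrides(argv: List[str]) -> Dict[str, Any]:
    """Return only the options the user set, keyed by policy field name."""
    args = vars(build_parser().parse_args(argv))
    prefix = "codetracer_"
    return {name[len(prefix):]: value
            for name, value in args.items()
            if value is not None}
```

The resulting dictionary can be passed straight to `configure_policy(**overrides)` after the environment has been applied; an empty command line yields an empty dictionary and leaves the environment-derived policy untouched.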

### 4. Structured diagnostics feed observability pipelines
**As** an observability engineer consuming recorder telemetry
**I want** recorder logs to emit structured JSON that includes a stable `run_id`, optional `trace_id`, log level, and any active error code
**So that** downstream collectors can correlate recorder failures with host application behaviour

**Acceptance criteria**
- Importing the module or starting tracing initialises the Rust logger once, generating JSON log lines with `run_id`, optional `trace_id`, `message`, and `error_code` fields.
- `with_error_code` scoping ensures error logs include the originating `ERR_*` value, and `set_active_trace_id` updates subsequent log entries during a trace.
- When `RecorderPolicy.log_file` is set, log output is redirected to the configured file; otherwise entries fall back to stderr with best-effort recovery on IO failures.
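
The logger itself lives on the Rust side, so the following is only a Python illustration of the scoping behaviour the criteria describe. The names `with_error_code` and `set_active_trace_id` and the field names come from the story; the `level` key, the `format_log_line` helper, and the use of context variables are assumptions.

```python
import json
import uuid
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

RUN_ID = uuid.uuid4().hex  # generated once per process, stable across log lines
_trace_id = ContextVar("trace_id", default=None)
_error_code = ContextVar("error_code", default=None)


def set_active_trace_id(trace_id: Optional[str]) -> None:
    """Later log lines carry this trace id until it is changed again."""
    _trace_id.set(trace_id)


@contextmanager
def with_error_code(code: str) -> Iterator[None]:
    """Attach an ERR_* code to every line logged inside the block."""
    token = _error_code.set(code)
    try:
        yield
    finally:
        _error_code.reset(token)  # restore whatever was active before


def format_log_line(level: str, message: str) -> str:
    """Render one JSON log line with the fields named in the story."""
    return json.dumps({
        "run_id": RUN_ID,
        "trace_id": _trace_id.get(),
        "level": level,
        "message": message,
        "error_code": _error_code.get(),
    })
```

Lines formatted inside a `with with_error_code(...)` block carry the code; lines formatted afterwards report `null` for `error_code` while keeping the same `run_id`.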

### 5. Automation can parse machine-readable error trailers
**As** a workflow owner triaging recorder failures
**I want** an opt-in JSON trailer on stderr describing each surfaced `RecorderError`
**So that** automated tooling can react to failure codes without scraping human text

**Acceptance criteria**
- Enabling the `json_errors` policy flag causes the FFI mapper to emit a JSON line with `run_id`, `trace_id`, `error_code`, `error_kind`, `message`, and `context` whenever a `RecorderError` crosses into Python.
- Trailer emission respects the configured writer (stderr by default, test hook override for automated verification) and flushes after each payload.
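
On the consuming side, automation could pick the trailer out of captured stderr roughly as follows. The six key names come from the criterion above; the `find_error_trailer` helper and the rule of taking the last matching line are assumptions about how a consumer might behave, not part of the recorder.

```python
import json
from typing import Any, Dict, Optional

# Keys the story says every trailer carries.
REQUIRED_KEYS = {"run_id", "trace_id", "error_code",
                 "error_kind", "message", "context"}


def find_error_trailer(stderr_text: str) -> Optional[Dict[str, Any]]:
    """Return the last stderr line that parses as a complete error trailer."""
    for line in reversed(stderr_text.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary human-readable output
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not
        if isinstance(payload, dict) and REQUIRED_KEYS.issubset(payload):
            return payload
    return None
```

A CI step might call this on the stderr of a failed recorder run and branch on `trailer["error_code"]`, falling back to generic handling when the function returns `None`.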

### 6. Metrics capture detachments and dropped events
**As** the recorder team monitoring runtime health
**I want** lightweight counters incremented whenever tracing detaches, events are dropped, or panics are caught
**So that** we can detect regressions in sampling coverage and panic containment

**Acceptance criteria**
- A pluggable `RecorderMetrics` sink tracks `record_dropped_event`, `record_detach`, and `record_panic` calls, defaulting to a no-op until hosts install a collector.
- Runtime code invokes the metrics hooks when synthetic filenames are skipped, when policy-triggered detachments occur, and when the FFI wrapper captures a panic.
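
The pluggable-sink idea can be illustrated in Python as below. The `RecorderMetrics` name and the three hook names come from the story; the single string argument on each hook, the `install_metrics` and `metrics` functions, and the `CountingMetrics` collector are assumptions for illustration, and the real sink may be defined on the Rust side with different signatures.

```python
from collections import Counter
from typing import Protocol


class RecorderMetrics(Protocol):
    """Interface a host-provided collector would satisfy."""

    def record_dropped_event(self, reason: str) -> None: ...
    def record_detach(self, reason: str) -> None: ...
    def record_panic(self, label: str) -> None: ...


class NoopMetrics:
    """Default sink: every hook does nothing."""

    def record_dropped_event(self, reason: str) -> None:
        pass

    def record_detach(self, reason: str) -> None:
        pass

    def record_panic(self, label: str) -> None:
        pass


class CountingMetrics:
    """Example collector keeping in-memory counters per (hook, label)."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record_dropped_event(self, reason: str) -> None:
        self.counts[("dropped_event", reason)] += 1

    def record_detach(self, reason: str) -> None:
        self.counts[("detach", reason)] += 1

    def record_panic(self, label: str) -> None:
        self.counts[("panic", label)] += 1


_sink: RecorderMetrics = NoopMetrics()


def install_metrics(sink: RecorderMetrics) -> None:
    """Swap the global sink; runtime code always goes through metrics()."""
    global _sink
    _sink = sink


def metrics() -> RecorderMetrics:
    return _sink
```

Because the default is a no-op, runtime call sites can invoke `metrics().record_detach(...)` unconditionally, and hosts only pay for counting once they install a collector.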