(AI-generated spec based on the contents of this PR)
# Tracing Function Arguments on Entry and Structured Value Encoding
This specification defines how to capture Python function arguments at
the moment a function starts executing (the PY_START event) and how to
encode argument values into the runtime tracing format. It also defines
fail‑fast error behavior for the monitoring callback and the test
expectations that validate the behavior.
Audience: Junior developers familiar with Rust and Python, but with no
prior knowledge of CPython frames or this codebase.
## Executive Summary
- Record function arguments on PY_START for all Python parameter kinds:
positional‑only, positional‑or‑keyword, keyword‑only, varargs (`*args`),
and kwargs (`**kwargs`).
- Encode values canonically and structurally:
  - `None`, `bool`, `int`, `str` as dedicated kinds (`None`, `Bool`,
    `Int`, `String`).
  - Python `tuple` → `Tuple` with recursively encoded elements.
  - Python `list` → `Sequence` with recursively encoded elements.
  - Python `dict` → `Sequence` of `(key, value)` `Tuple`s. Keys that are
    Python `str` are encoded as `String`; any other key is encoded using
    the normal rules.
- Fail fast on irrecoverable errors during argument capture: raise a
Python exception and immediately disable further monitoring callbacks
for the session.
- Tests assert argument presence, name mapping, stable string encoding,
and structured kwargs.
- Add `.cargo/` to version control ignore rules.
## Goals and Non‑Goals
Goals
- Capture and emit all Python argument kinds on function entry.
- Preserve structure of varargs and kwargs values where possible.
- Provide deterministic, canonical encoding for common primitives.
- Fail fast on errors (no silent fallbacks) and disable further
monitoring after the first callback error.
- Provide clear, verifiable test criteria.
Non‑Goals
- Introducing a new mapping kind to the value schema (we reuse existing
`Sequence` + `Tuple`).
- Changing higher‑level tracing schemas or writer behavior beyond what
is needed to attach arguments to `Call` events.
- Unifying cross‑recorder type naming (e.g., “List” vs “Array”) beyond
the choices specified here.
## Background: CPython Frames and Code Objects (Quick Primer)
At the beginning of a Python function call, CPython creates a frame with
locals bound for the call. The function’s code object carries metadata
describing its parameters.
Key code object attributes used here (CPython 3.8+):
- `co_varnames`: A tuple of local variable names. Parameters appear
first in a defined order.
- `co_argcount`: Total count of positional parameters. Important: in
Python 3.8+, this total includes positional‑only and
positional‑or‑keyword parameters (see PEP 570: Positional‑Only
Parameters).
- `co_posonlyargcount`: Count of positional‑only parameters. Useful only
if you need to distinguish subgroups; we do not for this feature.
- `co_kwonlyargcount`: Count of keyword‑only parameters.
- `co_flags`: Bitmask; `0x04` indicates presence of `*args` (varargs),
`0x08` indicates presence of `**kwargs` (varkeywords).
Reference terms: PEP 570 (Positional‑Only Parameters) and CPython code
object docs.
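To make these attributes concrete, the following minimal sketch inspects the code object of a small function that uses every parameter kind. The function `g` and the local `local_tmp` are illustrative names, not part of this codebase.

```python
# Flag bits for *args / **kwargs as described above.
CO_VARARGS = 0x04
CO_VARKEYWORDS = 0x08


def g(p, /, q, *args, r, **kwargs):
    local_tmp = p + q  # a non-parameter local; it is listed after the parameters
    return local_tmp


code = g.__code__

print(code.co_varnames)         # parameter names first, then other locals
print(code.co_argcount)         # counts p and q (positional-only + positional-or-keyword)
print(code.co_posonlyargcount)  # counts p only
print(code.co_kwonlyargcount)   # counts r only
print(bool(code.co_flags & CO_VARARGS), bool(code.co_flags & CO_VARKEYWORDS))
```

Note that `co_argcount` already includes the positional‑only parameter `p`, which is why adding `co_posonlyargcount` on top of it would double count.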
## High‑Level Design
When the monitoring system delivers a PY_START event, we:
1. Ensure the tracer is started for the code object and obtain a
function id.
2. Obtain the current frame via `sys._getframe(0)` and the frame’s
locals (`f_locals`).
3. Compute the ordered list of parameter names directly from the code
object, using CPython ordering, and look up each name in `f_locals`.
4. Encode each found value using `encode_value` and attach the resulting
`args` vector to the `Call` event payload via the trace writer.
5. If any irrecoverable error occurs (e.g., `_getframe` unavailable),
raise a Python exception and immediately disable further monitoring
(fail fast).
## Parameter Ordering and Name Discovery
Given a bound code object and Python 3.8+ semantics:
- Let `pos_count = co_argcount` (total positional parameters, including
positional‑only and positional‑or‑keyword). Do not add
`co_posonlyargcount` to this figure (that would double count).
- Let `kwonly_count = co_kwonlyargcount`.
- Let `flags = co_flags`.
- Let `varnames = list(co_varnames)`.
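Derive the ordered parameter names from `varnames`, using a running
index `idx`. Note that `co_varnames` stores the keyword‑only names
directly after the positional ones, followed by the `*args` name and
then the `**kwargs` name, regardless of where `*args` appears in the
signature:
- Positional parameters: `varnames[0 : min(pos_count, len(varnames))]`;
afterwards `idx = min(pos_count, len(varnames))`.
- Keyword‑only parameters: the next `kwonly_count` names; advance `idx`
by `kwonly_count`.
- Varargs (`*args`): if `flags & 0x04 != 0`, then the next name is
`varnames[idx]`; advance `idx` by one.
- Kwargs (`**kwargs`): if `flags & 0x08 != 0`, then the next name is
`varnames[idx]`.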
For each name in this sequence, try to fetch the value from
`f_locals[name]`:
- If present, encode it and include it.
- If absent or retrieval fails, skip it silently (locals may not have
been populated for some names in unusual interpreter states, but this
should be rare at function entry).
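The name discovery and lookup steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the Rust implementation: `ordered_param_names` and `capture_args` are hypothetical helper names. The sketch relies on the CPython layout of `co_varnames`, in which keyword‑only names come right after the positional names and the `*args` and `**kwargs` names follow them.

```python
import sys

CO_VARARGS = 0x04
CO_VARKEYWORDS = 0x08


def ordered_param_names(code):
    """Return parameter names in the order they are stored in co_varnames."""
    varnames = list(code.co_varnames)
    idx = min(code.co_argcount, len(varnames))
    names = varnames[:idx]  # positional-only + positional-or-keyword

    # Keyword-only names directly follow the positional ones.
    kwonly_end = min(idx + code.co_kwonlyargcount, len(varnames))
    names.extend(varnames[idx:kwonly_end])
    idx = kwonly_end

    if code.co_flags & CO_VARARGS and idx < len(varnames):
        names.append(varnames[idx])  # the *args name
        idx += 1
    if code.co_flags & CO_VARKEYWORDS and idx < len(varnames):
        names.append(varnames[idx])  # the **kwargs name
    return names


def capture_args(frame):
    """Collect (name, value) pairs for parameters present in the frame locals."""
    f_locals = frame.f_locals
    captured = []
    for name in ordered_param_names(frame.f_code):
        if name in f_locals:  # absent names are skipped silently
            captured.append((name, f_locals[name]))
    return captured


def g(p, /, q, *args, r, **kwargs):
    # Capture on entry, before any other local is assigned.
    return capture_args(sys._getframe(0))


result = g(10, 20, 30, 40, r=50, k=60)
print(result)
```

The names come out in storage order (`p`, `q`, `r`, `args`, `kwargs`), which differs from signature order. A consumer that needs signature order would have to reorder them, which this sketch does not attempt.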
## Value Encoding Rules (`encode_value`)
Encode a Python object to a `ValueRecord` used by the trace writer. The
encoder must be recursive and must follow these canonical rules:
Primitives and None
- `None` → special `NONE_VALUE` constant.
- `bool` → `Bool` with appropriate `type_id`.
- `int` → `Int` with appropriate `type_id`.
- `str` → `String` with exact text. This is canonical for text; do not
fall back to `Raw` for `str`.
Containers
- Python `tuple` → `Tuple` with `elements = [encode_value(item) for item
in tuple]`.
- Python `list` → `Sequence` with `elements = [encode_value(item) for
item in list]`, `is_slice = false`, and language type name “List”.
- Python `dict` → represent as a `Sequence` with language type name
“Dict”, whose elements are 2‑element `Tuple`s `(key, value)`.
- Encode keys as `String` when `key` is a Python `str`.
- If a key is not a `str`, encode the key using normal rules (best
effort). Kwarg keys are always strings, so in kwargs contexts you will
observe `String` keys.
Fallback
- For all other types, obtain a textual representation and encode as
`Raw` with language type name “Object”.
Type registration
- For every concrete kind you emit, register or look up a `type_id` via
`TraceWriter::ensure_type_id(...)`, using the following language type
names:
- `Bool` → "Bool"
- `Int` → "Int"
- `String` → "String"
- `Tuple` → "Tuple"
- `Sequence` (Python list) → "List"
- `Sequence` (Python dict encoded as sequence of pairs) → "Dict"
- `Raw` → "Object"
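The encoding rules above can be modeled with a short Python sketch. It is an illustration only: the real encoder is Rust and produces `ValueRecord` values. The dict field names used here (`b`, `i`, `text`, `r`, `elements`, `is_slice`), the `type_ids` dictionary standing in for `TraceWriter::ensure_type_id`, and the use of `str()` for the fallback text are all assumptions made for the example.

```python
def encode_value(value, type_ids):
    """Encode a Python object into a nested dict standing in for ValueRecord."""

    def type_id(name):
        # Register the language type name on first use, then reuse its id.
        return type_ids.setdefault(name, len(type_ids))

    if value is None:
        return {"kind": "None"}  # stand-in for the NONE_VALUE constant
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"kind": "Bool", "b": value, "type_id": type_id("Bool")}
    if isinstance(value, int):
        return {"kind": "Int", "i": value, "type_id": type_id("Int")}
    if isinstance(value, str):
        return {"kind": "String", "text": value, "type_id": type_id("String")}
    if isinstance(value, tuple):
        elements = [encode_value(item, type_ids) for item in value]
        return {"kind": "Tuple", "elements": elements, "type_id": type_id("Tuple")}
    if isinstance(value, list):
        elements = [encode_value(item, type_ids) for item in value]
        return {"kind": "Sequence", "elements": elements,
                "is_slice": False, "type_id": type_id("List")}
    if isinstance(value, dict):
        # A dict becomes a Sequence of 2-element (key, value) Tuples.
        pairs = [
            {"kind": "Tuple",
             "elements": [encode_value(k, type_ids), encode_value(v, type_ids)],
             "type_id": type_id("Tuple")}
            for k, v in value.items()
        ]
        return {"kind": "Sequence", "elements": pairs,
                "is_slice": False, "type_id": type_id("Dict")}
    # Fallback: textual representation (str() is an assumption of this sketch).
    return {"kind": "Raw", "r": str(value), "type_id": type_id("Object")}


type_ids = {}
record = encode_value({"k": 60}, type_ids)
print(record)
```

The `bool` check precedes the `int` check because `True` and `False` are instances of `int` in Python; reversing the order would encode booleans as `Int`.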
## Attaching Arguments to the Call Event
For each discovered parameter name and encoded value:
- Create a full value record using `TraceWriter::arg(writer, name,
value_record)`.
- Accumulate these into a `Vec<FullValueRecord>`.
- Emit the `Call` event via `TraceWriter::register_call(writer,
function_id, args_vec)`.
Note: The writer manages a variable‑name table. Each argument will
reference a `variable_id` that can be resolved to the actual name
through separate `VariableName` events.
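The relationship between argument records and the variable‑name table can be illustrated with a small Python model. `MiniWriter` is a hypothetical stand‑in for the Rust `TraceWriter`; the event shapes (`{"VariableName": ...}`, `{"Call": {...}}`) are assumptions chosen to mirror the description above, not a confirmed wire format.

```python
class MiniWriter:
    """Toy model of a trace writer with a variable-name table."""

    def __init__(self):
        self.events = []
        self._variable_ids = {}

    def ensure_variable_id(self, name):
        # Emit a VariableName event the first time a name is seen;
        # the id is the position of that name in the table.
        if name not in self._variable_ids:
            self._variable_ids[name] = len(self._variable_ids)
            self.events.append({"VariableName": name})
        return self._variable_ids[name]

    def arg(self, name, value_record):
        # A full value record references the name by id, not by text.
        return {"variable_id": self.ensure_variable_id(name), "value": value_record}

    def register_call(self, function_id, args):
        self.events.append({"Call": {"function_id": function_id, "args": args}})


writer = MiniWriter()
args = [
    writer.arg("a", {"kind": "Int", "i": 1}),
    writer.arg("b", {"kind": "String", "text": "x"}),
]
writer.register_call(0, args)
print(writer.events)
```

In this model a reader resolves an argument name by indexing the collected `VariableName` events with the argument's `variable_id`.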
## Error Handling and Fail‑Fast Behavior
`on_py_start` must return `PyResult<()>` instead of `()`. Behavior:
- On success: return `Ok(())`.
- On irrecoverable error (e.g., `_getframe` import or call fails,
  accessing locals fails in a way that prevents capture):
  - Return `Err(PyRuntimeError::new_err("on_py_start: failed to capture
    args: <reason>"))`.
  - The callback wrapper (see below) must immediately disable future
    monitoring for this tool by setting events to `NO_EVENTS` and propagate
    the error to Python.
Callback wrapper behavior (only PY_START is specified here, but the
approach generalizes to other events):
- Acquire the global tracer context.
- Invoke `on_py_start` and match on the `PyResult`:
  - `Ok(())`: return `Ok(())`.
  - `Err(err)`: call `set_events(py, &tool, NO_EVENTS)` to turn off events
    for this session, log an error, and return `Err(err)`.
- If the global context is absent, return `Ok(())` (no tracing active).
Rationale: Turning off events on first error prevents repeated
exceptions during interpreter activities like error printing (which
otherwise trigger more PY_START events).
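The fail‑fast contract can be modeled in plain Python as a sketch. Everything here is a simulation: `Tool`, `FailingTracer`, `deliver_py_start`, and the numeric value of `PY_START` are hypothetical stand‑ins for the interpreter's monitoring machinery and the Rust wrapper, and a raised exception plays the role of `Err(err)`.

```python
import logging

NO_EVENTS = 0
PY_START = 1  # hypothetical bit value, for this sketch only

log = logging.getLogger("tracer_sketch")


class Tool:
    """Holds the set of events currently enabled for the tool."""

    def __init__(self):
        self.events = PY_START


def set_events(tool, events):
    tool.events = events


class FailingTracer:
    """A tracer whose on_py_start always fails, to exercise the error path."""

    def __init__(self):
        self.calls = 0

    def on_py_start(self, code):
        self.calls += 1
        raise RuntimeError("on_py_start: failed to capture args: no frame")


def callback_py_start(tool, tracer, code):
    if tracer is None:
        return  # no tracing active
    try:
        tracer.on_py_start(code)
    except Exception as err:
        set_events(tool, NO_EVENTS)  # disable before propagating
        log.error("on_py_start failed, monitoring disabled: %s", err)
        raise


def deliver_py_start(tool, tracer, code):
    # Simulates the interpreter: callbacks fire only for enabled events.
    if tool.events & PY_START:
        callback_py_start(tool, tracer, code)


tool, tracer = Tool(), FailingTracer()
first_error = None
try:
    deliver_py_start(tool, tracer, code=None)
except RuntimeError as err:
    first_error = str(err)
deliver_py_start(tool, tracer, code=None)  # no callback: events are off
```

Disabling events before re‑raising matters: the error then propagates exactly once, and later function calls made while the interpreter reports it do not reach the tracer again.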
## Test Specifications
Parsing helper changes (Python side)
- Extend the trace parsing helper to collect:
  - `varnames: List[str]` from `VariableName` events (index is
    `variable_id`).
  - `call_records: List[Dict[str, Any]]` from raw `Call` payloads (to
    inspect args).
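A parsing helper along these lines might look like the sketch below. The JSON event shapes and the `parse_trace` name are assumptions for illustration; the actual trace file layout is defined by the trace writer and is not reproduced here.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical sample trace: a JSON array of single-key event objects.
SAMPLE_TRACE = json.dumps([
    {"VariableName": "a"},
    {"VariableName": "b"},
    {"Call": {"function_id": 0, "args": [
        {"variable_id": 0, "value": {"kind": "Int", "i": 1}},
        {"variable_id": 1, "value": {"kind": "String", "text": "x"}},
    ]}},
])


def parse_trace(text: str) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Collect variable names (indexed by variable_id) and raw Call payloads."""
    varnames: List[str] = []
    call_records: List[Dict[str, Any]] = []
    for event in json.loads(text):
        if "VariableName" in event:
            varnames.append(event["VariableName"])
        elif "Call" in event:
            call_records.append(event["Call"])
    return varnames, call_records


varnames, call_records = parse_trace(SAMPLE_TRACE)
# Resolve each argument's name through the variable-name table.
resolved = [(varnames[arg["variable_id"]], arg["value"]["kind"])
            for arg in call_records[0]["args"]]
print(resolved)
```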
Test: record positional arguments on entry
- Create a script:
  - `def foo(a, b): return a if len(str(b)) > 0 else 0`
  - Call `foo(1, 'x')` under tracing.
- Assert:
  - A `Call` for `foo` exists with two arguments.
  - Arg 0: name `a`, value kind `Int`, value `1`.
  - Arg 1: name `b`, value kind `String`, text `"x"`.
Test: record all Python argument kinds
- Create a script:
  - `def g(p, /, q, *args, r, **kwargs): ...`
  - Call `g(10, 20, 30, 40, r=50, k=60)` under tracing.
- Assert:
  - Names present: `p`, `q`, `args`, `r`, `kwargs`.
  - `p == 10`, `q == 20`, `r == 50` as `Int`.
  - Varargs (`args`) is either:
    - `Sequence` or `Tuple` with exactly two elements `30`, `40` as `Int`,
      or
    - `Raw` whose text contains `"30"` and `"40"` (accepted to keep
      compatibility with alternative backends).
  - Kwargs (`kwargs`) is structured as:
    - kind `Sequence` with one element, which is
    - kind `Tuple` of two elements: key record kind `String` with text
      `"k"`; value record kind `Int` with `60`.
Test: fail fast when frame access fails (Rust module test via PyO3)
- Start tracing with activation scoped to the test program path.
- Monkeypatch `sys._getframe` to raise `RuntimeError` when called.
- Execute a trivial program that triggers a Python function call under
tracing.
- Expect the call to raise an exception whose message mentions
`_getframe`.
- Execute the program again in the same process: no exception should be
raised because monitoring has been disabled.
- Restore `_getframe` and stop tracing.
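The patch‑and‑restore part of this test can be sketched in pure Python. `broken_getframe` and `MiniTracer` are hypothetical names; `MiniTracer` only imitates the disable‑on‑first‑error behavior and is not the real tracer.

```python
import sys
from contextlib import contextmanager


@contextmanager
def broken_getframe():
    """Temporarily replace sys._getframe with a function that raises."""
    original = sys._getframe

    def raising(*args, **kwargs):
        raise RuntimeError("_getframe is unavailable (patched by test)")

    sys._getframe = raising
    try:
        yield
    finally:
        sys._getframe = original  # always restore, even if the test fails


class MiniTracer:
    """Imitates a tracer that turns itself off after the first error."""

    def __init__(self):
        self.enabled = True

    def on_call(self):
        if not self.enabled:
            return  # monitoring disabled: nothing to do
        try:
            sys._getframe(0)
        except RuntimeError:
            self.enabled = False
            raise


def run_fail_fast_check():
    tracer = MiniTracer()
    original = sys._getframe
    errors = []
    with broken_getframe():
        try:
            tracer.on_call()  # first run: expected to raise
        except RuntimeError as exc:
            errors.append(str(exc))
        tracer.on_call()  # second run: no exception, monitoring is off
    return errors, sys._getframe is original


errors, restored = run_fail_fast_check()
```

Restoring `sys._getframe` in a `finally` block keeps a failing assertion from leaving the patched function in place for later tests in the same process.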
Rust test fixture adaptation
- Any `Tracer` implementations used by tests must update `on_py_start`
signature to return `PyResult<()>` and return `Ok(())` when no special
logic is needed.
## Implementation Details (Where and How)
Files and responsibilities
- `src/runtime_tracer.rs`
  - Implement/extend `encode_value(py, value)` per the rules above, using
    `TraceWriter::ensure_type_id(...)` for type registration.
  - Change `on_py_start(py, code, offset)` to return `PyResult<()>` and
    implement argument capture:
    - Ensure tracer started and `function_id` available.
    - Build ordered parameter list from the code object (`co_varnames`,
      `co_argcount`, `co_kwonlyargcount`, `co_flags`). Do not double count
      positional‑only.
    - Obtain `f_locals` and collect values by name.
    - Encode values and build `args` with `TraceWriter::arg`.
    - Register the call via `TraceWriter::register_call(writer, fid, args)`.
    - Fail fast by returning `Err(...)` if frame/locals access fails.
- `src/tracer.rs`
  - Change the `Tracer` trait method signature: `fn on_py_start(...) ->
    PyResult<()>`.
  - Update docs for fail‑fast guidance.
  - Update the callback wrapper `callback_py_start` to:
    - Call `on_py_start` and match on the result.
    - On `Err`, call `set_events(py, &tool, NO_EVENTS)`, log, and return the
      error.
- `test/test_monitoring_events.py`
  - Extend parser to collect `varnames` and `call_records`.
  - Add the two tests specified above.
- `tests/test_fail_fast_on_py_start.py`
  - Add the Python test that monkeypatches `_getframe` and asserts
    fail‑fast behavior with monitoring disabled after the first error.
- `.gitignore`
  - Add `.cargo/` to exclude Cargo cache/config directories from version
    control.
## Edge Cases and Defensive Choices
- Missing locals for some parameter names are skipped. This is rare at
function start but should not crash the tracer.
- Deeply nested containers are recursively encoded. Extremely deep
structures may be expensive; this is acceptable for now.
- Dict encoding is general (applies to any Python `dict`), but kwargs
contexts will always produce string keys. Non‑string keys are encoded
normally.
- We intentionally do not modify module‑level activation flags during
fail‑fast; turning off events is sufficient to prevent further
callbacks, and explicit shutdown remains idempotent.
## Acceptance Criteria
- At least one `Call` event for the tested functions contains a
non‑empty `args` vector.
- Names and values for positional parameters match exactly, including
canonical `String` for Python `str`.
- `*args` and `**kwargs` are present and encoded according to the rules
above.
- When `_getframe` raises, the initial call propagates an exception and
subsequent calls do not re‑raise because monitoring was disabled.
- Tests described in this spec pass.
## Future Work
- Unify list/sequence language type naming across recorders (e.g.,
consistently "List").
- Consider introducing a dedicated mapping value kind for dictionaries
to avoid overloading `Sequence`.
- Consider stricter behavior for non‑string dict keys in non‑kwargs
contexts (fail vs. best effort).