290 changes: 290 additions & 0 deletions .archive/issues-2025-09-09.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,290 @@
# Archived Issues - 2025-09-09

## ISSUE-001
### Description
Record function arguments on function entry.

The existing `encode_value` function converts Python objects to value records, and the arguments should be encoded with it. To do that, modify the `on_py_start` hook to load the current frame and read the function arguments from it.

### Definition of Done
- Arguments for positional and pos-or-keyword parameters are recorded on function entry using the current frame's locals.
- Values are encoded via `encode_value` and attached to the `Call` event payload.
- A unit test asserts that multiple positional arguments (e.g. `a`, `b`) are present with correct encoded values.
- Varargs/kwargs and positional-only coverage are tracked in separate issues (see ISSUE-002, ISSUE-005).

### Status
Archived

Implemented for positional (and pos-or-keyword) arguments on function entry
using `sys._getframe(0)` and `co_varnames[:co_argcount]`, with counting fixed to
use `co_argcount` directly (includes positional-only; avoids double-counting).
Values are encoded via `encode_value` and attached to the `Call` event. Tests
validate correct presence and values. Varargs/kwargs remain covered by
ISSUE-002.

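As a rough Python-level illustration of the approach described in the status note (the recorder itself does this from Rust inside `on_py_start`), the positional names can be sliced out of `co_varnames` and looked up in the frame's locals. The helper name `positional_args` is hypothetical, and raw values stand in for whatever `encode_value` would produce.

```python
import sys


def positional_args(frame):
    """Return {name: value} for positional and pos-or-keyword parameters.

    Hypothetical helper for illustration; not part of the recorder.
    """
    code = frame.f_code
    # co_argcount counts positional-only and pos-or-keyword parameters,
    # and their names come first in co_varnames.
    names = code.co_varnames[:code.co_argcount]
    local_vars = frame.f_locals
    return {name: local_vars[name] for name in names}


def add(a, b):
    # sys._getframe(0) is the currently executing frame, i.e. add's own.
    return positional_args(sys._getframe(0))


captured = add(1, 2)
print(captured)  # {'a': 1, 'b': 2}
```

Reading the frame at function entry matters: later in the body, a local rebinding of `a` or `b` would change what `f_locals` reports.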



## ISSUE-002
### Description
Capture all Python argument kinds on function entry: positional-only,
pos-or-kw, keyword-only, plus varargs (`*args`) and kwargs (`**kwargs`). Extend
the current implementation that uses `co_argcount` and `co_varnames` to also
leverage `co_posonlyargcount` and `co_kwonlyargcount`, and detect varargs/kwargs
via code flags. Encode `*args` as a list value and `**kwargs` as a mapping value
to preserve structure.

### Definition of Done
- All argument kinds are captured on function entry: positional-only, pos-or-keyword, keyword-only, varargs (`*args`), and kwargs (`**kwargs`).
- `*args` is encoded as a list value; `**kwargs` is encoded as a mapping value.
- Positional-only and keyword-only parameters are included using `co_posonlyargcount` and `co_kwonlyargcount`.
- Comprehensive tests cover each argument kind and validate the encoded structure and values.

### Status
Archived

All argument kinds are captured on function entry, including kwargs with
structured encoding. Varargs are preserved as `Tuple` (per CPython), and
`**kwargs` are encoded as a `Sequence` of 2-element `Tuple`s `(key, value)`
with string keys, enabling lossless downstream analysis. The updated test
`test_all_argument_kinds_recorded_on_py_start` verifies the behavior.

Note: While the original Definition of Done referenced a mapping value kind,
the implementation follows the proposed approach in ISSUE-008 to represent
kwargs as a sequence of tuples using existing value kinds.

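The layout this issue relies on can be sketched in plain Python. In `co_varnames`, positional names come first, then keyword-only names, then the `*args` name (if the `CO_VARARGS` flag is set), then the `**kwargs` name (if `CO_VARKEYWORDS` is set). The function `split_parameters` below is a hypothetical helper written for this note, not the recorder's Rust code.

```python
import inspect


def split_parameters(code):
    """Group a code object's parameter names by kind (illustrative only)."""
    names = code.co_varnames
    n_pos = code.co_argcount            # positional-only + pos-or-keyword
    n_posonly = code.co_posonlyargcount
    idx = n_pos + code.co_kwonlyargcount
    groups = {
        "positional_only": names[:n_posonly],
        "pos_or_keyword": names[n_posonly:n_pos],
        "keyword_only": names[n_pos:idx],
        "varargs": None,
        "kwargs": None,
    }
    # The *args and **kwargs names follow the keyword-only names, in that order.
    if code.co_flags & inspect.CO_VARARGS:
        groups["varargs"] = names[idx]
        idx += 1
    if code.co_flags & inspect.CO_VARKEYWORDS:
        groups["kwargs"] = names[idx]
    return groups


def f(p, /, q, *args, r, **kwargs):
    return None


groups = split_parameters(f.__code__)
print(groups)
```

With the names grouped this way, each value can be read from the frame's locals and encoded according to its kind.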


## ISSUE-005
### Description
Include positional-only parameters in argument capture. The current logic uses
only `co_argcount` for the positional slice, which excludes positional-only
arguments (PEP 570). As a result, names before the `/` in a signature like
`def f(p, /, q, *args, r, **kwargs)` are dropped.

### Definition of Done
- Positional-only parameters are included in the captured argument set.
- The selection of positional names accounts for `co_posonlyargcount` in addition to `co_argcount`.
- Tests add a function with positional-only parameters and assert their presence and correct encoding.

### Status
Archived

Implemented by selecting positional names from `co_varnames` using
`co_argcount` directly (which already includes positional-only per CPython 3.8+).
This prevents double-counting and keeps indexing stable. Tests in
`test_all_argument_kinds_recorded_on_py_start` assert presence of the
positional-only parameter `p` and pass.

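The double-counting mentioned above is easy to reproduce in plain Python. The snippet below is only an illustration using the signature from the issue description; it compares the correct slice with one that adds `co_posonlyargcount` on top of `co_argcount`.

```python
def f(p, /, q, *args, r, **kwargs):
    return None


code = f.__code__

# co_argcount already includes the positional-only parameter p.
correct = code.co_varnames[:code.co_argcount]

# Adding co_posonlyargcount on top counts p twice, so the slice
# overruns into the keyword-only name r.
double_counted = code.co_varnames[:code.co_posonlyargcount + code.co_argcount]

print(correct)         # ('p', 'q')
print(double_counted)  # ('p', 'q', 'r')
```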


## ISSUE-003
### Description
Avoid defensive fallback in argument capture. The current change swallows
failures to access the frame/locals and proceeds with empty `args`. Per
`rules/source-code.md` ("Avoid defensive programming"), we should fail fast when
encountering such edge cases.

### Definition of Done
- Silent fallbacks that return empty arguments on failure are removed.
- The recorder raises a clear, actionable error when it cannot access frame/locals.
- Tests verify the fail-fast path.

### Status
Archived

`RuntimeTracer::on_py_start` now returns `PyResult<()>` and raises a
`RuntimeError` when frame/locals access fails; `callback_py_start` propagates
the error to Python. A pytest (`tests/test_fail_fast_on_py_start.py`) asserts
the fail-fast behavior by monkeypatching `sys._getframe` to raise.

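A minimal sketch of the fail-fast idea, assuming a Python stand-in for the Rust hook: `on_py_start_sketch` and `check_fail_fast` are hypothetical names, and the real test uses pytest's monkeypatching rather than the manual patch shown here.

```python
import sys


def on_py_start_sketch():
    """Hypothetical Python stand-in for the Rust on_py_start hook."""
    try:
        frame = sys._getframe(1)
    except Exception as exc:
        # Fail fast: surface a clear error instead of recording empty args.
        raise RuntimeError("on_py_start: cannot access the current frame") from exc
    return frame.f_code.co_name


def check_fail_fast():
    original = sys._getframe

    def broken_getframe(*_args):
        raise ValueError("frame access disabled for this test")

    sys._getframe = broken_getframe
    try:
        on_py_start_sketch()
    except RuntimeError as err:
        return str(err)
    finally:
        sys._getframe = original  # always restore the real function
    return None


message = check_fail_fast()
print(message)
```

The point of the check is that a frame-access failure surfaces as an error the caller can see, rather than as a `Call` event with silently empty arguments.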


## ISSUE-004
### Description
Stabilize string value encoding for arguments and tighten tests. The new test
accepts either `String` or `Raw` kinds for the `'x'` argument, which can hide
regressions. We should standardize encoding of `str` as `String` (or document
when `Raw` is expected) and update tests to assert the exact kind.

### Definition of Done
- String values are consistently encoded as `String` (or the expected canonical kind), with any exceptions explicitly documented.
- Tests assert the exact kind for `str` arguments and fail if an unexpected kind (e.g., `Raw`) is produced.
- Documentation clarifies encoding rules for string-like types to avoid ambiguity in future changes.

### Status
Archived

Stricter tests now assert `str` values are encoded as `String` with the exact text payload, and runtime docs clarify canonical encoding. No runtime logic change was required since `encode_value` already produced `String` for Python `str`.

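To illustrate why the lax assertion could hide a regression, here is a small sketch. The dict-shaped record and the name `encode_value_stub` are assumptions made for this example; the actual value records come from the Rust `encode_value` and may be shaped differently.

```python
def encode_value_stub(value):
    """Hypothetical stand-in that returns a dict-shaped value record."""
    if isinstance(value, str):
        return {"kind": "String", "text": value}
    return {"kind": "Raw", "r": repr(value)}


record = encode_value_stub("x")

# Lax check: would still pass if str regressed to the Raw kind.
lax_ok = record["kind"] in ("String", "Raw")

# Strict check: pins both the kind and the exact text payload.
strict_ok = record == {"kind": "String", "text": "x"}

print(lax_ok, strict_ok)
```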


## ISSUE-006
### Description
Accidental check-in of Cargo cache/artifact files under `codetracer-python-recorder/.cargo/**` (e.g., `registry/CACHEDIR.TAG`, `.package-cache`). These are build/cache directories and should be excluded from version control.

### Definition of Done
- Add ignore rules to exclude Cargo cache directories (e.g., `.cargo/**`, `target/**`) from version control.
- Remove already-checked-in cache files from the repository.
- Verify the working tree is clean after a clean build; no cache artifacts appear as changes.

### Status
Archived



## ISSUE-007
### Description
Immediately stop tracing when any monitoring callback raises an error.

Current behavior: `RuntimeTracer::on_py_start` intentionally fails fast when it
cannot capture function arguments (e.g., when `sys._getframe` is unavailable or
patched to raise). The callback error is propagated to Python via
`callback_py_start` (it returns the `PyResult` from `on_py_start`). However, the
tracer remains installed and active after the error. As a result, any further
Python function start (even from exception-handling or printing the exception)
triggers `on_py_start` again, re-raising the same error and interfering with the
program’s own error handling.

This is observable in `codetracer-python-recorder/tests/test_fail_fast_on_py_start.py`:
the test simulates `_getframe` failure, which correctly raises in `on_py_start`,
but `print(e)` inside the test’s `except` block invokes codec machinery that
emits additional `PY_START` events. Those callbacks raise again, causing the test
to fail before reaching its assertions.

### Impact
- Breaks user code paths that attempt to catch and handle exceptions while the
tracer is active — routine operations like `print(e)` can cascade failures.
- Hard to debug because the original error is masked by subsequent callback
errors from unrelated modules (e.g., `codecs`).

### Proposed Solution
Fail fast and disable tracing at the first callback error.

Implementation sketch:
- In each callback wrapper (e.g., `callback_py_start`), if the underlying
  tracer method returns `Err`, immediately disable further monitoring before
  returning the error:
  - Set events to `NO_EVENTS` (via `set_events`) to prevent any more callbacks.
  - Unregister all previously registered callbacks for our tool id.
  - Optionally call `finish()` on the tracer to flush/close writers.
- Option A (hard uninstall): call `uninstall_tracer(py)` to release tool id
  and clear the registry. This fully tears down the tracer. Note that the
  high-level `ACTIVE` flag in `lib.rs` is not updated by `uninstall_tracer`,
  so either:
  - expose an internal “deactivate_from_callback()” in `lib.rs` that clears
    `ACTIVE`, or
  - keep a soft-stop in `tracer.rs` by setting `NO_EVENTS` and unregistering
    callbacks without touching `ACTIVE`, allowing `stop_tracing()` to be a
    no-op later.
- Ensure reentrancy safety: perform the disable sequence only once (e.g., with
  a guard flag) to avoid nested teardown during callback execution.

Behavioral details:
- The original callback error must still be propagated to Python so the user
sees the true failure cause, but subsequent code should not receive further
monitoring callbacks.
- If an error occurs before activation gating triggers, the disable sequence should
still run to avoid repeated failures from unrelated module imports.

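The soft-stop sequence with a guard flag might look roughly like this in Python. Everything here is a sketch under assumptions: `FakeMonitoring` is a hypothetical stand-in that mimics only the `set_events` and `register_callback` calls of `sys.monitoring`, `GuardedTracer` is an invented name, and the real logic lives in the Rust callback wrappers.

```python
NO_EVENTS = 0  # sys.monitoring.events.NO_EVENTS is documented as an alias for 0
PY_START = 1   # placeholder event bit, used only by this sketch


class FakeMonitoring:
    """Hypothetical stand-in for the subset of sys.monitoring used below."""

    def __init__(self):
        self.active_events = NO_EVENTS
        self.callbacks = {}

    def set_events(self, tool_id, event_set):
        self.active_events = event_set

    def register_callback(self, tool_id, event, func):
        # Passing None unregisters; the previous callback is returned.
        previous = self.callbacks.pop(event, None)
        if func is not None:
            self.callbacks[event] = func
        return previous


class GuardedTracer:
    def __init__(self, monitoring, tool_id, handler):
        self.monitoring = monitoring
        self.tool_id = tool_id
        self.handler = handler
        self.disabled = False
        monitoring.register_callback(tool_id, PY_START, self.callback_py_start)
        monitoring.set_events(tool_id, PY_START)

    def callback_py_start(self, code, offset):
        try:
            return self.handler(code, offset)
        except Exception:
            self.disable_once()
            raise  # propagate the original error unchanged

    def disable_once(self):
        if self.disabled:  # guard flag: teardown runs at most once
            return
        self.disabled = True
        self.monitoring.set_events(self.tool_id, NO_EVENTS)
        self.monitoring.register_callback(self.tool_id, PY_START, None)


def failing_handler(code, offset):
    raise RuntimeError("cannot capture arguments")


monitoring = FakeMonitoring()
tracer = GuardedTracer(monitoring, tool_id=0, handler=failing_handler)
deliver = monitoring.callbacks[PY_START]

try:
    deliver("code", 0)
except RuntimeError as err:
    first_error = str(err)

print(first_error, monitoring.active_events, monitoring.callbacks)
```

After the first failure no callback remains registered, so later function starts (for example from `print(e)`) would not reach the tracer, and a later explicit stop can safely repeat the teardown as a no-op.
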
### Definition of Done
- On any callback error (at minimum `on_py_start`, and future callbacks that may
return `PyResult`), all further monitoring callbacks from this tool are
disabled immediately within the same GIL context.
- The initial error is propagated unchanged to Python.
- The failing test `test_fail_fast_on_py_start.py` passes: after the first
failure, `print(e)` does not trigger additional tracer errors.
- Writers are flushed/closed or left in a consistent state (documented), and no
additional events are recorded after disablement.
- Unit/integration tests cover: error in `on_py_start`, repeated calls after
disablement are no-ops, and explicit `stop_tracing()` is safe after a
callback-induced shutdown.

### Status
Archived

Implemented soft-stop on first callback error in `callback_py_start`:
on error, the tracer finishes writers, unregisters callbacks for the
configured mask, sets events to `NO_EVENTS`, clears the registry, and
records `global.mask = NO_EVENTS`. The original error is propagated to
Python, and subsequent `PY_START` events are not delivered. This keeps the
module-level `ACTIVE` flag unchanged until `stop_tracing()`, making the
shutdown idempotent. The test `tests/test_fail_fast_on_py_start.py`
exercises the behavior by re-running the program after the initial failure.



## ISSUE-008
### Description
Provide structured encoding for kwargs (`**kwargs`) on function entry. The
current backend encodes kwargs as `Raw` text because the `runtime_tracing`
format lacks a mapping value. Introduce a mapping representation so kwargs can
be recorded losslessly with key/value structure and recursively encoded values.

### Definition of Done
- `runtime_tracing` supports a mapping value kind (e.g., `Map` with string keys).
- `RuntimeTracer::encode_value` encodes Python `dict` to the mapping kind with
recursively encoded values; key type restricted to `str` (non-`str` keys may
be stringified or rejected, behavior documented).
- `on_py_start` records `**kwargs` using the new mapping encoding.
- Tests verify kwargs structure and values; large and nested kwargs are covered.

### Proposed solution
- We can represent our `Map` as a sequence of tuples. This way we can use the current value record types to encode dictionaries.
- In the Python recorder, downcast to `dict` and iterate items, encoding values
recursively; keep behavior minimal and fail fast on unexpected key types per
repo rules (no defensive fallbacks).

### Dependent issues
- Blocks completion of ISSUE-002

### Status
Archived

Implemented structured kwargs encoding in the Rust tracer by representing
Python `dict` as a `Sequence` of `(key, value)` `Tuple`s, with keys encoded as
`String` when possible. Tests in
`codetracer-python-recorder/test/test_monitoring_events.py` validate that
kwargs are recorded structurally. This fulfills the goal without introducing a
new mapping value kind, per the proposed solution.

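The sequence-of-tuples representation can be sketched in Python as follows. The dict-shaped records and the name `encode_value_sketch` are assumptions for illustration, and the sketch rejects non-`str` keys (one of the behaviors the issue allowed); the Rust implementation's record layout and key handling may differ.

```python
def encode_value_sketch(value):
    """Hypothetical encoder: dict -> Sequence of (key, value) Tuples."""
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            if not isinstance(key, str):
                # Fail fast on unexpected key types (assumption of this sketch).
                raise TypeError(f"unsupported key type: {type(key).__name__}")
            pairs.append({
                "kind": "Tuple",
                "elements": [
                    {"kind": "String", "text": key},
                    encode_value_sketch(item),  # values are encoded recursively
                ],
            })
        return {"kind": "Sequence", "elements": pairs}
    if isinstance(value, str):
        return {"kind": "String", "text": value}
    # Everything else falls back to a textual form in this sketch.
    return {"kind": "Raw", "r": repr(value)}


encoded = encode_value_sketch({"a": "x", "nested": {"b": "y"}})
print(encoded["kind"], len(encoded["elements"]))  # Sequence 2
```

Because each pair keeps its key and its recursively encoded value, nested kwargs remain inspectable downstream without a dedicated mapping kind.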



## ISSUE-011
### Description
Create a concise set of small Python example scripts to exercise key code paths of the Rust‑backed recorder during development. Place all examples under `/examples` and make them easy to run with the module CLI.

### Definition of Done
- Create `/examples` at repo root.
- Add minimal, deterministic scripts covering common scenarios:
  - `basic_args.py`: positional‑only, pos‑or‑kw, kw‑only, `*args`, `**kwargs`.
  - `exceptions.py`: raise, catch, and `print(e)` inside `except`.
  - `classes_methods.py`: instance, `@classmethod`, `@staticmethod`, property access.
  - `recursion.py`: direct and mutual recursion.
  - `generators_async.py`: generator, `async`/`await`, async generator.
  - `context_and_closures.py`: `with` (context manager) and nested closures.
  - `threading.py`: two threads invoking traced functions and joining.
  - `imports_side_effects.py`: module‑level code vs `if __name__ == "__main__"`.
  - `kwargs_nested.py`: nested kwargs structure to validate structured encoding.
- Each script:
  - Has a brief module docstring stating its focus.
  - Defines `main()` and uses the `__name__ == "__main__"` guard.
  - Produces stable, minimal output without external dependencies.
- Add `/examples/README.md` listing scripts, purpose, and how to run via:
  - `python -m codetracer_python_recorder --codetracer-format=json examples/<script>.py`

### Proposed solution
- Keep scripts focused and short to spotlight specific behaviors.
- Prefer deterministic flows; join threads and avoid non‑deterministic timing.
- Use the provided module CLI so recorder activation is consistent across runs.

### Status
Archived

Added the examples directory and scripts:
`basic_args.py`, `exceptions.py`, `classes_methods.py`, `recursion.py`,
`generators_async.py`, `context_and_closures.py`, `threading.py`,
`imports_side_effects.py`, `kwargs_nested.py`, plus `examples/README.md`
with usage instructions for running via
`python -m codetracer_python_recorder --codetracer-format=json examples/<script>.py`.
20 changes: 20 additions & 0 deletions examples/README.md
@@ -0,0 +1,20 @@
Examples for exercising the Rust‑backed recorder during development.

Run any script via the module CLI so tracing is consistently enabled:

    python -m codetracer_python_recorder --codetracer-format=json examples/<script>.py

Scripts

- basic_args.py: Demonstrates positional‑only, pos‑or‑kw, kw‑only, *args, **kwargs.
- exceptions.py: Raises, catches, and prints an exception in except.
- classes_methods.py: Instance, @classmethod, @staticmethod, and a property.
- recursion.py: Direct recursion (factorial) and mutual recursion.
- generators_async.py: A generator, async function, and async generator.
- context_and_closures.py: A context manager and a nested closure.
- threading.py: Two threads invoking traced functions and joining.
- imports_side_effects.py: Module‑level side effects vs main guard.
- kwargs_nested.py: Nested kwargs structure to validate structured encoding.

All scripts are deterministic and print minimal output.

19 changes: 19 additions & 0 deletions examples/basic_args.py
@@ -0,0 +1,19 @@
"""Example: function with all Python argument kinds.

Covers positional-only, positional-or-keyword, keyword-only, *args, **kwargs.
"""

def f(p, /, q, *args, r, **kwargs):  # noqa: D401 - simple demo
    """Return a tuple to keep behavior deterministic."""
    return (p, q, args, r, kwargs)


def main() -> None:
    res = f(1, 2, 3, 4, 5, r=6, a=7, b=8)
    # Minimal stable output
    print("ok", res[0], res[1], len(res[2]), res[3], sorted(res[4].items()))


if __name__ == "__main__":
    main()

34 changes: 34 additions & 0 deletions examples/classes_methods.py
@@ -0,0 +1,34 @@
"""Example: classes, instance/class/static methods, and a property."""


class Counter:
    def __init__(self, start: int = 0) -> None:
        self._n = start

    def inc(self, by: int = 1) -> int:
        self._n += by
        return self._n

    @property
    def double(self) -> int:
        return self._n * 2

    @classmethod
    def start_at(cls, n: int) -> "Counter":
        return cls(n)

    @staticmethod
    def add(a: int, b: int) -> int:
        return a + b


def main() -> None:
    c = Counter.start_at(2)
    x = c.inc()
    y = Counter.add(3, 4)
    print("ok", x, y, c.double)


if __name__ == "__main__":
    main()

27 changes: 27 additions & 0 deletions examples/context_and_closures.py
@@ -0,0 +1,27 @@
"""Example: context manager and nested closures."""

from contextlib import contextmanager


@contextmanager
def tag(name: str):
    # Minimal context; no side effects beyond yielding
    yield f"<{name}>"


def make_multiplier(factor: int):
    def mul(x: int) -> int:
        return x * factor

    return mul


def main() -> None:
    with tag("x") as t:
        mul3 = make_multiplier(3)
        print("ok", t, mul3(5))


if __name__ == "__main__":
    main()
