Commit 2afa519

WS6-4
codetracer-python-recorder/README.md:
codetracer-python-recorder/codetracer_python_recorder/cli.py:
codetracer-python-recorder/resources/trace_filters/builtin_default.toml:
codetracer-python-recorder/src/session/bootstrap.rs:
codetracer-python-recorder/src/trace_filter/config.rs:
codetracer-python-recorder/tests/python/test_cli_integration.py:
design-docs/US0028 - Configurable Python trace filters.md:
design-docs/adr/0009-configurable-trace-filters.md:
design-docs/configurable-trace-filters-implementation-plan.md:
design-docs/configurable-trace-filters-implementation-plan.status.md:
design-docs/py-api-001.md:
docs/onboarding/trace-filters.md:
tf.toml:
trace-out/trace_metadata.json:
trace-out/trace_paths.json:

Signed-off-by: Tzanko Matev <[email protected]>
1 parent 8ae93e8 commit 2afa519

12 files changed: +200 -19 lines changed

codetracer-python-recorder/README.md

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ or activated virtual environments behave identically to `python script.py`.
 - Filter files are TOML with `[meta]`, `[scope]`, and `[[scope.rules]]` tables. Rules evaluate in declaration order and can tweak both execution (`exec`) and value decisions (`value_default`).
 - Supported selector domains: `pkg`, `file`, `obj` for scopes; `local`, `global`, `arg`, `ret`, `attr` for value policies. Match types default to `glob` and also accept `regex` or `literal` (e.g. `local:regex:^(metric|masked)_\w+$`).
 - Default discovery: `.codetracer/trace-filter.toml` next to the traced script. Chain additional files via CLI (`--trace-filter path_a --trace-filter path_b`), environment variable (`CODETRACER_TRACE_FILTER=path_a::path_b`), or Python helpers (`trace(..., trace_filter=[path_a, path_b])`). Later entries override earlier ones when selectors overlap.
+- A built-in `builtin-default` filter is always prepended. It skips CPython standard-library frames (e.g. `asyncio`, `threading`, `importlib`) while re-enabling third-party packages under `site-packages` (except helpers such as `_virtualenv.py`), and redacts common secrets (`password`, `token`, API keys, etc.) across locals/globals/args/returns/attributes. Project filters can loosen or tighten these defaults as required.
 - Runtime metadata captures the active chain under `trace_metadata.json -> trace_filter`, including per-kind redaction counters. See `docs/onboarding/trace-filters.md` for the full DSL reference and examples.
 
 Example snippet:

codetracer-python-recorder/codetracer_python_recorder/cli.py

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,7 @@
 
 import argparse
 import json
+import os
 import runpy
 import sys
 from dataclasses import dataclass
@@ -11,6 +12,7 @@
 from typing import Iterable, Sequence
 
 from . import flush, start, stop
+from .auto_start import ENV_TRACE_FILTER
 from .formats import DEFAULT_FORMAT, SUPPORTED_FORMATS, normalize_format
 
 
@@ -252,6 +254,9 @@ def main(argv: Iterable[str] | None = None) -> int:
     script_path = config.script
     script_args = config.script_args
     filter_specs = list(config.trace_filter)
+    env_filter = os.getenv(ENV_TRACE_FILTER)
+    if env_filter:
+        filter_specs.insert(0, env_filter)
     policy_overrides = config.policy_overrides if config.policy_overrides else None
 
     old_argv = sys.argv
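
The ordering that this `cli.py` change sets up can be sketched in isolation. This is an illustration under stated assumptions, not the recorder's code: `build_filter_chain` is a hypothetical helper, the variable name `CODETRACER_TRACE_FILTER` is taken from the integration test in this commit, and the `::` splitting follows the README's description of chained specs (where the recorder actually splits them is not shown in this diff).

```python
import os

# Name as it appears in the integration test; the real constant lives in auto_start.
ENV_TRACE_FILTER = "CODETRACER_TRACE_FILTER"


def build_filter_chain(cli_specs, environ=None):
    """Hypothetical helper: order filter specs the way the patched main() does."""
    environ = os.environ if environ is None else environ
    specs = list(cli_specs)
    env_filter = environ.get(ENV_TRACE_FILTER)
    if env_filter:
        # The env-provided spec goes first, so --trace-filter entries come
        # later in the chain and can override it.
        specs.insert(0, env_filter)
    # Assumption: a single spec may chain several paths with "::".
    chain = []
    for spec in specs:
        chain.extend(part for part in spec.split("::") if part)
    return chain


print(build_filter_chain(["cli.toml"], {ENV_TRACE_FILTER: "a.toml::b.toml"}))
```

Because later entries override earlier ones, placing the environment spec at index 0 gives explicit `--trace-filter` flags the final say.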
codetracer-python-recorder/resources/trace_filters/builtin_default.toml

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+[meta]
+name = "builtin-default"
+version = 1
+description = "Skip CPython stdlib internals and redact common sensitive identifiers."
+labels = ["builtin", "default"]
+
+[scope]
+default_exec = "trace"
+default_value_action = "allow"
+
+[[scope.rules]]
+selector = 'file:regex:.*[\\/](lib|Lib)[\\/]python\d+\.\d+[\/].*'
+exec = "skip"
+reason = "Skip Python standard library files"
+
+[[scope.rules]]
+selector = 'file:regex:.*[\\/](lib|Lib)[\\/]python\d+\.\d+[\/]site-packages[\/].*'
+exec = "trace"
+reason = "Allow third-party packages under site-packages"
+
+[[scope.rules]]
+selector = 'file:regex:.*[\\/]site-packages[\\/]_virtualenv\.py$'
+exec = "skip"
+reason = "Skip virtualenv bootstrap helper"
+
+[[scope.rules]]
+selector = 'pkg:regex:^(asyncio|selectors|concurrent|importlib|threading|multiprocessing)(\.|$)'
+exec = "skip"
+reason = "Skip noisy stdlib async/concurrency internals"
+
+[[scope.rules]]
+selector = 'pkg:literal:builtins'
+exec = "skip"
+reason = "Skip builtins module instrumentation"
+
+[[scope.rules]]
+selector = 'pkg:glob:*'
+value_default = "allow"
+
+[[scope.rules.value_patterns]]
+selector = 'local:regex:(?i).*(pass(word)?|passwd|pwd|secret|token|session|cookie|auth|credential|creds|bearer|ssn|credit|card|iban|cvv|cvc|pan|api[_-]?key|private[_-]?key|secret[_-]?key|ssh[_-]?key|jwt|refresh[_-]?token|access[_-]?token).*'
+action = "deny"
+reason = "Redact sensitive locals"
+
+[[scope.rules.value_patterns]]
+selector = 'global:regex:(?i).*(pass(word)?|passwd|pwd|secret|token|session|cookie|auth|credential|creds|bearer|ssn|credit|card|iban|cvv|cvc|pan|api[_-]?key|private[_-]?key|secret[_-]?key|ssh[_-]?key|jwt|refresh[_-]?token|access[_-]?token).*'
+action = "deny"
+reason = "Redact sensitive globals"
+
+[[scope.rules.value_patterns]]
+selector = 'arg:regex:(?i).*(pass(word)?|passwd|pwd|secret|token|session|cookie|auth|credential|creds|bearer|ssn|credit|card|iban|cvv|cvc|pan|api[_-]?key|private[_-]?key|secret[_-]?key|ssh[_-]?key|jwt|refresh[_-]?token|access[_-]?token).*'
+action = "deny"
+reason = "Redact sensitive arguments"
+
+[[scope.rules.value_patterns]]
+selector = 'ret:regex:(?i).*(pass(word)?|passwd|pwd|secret|token|session|cookie|auth|credential|creds|bearer|ssn|credit|card|iban|cvv|cvc|pan|api[_-]?key|private[_-]?key|secret[_-]?key|ssh[_-]?key|jwt|refresh[_-]?token|access[_-]?token).*'
+action = "deny"
+reason = "Redact sensitive return values"
+
+[[scope.rules.value_patterns]]
+selector = 'attr:regex:(?i).*(pass(word)?|passwd|pwd|secret|token|session|cookie|auth|credential|creds|bearer|ssn|credit|card|iban|cvv|cvc|pan|api[_-]?key|private[_-]?key|secret[_-]?key|ssh[_-]?key|jwt|refresh[_-]?token|access[_-]?token).*'
+action = "deny"
+reason = "Redact sensitive attributes"
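
The interplay of the three `file:` rules above is easier to see when run against sample paths. The sketch below copies the patterns from the TOML and evaluates them in declaration order, letting the last matching rule decide. That last-match-wins behaviour is an assumption inferred from the rule ordering and the README's "later entries override earlier ones"; `resolve_exec` is a hypothetical helper, and Python's `re` stands in for whatever regex engine the recorder uses.

```python
import re

# (pattern, exec) pairs copied from the file rules in builtin_default.toml.
FILE_RULES = [
    (r'.*[\\/](lib|Lib)[\\/]python\d+\.\d+[\/].*', "skip"),
    (r'.*[\\/](lib|Lib)[\\/]python\d+\.\d+[\/]site-packages[\/].*', "trace"),
    (r'.*[\\/]site-packages[\\/]_virtualenv\.py$', "skip"),
]


def resolve_exec(path, default="trace"):
    """Hypothetical resolver: the last rule that matches decides (assumed)."""
    decision = default
    for pattern, action in FILE_RULES:
        if re.search(pattern, path):
            decision = action
    return decision


for sample in [
    "/usr/lib/python3.12/asyncio/tasks.py",
    "/venv/lib/python3.12/site-packages/requests/api.py",
    "/venv/lib/python3.12/site-packages/_virtualenv.py",
    "/home/me/project/app.py",
]:
    print(sample, "->", resolve_exec(sample))
```

Under these assumptions the standard-library path is skipped, the `requests` path is traced again by the second rule, `_virtualenv.py` is skipped by the third, and the project file falls through to `default_exec = "trace"`.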

codetracer-python-recorder/src/session/bootstrap.rs

Lines changed: 29 additions & 12 deletions
@@ -33,6 +33,9 @@ pub struct TraceSessionBootstrap {
 
 const TRACE_FILTER_DIR: &str = ".codetracer";
 const TRACE_FILTER_FILE: &str = "trace-filter.toml";
+const BUILTIN_FILTER_LABEL: &str = "builtin-default";
+const BUILTIN_TRACE_FILTER: &str =
+    include_str!("../../resources/trace_filters/builtin_default.toml");
 
 impl fmt::Debug for TraceSessionBootstrap {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
@@ -173,11 +176,10 @@ fn load_trace_filter(
         chain.extend(paths.iter().cloned());
     }
 
-    if chain.is_empty() {
-        return Ok(None);
-    }
-
-    let config = TraceFilterConfig::from_paths(&chain)?;
+    let config = TraceFilterConfig::from_inline_and_paths(
+        &[(BUILTIN_FILTER_LABEL, BUILTIN_TRACE_FILTER)],
+        &chain,
+    )?;
     Ok(Some(Arc::new(TraceFilterEngine::new(config))))
 }
 
@@ -234,6 +236,7 @@ mod tests {
     use super::*;
     use pyo3::types::PyList;
     use recorder_errors::ErrorCode;
+    use std::path::PathBuf;
     use tempfile::tempdir;
 
     #[test]
@@ -347,7 +350,7 @@ mod tests {
     }
 
     #[test]
-    fn prepare_bootstrap_ignores_missing_trace_filter() {
+    fn prepare_bootstrap_applies_builtin_trace_filter() {
         Python::with_gil(|py| {
             let tmp = tempdir().expect("tempdir");
             let trace_dir = tmp.path().join("out");
@@ -365,7 +368,13 @@ mod tests {
                 .expect("restore argv");
 
             let bootstrap = result.expect("bootstrap");
-            assert!(bootstrap.trace_filter().is_none());
+            let engine = bootstrap.trace_filter().expect("builtin filter");
+            let summary = engine.summary();
+            assert_eq!(summary.entries.len(), 1);
+            assert_eq!(
+                summary.entries[0].path,
+                PathBuf::from("<inline:builtin-default>")
+            );
         });
     }
 
@@ -416,8 +425,12 @@ mod tests {
             let bootstrap = result.expect("bootstrap");
             let engine = bootstrap.trace_filter().expect("filter engine");
             let summary = engine.summary();
-            assert_eq!(summary.entries.len(), 1);
-            assert_eq!(summary.entries[0].path, filter_path);
+            assert_eq!(summary.entries.len(), 2);
+            assert_eq!(
+                summary.entries[0].path,
+                PathBuf::from("<inline:builtin-default>")
+            );
+            assert_eq!(summary.entries[1].path, filter_path);
         });
     }
 
@@ -494,9 +507,13 @@ mod tests {
             let bootstrap = result.expect("bootstrap");
             let engine = bootstrap.trace_filter().expect("filter engine");
             let summary = engine.summary();
-            assert_eq!(summary.entries.len(), 2);
-            assert_eq!(summary.entries[0].path, default_filter_path);
-            assert_eq!(summary.entries[1].path, override_filter_path);
+            assert_eq!(summary.entries.len(), 3);
+            assert_eq!(
+                summary.entries[0].path,
+                PathBuf::from("<inline:builtin-default>")
+            );
+            assert_eq!(summary.entries[1].path, default_filter_path);
+            assert_eq!(summary.entries[2].path, override_filter_path);
         });
     }
 }

codetracer-python-recorder/src/trace_filter/config.rs

Lines changed: 51 additions & 4 deletions
@@ -149,14 +149,27 @@ pub struct TraceFilterConfig {
 impl TraceFilterConfig {
     /// Load and compose filters from the provided paths.
     pub fn from_paths(paths: &[PathBuf]) -> RecorderResult<Self> {
-        if paths.is_empty() {
+        Self::from_inline_and_paths(&[], paths)
+    }
+
+    /// Load and compose filters from inline TOML sources combined with paths.
+    ///
+    /// Inline entries are ingested first in the order provided, followed by files.
+    pub fn from_inline_and_paths(
+        inline: &[(&str, &str)],
+        paths: &[PathBuf],
+    ) -> RecorderResult<Self> {
+        if inline.is_empty() && paths.is_empty() {
             return Err(usage!(
                 ErrorCode::InvalidPolicyValue,
-                "no trace filter paths supplied"
+                "no trace filter sources supplied"
             ));
         }
 
         let mut aggregator = ConfigAggregator::default();
+        for (label, contents) in inline {
+            aggregator.ingest_inline(label, contents)?;
+        }
         for path in paths {
             aggregator.ingest_file(path)?;
         }
@@ -225,8 +238,17 @@ impl ConfigAggregator {
             )
         })?;
 
-        let checksum = calculate_sha256(&contents);
-        let raw: RawFilterFile = toml::from_str(&contents).map_err(|err| {
+        self.ingest_source(path, &contents)
+    }
+
+    fn ingest_inline(&mut self, label: &str, contents: &str) -> RecorderResult<()> {
+        let pseudo_path = PathBuf::from(format!("<inline:{label}>"));
+        self.ingest_source(&pseudo_path, contents)
+    }
+
+    fn ingest_source(&mut self, path: &Path, contents: &str) -> RecorderResult<()> {
+        let checksum = calculate_sha256(contents);
+        let raw: RawFilterFile = toml::from_str(contents).map_err(|err| {
             usage!(
                 ErrorCode::InvalidPolicyValue,
                 "failed to parse trace filter '{}': {}",
@@ -771,6 +793,7 @@ const VALUE_SELECTOR_KINDS: [SelectorKind; 5] = [
 mod tests {
     use super::*;
     use std::io::Write;
+    use std::path::PathBuf;
     use tempfile::tempdir;
 
     #[test]
@@ -870,6 +893,30 @@ mod tests {
         Ok(())
     }
 
+    #[test]
+    fn from_inline_and_paths_parses_inline_only() -> RecorderResult<()> {
+        let inline_filter = r#"
+            [meta]
+            name = "inline"
+            version = 1
+
+            [scope]
+            default_exec = "trace"
+            default_value_action = "allow"
+        "#;
+
+        let config = TraceFilterConfig::from_inline_and_paths(&[("inline", inline_filter)], &[])?;
+
+        assert_eq!(config.default_exec(), ExecDirective::Trace);
+        assert_eq!(config.default_value_action(), ValueAction::Allow);
+        assert_eq!(config.rules().len(), 0);
+        let summary = config.summary();
+        assert_eq!(summary.entries.len(), 1);
+        assert_eq!(summary.entries[0].name, "inline");
+        assert_eq!(summary.entries[0].path, PathBuf::from("<inline:inline>"));
+        Ok(())
+    }
+
     #[test]
     fn rejects_unknown_keys() {
         let temp = tempdir().expect("temp dir");
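
The source ordering introduced by `from_inline_and_paths` can be modelled in a few lines of Python. This is a rough analogue rather than a port: `compose_sources` and the dictionary shape are hypothetical, TOML parsing and rule aggregation are left out, and pairing each entry with a SHA-256 digest simply mirrors the `calculate_sha256` call visible in the diff (what the recorder does with the checksum afterwards is not shown).

```python
import hashlib
from pathlib import Path


def compose_sources(inline, paths):
    """Hypothetical analogue: inline sources first, then files, in order."""
    if not inline and not paths:
        raise ValueError("no trace filter sources supplied")
    entries = []
    for label, contents in inline:
        # Inline sources get a pseudo-path so they can be reported like files.
        entries.append({
            "path": f"<inline:{label}>",
            "sha256": hashlib.sha256(contents.encode("utf-8")).hexdigest(),
        })
    for path in paths:
        contents = Path(path).read_text(encoding="utf-8")
        entries.append({
            "path": str(path),
            "sha256": hashlib.sha256(contents.encode("utf-8")).hexdigest(),
        })
    return entries


entries = compose_sources([("builtin-default", "[meta]\nname = 'demo'\n")], [])
print(entries[0]["path"])
```

Routing inline text and file contents through one code path is what lets the built-in filter appear in summaries as `<inline:builtin-default>` alongside real file paths.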

codetracer-python-recorder/tests/python/test_cli_integration.py

Lines changed: 42 additions & 0 deletions
@@ -137,6 +137,48 @@ def test_cli_honours_trace_filter_chain(tmp_path: Path) -> None:
     filters = trace_filter.get("filters", [])
     paths = [entry.get("path") for entry in filters if isinstance(entry, dict)]
     assert paths == [
+        "<inline:builtin-default>",
         str(default_filter.resolve()),
         str(override_filter.resolve()),
     ]
+
+
+def test_cli_honours_env_trace_filter(tmp_path: Path) -> None:
+    script = tmp_path / "program.py"
+    _write_script(script, "print('env filter test')\n")
+
+    filter_path = tmp_path / "env-filter.toml"
+    filter_path.write_text(
+        """
+        [meta]
+        name = "env-filter"
+        version = 1
+
+        [scope]
+        default_exec = "trace"
+        default_value_action = "allow"
+
+        [[scope.rules]]
+        selector = "pkg:program"
+        exec = "skip"
+        value_default = "allow"
+        """,
+        encoding="utf-8",
+    )
+
+    trace_dir = tmp_path / "trace"
+    env = _prepare_env()
+    env["CODETRACER_TRACE_FILTER"] = str(filter_path)
+
+    result = _run_cli(["--trace-dir", str(trace_dir), str(script)], cwd=tmp_path, env=env)
+    assert result.returncode == 0
+
+    metadata_file = trace_dir / "trace_metadata.json"
+    payload = json.loads(metadata_file.read_text(encoding="utf-8"))
+    trace_filter = payload.get("trace_filter", {})
+    filters = trace_filter.get("filters", [])
+    paths = [entry.get("path") for entry in filters if isinstance(entry, dict)]
+    assert paths == [
+        "<inline:builtin-default>",
+        str(filter_path.resolve()),
+    ]
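
Outside the test suite, the same check can be used to see which filters were active for a finished trace. The sketch below reuses the lookup from the tests; `active_filter_paths` is a hypothetical helper, and the demo payload contains only the keys the tests read (`trace_filter`, `filters`, `path`), so real metadata files will likely carry more fields.

```python
import json
import tempfile
from pathlib import Path


def active_filter_paths(metadata_file):
    """Return the 'path' of each filter entry recorded in trace_metadata.json."""
    payload = json.loads(Path(metadata_file).read_text(encoding="utf-8"))
    filters = payload.get("trace_filter", {}).get("filters", [])
    return [entry.get("path") for entry in filters if isinstance(entry, dict)]


# Demo against a hand-written payload rather than a real recording.
demo_payload = {
    "trace_filter": {
        "filters": [
            {"path": "<inline:builtin-default>"},
            {"path": "/proj/.codetracer/trace-filter.toml"},
        ]
    }
}
with tempfile.TemporaryDirectory() as tmp:
    metadata = Path(tmp) / "trace_metadata.json"
    metadata.write_text(json.dumps(demo_payload), encoding="utf-8")
    paths = active_filter_paths(metadata)

print(paths)
```

With this commit the first entry should always be `<inline:builtin-default>`, since the built-in filter is prepended to every chain.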

design-docs/US0028 - Configurable Python trace filters.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ As a **Python team lead**, I want **a powerful configuration language to filter
 - [ ] Scenario: Default filter protects secrets
 - Given no filter file is provided
 - When the recorder starts
-- Then a built-in best-effort secret redaction policy is applied and the user is notified how to supply a project-specific filter
+- Then a built-in best-effort secret redaction policy is applied, standard-library/asyncio frames are skipped, and the user is notified how to supply a project-specific filter
 - [ ] Scenario: Validate configuration errors
 - Given I supply an invalid rule (e.g., circular include)
 - When I launch the recorder
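
What "best-effort" means for the redaction policy can be checked directly against the pattern shipped in `builtin_default.toml`. The sketch below is only an approximation: `is_redacted` is a hypothetical helper, and Python's `re` stands in for the recorder's regex engine. The pattern body is copied from the `value_patterns` selectors, which share it across all five value kinds.

```python
import re

# Pattern body copied from the value_patterns selectors in builtin_default.toml.
SENSITIVE = re.compile(
    r"(?i).*(pass(word)?|passwd|pwd|secret|token|session|cookie|auth|credential|creds"
    r"|bearer|ssn|credit|card|iban|cvv|cvc|pan|api[_-]?key|private[_-]?key"
    r"|secret[_-]?key|ssh[_-]?key|jwt|refresh[_-]?token|access[_-]?token).*"
)


def is_redacted(name):
    """Hypothetical check: would this identifier hit the built-in deny pattern?"""
    return SENSITIVE.fullmatch(name) is not None


for name in ["db_password", "API_KEY", "retries", "author", "expand_path"]:
    print(name, is_redacted(name))
```

The pattern matches substrings case-insensitively, so it also catches harmless names such as `author` (contains `auth`) and `expand_path` (contains `pan`). Over-redaction of this kind is the trade-off a project-specific filter can loosen.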

design-docs/adr/0009-configurable-trace-filters.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ The solution has to load human-authored TOML, enforce schema validation, and add
 - Resolve `inherit` defaults while chaining multiple files (split on `::`). Later files append to the ordered rule list; `value_patterns` are likewise appended.
 2. **Expose filter loading at session bootstrap.**
 - Extend `TraceSessionBootstrap` to locate the default project filter (`<cwd>/.codetracer/trace-filter.toml` up the directory tree) and accept optional override specs from CLI, Python API, or env (`CODETRACER_TRACE_FILTER`).
+- Prepend a bundled `builtin-default` filter that redacts common secrets and skips CPython standard-library/asyncio frames before applying project/user filters.
 - Parse each provided file once per `start_tracing` call. Propagate `RecorderError` on IO or schema failures with context about the offending selector.
 3. **Wire the engine into `RuntimeTracer`.**
 - Store `Arc<TraceFilterEngine>` plus a per-code cache of `ResolvedScope` decisions (`HashMap<usize, ScopeResolution>`). Each resolution records:

design-docs/configurable-trace-filters-implementation-plan.md

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ Related ADR: 0009 – Configurable Trace Filters for codetracer-python-recorder
 - Update `#[pyfunction] start_tracing` signature with `#[pyo3(signature = (path, format, activation_path=None, trace_filter=None))]`.
 - Parse `trace_filter` (string/path) into `FilterSpec`, split on `::`, resolve to absolute paths, and feed into loader. Map errors via `RecorderError`.
 - Extend `TraceSessionBootstrap` (or adjacent helper) to find the default `<project>/.codetracer/trace-filter.toml` by walking up from the script path when no explicit spec is provided.
+- Prepend a built-in default filter (shipped with the crate) that redacts common secrets and skips standard-library/asyncio frames before applying project/user filters.
 - Modify `session.start` and `.trace` to accept `trace_filter` keyword; wrap `pathlib.Path` inputs.
 - CLI:
 - Add `--trace-filter path` (repeatable). When multiple provided, respect CLI order; combine with default using `::`.

design-docs/configurable-trace-filters-implementation-plan.status.md

Lines changed: 2 additions & 1 deletion
@@ -23,6 +23,7 @@
 - `codetracer-python-recorder/src/lib.rs`
 - `codetracer-python-recorder/benches/trace_filter.rs` *(WS6 microbench harness)*
 - `Justfile` *(WS6 bench automation)*
+- `codetracer-python-recorder/resources/trace_filters/builtin_default.toml` *(WS6 builtin defaults)*
 - Future stages: `codetracer-python-recorder/src/runtime/mod.rs`, Python surface files under `codetracer_python_recorder/`
 
 ## Stage Progress
@@ -31,7 +32,7 @@
 -**WS3 – Runtime Engine & Caching:** Implemented `trace_filter::engine` with `TraceFilterEngine::resolve` caching `ScopeResolution` entries per code id (DashMap), deriving module/object/file metadata, and compiling value policies with ordered pattern evaluation. Added `ValueKind` to align future runtime integration and unit tests proving caching, rule precedence (object > package/file), and relative path normalisation—all exercised via `just cargo-test`.
 -**WS4 – RuntimeTracer Integration:** `RuntimeTracer` now accepts an optional `Arc<TraceFilterEngine>`, caches `ScopeResolution` results per code id, and records `filter_scope_skip` when scopes are denied. Value capture helpers honour `ValuePolicy` with a reusable `<redacted>` sentinel, emit per-kind telemetry, and we persist the active filter summary plus skip/redaction counts into `trace_metadata.json`. Bootstrapping now discovers `.codetracer/trace-filter.toml`, instantiates `TraceFilterEngine`, and passes the shared `Arc` into `RuntimeTracer::new`; new `session::bootstrap` tests cover both presence/absence of the default filter and `just cargo-test` (nextest `--no-default-features`) confirms the flow end-to-end.
 -**WS5 – Python Surface, CLI, Metadata:** Session helpers normalise chained specs, auto-start honours `CODETRACER_TRACE_FILTER`, PyO3 merges explicit/default chains, CLI exposes `--trace-filter`, unit coverage exercises env auto-start filter chaining, and docs/CLI help now describe filter precedence and env wiring.
--**WS6 – Hardening, Benchmarks & Documentation:** Completed selector error logging hardening, delivered Rust + Python benchmarking harnesses with `just bench` automation, refreshed the Nix dev shell (gnuplot) to keep Criterion plots available, and closed documentation gaps (README, onboarding guide). Follow-on benchmarking integration tasks are tracked under ADR 0010.
+-**WS6 – Hardening, Benchmarks & Documentation:** Completed selector error logging hardening, introduced a built-in default filter that redacts sensitive identifiers and skips stdlib/asyncio frames, delivered Rust + Python benchmarking harnesses with `just bench` automation, refreshed the Nix dev shell (gnuplot) to keep Criterion plots available, and closed documentation gaps (README, onboarding guide). Follow-on benchmarking integration tasks are tracked under ADR 0010.
 
 ## WS5 Progress Checklist
 1. ✅ Introduced Python-side helpers that normalise `trace_filter` inputs (strings, Paths, iterables) into absolute path chains, updated session API/context manager, and threaded env-driven auto-start.
