
# pawl/normalize/ — Canonical comparison surface

This module provides the principled surface for comparing sandbox policies across the compile-reverse-recompile pipeline. Structural differences between source SBPL and reversed SBPL are handled here, not papered over in the reverser or worked around in comparison logic.

For the conceptual architecture — what the compiler erases, what can be recovered, and how the confidence model works — see NORMALIZE.md.

## Why normalization exists

The sandbox compiler transforms source SBPL in ways that cannot be perfectly inverted. These transformations fall into four categories:

- **Structural erasure**: boolean canonicalization erases ordering, nesting, and algebraic equivalences.
- **Node sharing**: the compiler merges decision-graph nodes across operations, losing predicate-operation ownership.
- **Baseline implicit promotions**: the compiler silently allows or denies operations not mentioned in the source.
- **Entitlement macro expansion**: compile-time entitlement macros expand into concrete rules, losing the original macro intent.

Rather than maintaining a growing list of "known differences to ignore," normalization commits to handling these transformations programmatically.

The goal: any profile that compiles and reverses should compare equal at the canonical level, without asterisks or caveats that leak into user-facing tools.

## Pipeline (ordered)

`compare_source_and_reversed()` runs an ordered pass pipeline. Ordering is intentional because later passes depend on earlier canonicalization decisions.

| Order | Pass | Flag | Module | Sidecar / Metadata |
|---|---|---|---|---|
| 0 | Entitlement augmentation | `augment_entitlements` | `passes/entitlement.py` | `entitlement_augmented`, `entitlement_keys` |
| 1 | Source normalization | always | `source.py` | |
| 2 | Reversed normalization + reconstruction | always | `reversed.py`, `passes/reconstruction.py` | `has_reconstruction` |
| 3 | Wildcard collapse | `collapse_wildcards` | `passes/wildcard_collapse.py` | `wildcard_collapse_info` |
| 4 | Guided deny denormalization | `denormalize_require_not` | `passes/deny_denormalize.py` | `deny_denormalized`, `ops_denormalized` |
| 5 | Structural diff | always | `policy.py` | |
| 6 | Import filtering | `filter_imports` | `baseline/imports.py` | `ImportExpansion`, `import_filtered_count` |
| 7 | Baseline filtering | `filter_baseline` | `baseline/predicates.py` | `baseline_filtered_count` |
| 8 | Predicate dropout filtering | `filter_predicate_dropout` | `passes/predicate_dropout.py` | `PredicateDropoutInfo`, `predicate_dropout_filtered` |
| 9 | Predicate merge filtering | `filter_predicate_merge` | `passes/predicate_filter.py` | `PredicateFilterInfo`, `predicate_merge_info` |

When semantic mode is enabled, passes 5-9 are re-run on semantic diff output.
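
The flag-gated, ordered dispatch described above can be sketched as follows. This is an illustrative model, not the pawl API: `PassResult`, `run_pipeline`, and the toy passes are hypothetical names invented for the example.

```python
# Hypothetical sketch of an ordered, flag-gated pass pipeline. Later passes
# see the output of earlier ones, and each pass can contribute sidecar
# metadata. None of these names come from pawl itself.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class PassResult:
    policy: Any                                   # transformed policy
    sidecar: dict = field(default_factory=dict)   # provenance metadata

def run_pipeline(policy, passes, flags):
    """Run (name, fn, flag) passes in order.

    A pass with flag=None always runs; otherwise it runs only when
    flags[flag] is truthy. Sidecars accumulate into one metadata dict.
    """
    metadata = {}
    for name, fn, flag in passes:
        if flag is not None and not flags.get(flag, False):
            continue  # pass disabled by its flag
        result = fn(policy)
        policy = result.policy
        metadata.update(result.sidecar)
    return policy, metadata

# Toy passes: one unconditional, one gated behind a collapse_wildcards flag.
lower = lambda p: PassResult(p.lower())
collapse = lambda p: PassResult(p.replace("file-read-data", "file-read*"),
                                {"wildcard_collapse_info": True})
policy, meta = run_pipeline(
    "FILE-READ-DATA",
    [("source", lower, None), ("wildcards", collapse, "collapse_wildcards")],
    {"collapse_wildcards": True},
)
# policy == "file-read*", meta == {"wildcard_collapse_info": True}
```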

## Sidecar pattern

Normalize passes that remove or transform differences return sidecar metadata alongside transformed output. Sidecars capture provenance for decisions so callers can distinguish:

- differences that truly disappeared
- differences that were filtered because a specific normalize pass applied

Examples:

- `passes/wildcard_collapse.py` returns `WildcardCollapseInfo`
- `passes/predicate_dropout.py` returns `PredicateDropoutInfo`
- `passes/predicate_filter.py` returns `PredicateFilterInfo`
- `baseline/imports.py` returns `ImportExpansion`

Sidecars are threaded into compare_source_and_reversed() result metadata when their pass is enabled.
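
The pattern can be sketched with a toy filtering pass. This is a minimal illustration of the idea, not pawl's implementation: `BaselineFilterInfo` and this `filter_baseline` signature are hypothetical, loosely modeled on the sidecar types named above.

```python
# Hypothetical sketch of the sidecar pattern: a pass that removes baseline
# rules from a diff and records exactly what it removed, so callers can
# tell "no difference" apart from "difference filtered by this pass".
from dataclasses import dataclass

@dataclass(frozen=True)
class BaselineFilterInfo:      # illustrative sidecar, not the real type
    filtered: tuple            # rules this pass removed
    reason: str = "baseline"

def filter_baseline(diff_rules, baseline_rules):
    baseline = set(baseline_rules)
    kept = [r for r in diff_rules if r not in baseline]
    removed = tuple(r for r in diff_rules if r in baseline)
    return kept, BaselineFilterInfo(filtered=removed)

kept, info = filter_baseline(
    ["(allow mach-lookup)", "(allow file-read* /etc)"],
    ["(allow mach-lookup)"],
)
# kept == ["(allow file-read* /etc)"]; info.filtered records the removal
```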

## What normalization handles

| Transformation | Example | Module |
|---|---|---|
| Whitespace, ordering | `(subpath "/a") (subpath "/b")` vs `(subpath "/b") (subpath "/a")` | `source.py` |
| Predicate grouping | `(require-any (pred-a) (pred-b))` vs separate rules | `source.py` |
| Deny-as-negation | `(deny op X)` -> `(allow op (require-not X))` | `passes/deny_denormalize.py` |
| Wildcard family expansion/compression | `file-read-data` + `file-read-metadata` + `file-read-xattr` <-> `file-read*` | `passes/wildcard_collapse.py`, `operations.py` |
| Import flattening | Imported rules appear inline in reversed output | `baseline/imports.py` |
| Baseline predicates | Compiler-added mach-lookup/file predicates | `baseline/predicates.py` |
| Entitlement blocks | `(let ((x (entitlement ...))) ...)` compiles to nothing | `passes/entitlement.py` |
| Param substitution | `(param "HOME")` -> literal path | `source.py` |
| Disconnected filters | Filters compiled but not connected to ops | `passes/reconstruction.py` |
| Regex equivalence | Different bytecode, same match behavior | `baseline/compiler_model.py` |
| Predicate dropout | Many bare allows cause predicates to be dropped | `passes/predicate_dropout.py` |
| Predicate simplification | `(path-regex #"^/path$")` -> `(literal "/path")` | `passes/predicate_dropout.py` |
| Predicate merge contamination | Node-sharing causes cross-operation predicate misattribution | `passes/predicate_filter.py` + `integration/ir/mappings/predicate_collapse.json` |
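
As one concrete case from the table, the deny-as-negation round trip can be sketched in a few lines. The rule representation here (tuples of action, operation, predicate) is invented for illustration; it is not pawl's internal shape.

```python
# Hypothetical sketch of guided deny denormalization: the compiler encodes
# (deny op X) as (allow op (require-not X)), and this pass rewrites the
# reversed form back into an explicit deny rule.
def denormalize_require_not(rule):
    """(allow, op, (require-not, pred)) -> (deny, op, pred); otherwise unchanged."""
    action, op, pred = rule
    if action == "allow" and isinstance(pred, tuple) and pred[0] == "require-not":
        return ("deny", op, pred[1])
    return rule

rule = ("allow", "file-write*", ("require-not", ("subpath", "/tmp")))
# denormalize_require_not(rule) == ("deny", "file-write*", ("subpath", "/tmp"))
```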

## Operation equivalence

Wildcards and their children are semantically equivalent in compiled blobs. The compiler consolidates child operations into wildcards when they share the same op-table entry.

Comparison supports both directions:

- `collapse_wildcards=True`: collapse reversed child ops into wildcard form
- `wildcard_equivalence=True`: tolerate a reversed wildcard matching source children

See `operations.py:WILDCARD_CHILDREN` for wildcard family membership.
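
Both directions can be sketched over a toy family table. The family membership below is abbreviated to the three child ops named earlier, and `wildcard_equivalent` is a hypothetical helper; consult `operations.py:WILDCARD_CHILDREN` for the real mapping.

```python
# Hypothetical sketch of wildcard-family equivalence in both directions.
WILDCARD_CHILDREN = {  # abbreviated family table, illustrative only
    "file-read*": {"file-read-data", "file-read-metadata", "file-read-xattr"},
}

def collapse_wildcards(ops):
    """collapse_wildcards=True direction: a complete child set -> wildcard."""
    ops = set(ops)
    for wildcard, children in WILDCARD_CHILDREN.items():
        if children <= ops:
            ops = (ops - children) | {wildcard}
    return ops

def wildcard_equivalent(source_op, reversed_op):
    """wildcard_equivalence=True direction: a reversed wildcard may stand
    in for one of its source child operations."""
    return (source_op == reversed_op
            or source_op in WILDCARD_CHILDREN.get(reversed_op, ()))
```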

## Core types

```python
from pawl.normalize import (
    CanonicalPolicy,
    CanonicalRule,
    CanonicalPredicate,
    normalize_source,
    normalize_reversed,
)
```

## Usage

```python
from pawl.normalize.reversed import compare_source_and_reversed

result = compare_source_and_reversed(
    source_sbpl,
    reversed_sbpl,
    disconnected_filters=metadata.get("disconnected_filters"),
    param_bindings={"HOME": "/Users/alice"},
    filter_imports=True,
    collapse_wildcards=True,
    search_paths=[source_dir, system_profiles_dir],
)

if result["equivalent"]:
    print("Policies match at canonical level")
else:
    print("Diff:", result["diff"])
```

## Comparison modes

- **Structural equivalence**: exact canonical form match after normalization.
- **Semantic equivalence**: ignores `require-any`/`require-all` grouping differences.
- **Relaxed equivalence** (default): every source rule must appear in the reversed output; the reversed output may include additional compiler-added rules.
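
Structural and relaxed modes reduce to set relations over canonical rules. The sketch below is a simplified model with invented function names; the real comparison operates on richer canonical structures, and semantic mode additionally canonicalizes grouping before diffing.

```python
# Hypothetical sketch: structural equivalence as set equality, relaxed
# equivalence as subset (source rules must all appear; reversed output
# may carry extra compiler-added rules).
def structural_equal(source_rules, reversed_rules):
    return set(source_rules) == set(reversed_rules)

def relaxed_equal(source_rules, reversed_rules):
    return set(source_rules) <= set(reversed_rules)

src = {("allow", "file-read*")}
rev = {("allow", "file-read*"), ("allow", "mach-lookup")}
# structural_equal(src, rev) is False; relaxed_equal(src, rev) is True
```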

## Adding new normalizations

When a structural difference is discovered between source and reversed:

  1. Characterize it: compiler behavior, reverser limitation, or genuine semantic difference.
  2. Add it here if it is a known compiler transformation.
  3. Add tests in integration/tests/pawl/normalize/ and contract coverage where needed.
  4. Keep ownership boundaries clear: oracle facts in pawl/normalize/predicate_merge, comparison policy in pawl/normalize, render-time behavior in pawl/reverse.

## Modules

| Module | Purpose |
|---|---|
| `__init__.py` | Public normalize exports |
| `policy.py` | `CanonicalPolicy`, `CanonicalRule`, `CanonicalPredicate` and diff logic |
| `source.py` | `normalize_source()` and source canonicalization helpers |
| `reversed.py` | `normalize_reversed()` and `compare_source_and_reversed()` |
| `passes/reconstruction.py` | Decode disconnected filters into canonical predicates |
| `passes/wildcard_collapse.py` | Collapse wildcard-family child operations with sidecar provenance |
| `operations.py` | Operation normalization rules and wildcard family mapping |
| `passes/deny_denormalize.py` | Convert `require-not` structures back to explicit deny rules |
| `baseline/imports.py` | Import expansion simulation and imported-op filtering |
| `baseline/predicates.py` | Baseline predicate and op filtering |
| `passes/predicate_dropout.py` | Bare-allow predicate-dropout analysis and filtering |
| `passes/predicate_filter.py` | Mapping-backed predicate contamination filtering sidecar |
| `boolean_canonicalizer.py` | Shared S-expression simplification utilities |
| `baseline/compiler_model.py` | Compiler behavior model helpers |
| `passes/entitlement.py` | Entitlement let-block extraction/injection for comparison |
| `ir.py` | Canonical IR normalization surface |
| `NORMALIZE.md` | Conceptual architecture: compiler information loss, recovery model, ownership boundaries |
| `predicate_merge/` | Predicate merge oracle: admissibility facts, collapse rules, validation gates |

## Relation to other modules

- `pawl/reverse/`: produces reversed SBPL. It does not own comparison normalization.
- `pawl/structure/`: provides decoded IR and compile metadata used by normalization.
- `integration/ir/profile/five_point_harness.py`: consumes `compare_source_and_reversed()` for roundtrip validation.