
# pawl/normalize/ — Canonical comparison surface

This module provides the principled surface for comparing sandbox policies across the compile-reverse-recompile pipeline. Structural differences between source SBPL and reversed SBPL are handled here, not papered over in the reverser or worked around in comparison logic.

For the conceptual architecture — what the compiler erases, what can be recovered, and how the confidence model works — see NORMALIZE.md.

## Why normalization exists

The sandbox compiler transforms source SBPL in ways that cannot be perfectly inverted. These transformations fall into four categories:

- **Structural erasure**: boolean canonicalization erases ordering, nesting, and algebraic equivalences.
- **Node sharing**: the compiler merges decision-graph nodes across operations, losing predicate-operation ownership.
- **Baseline implicit promotions**: the compiler silently allows or denies operations not mentioned in the source.
- **Entitlement macro expansion**: compile-time entitlement macros expand into concrete rules, losing the original macro intent.

Rather than maintaining a growing list of "known differences to ignore," normalization commits to handling these transformations programmatically.

The goal: any profile that compiles and reverses should compare equal at the canonical level, without asterisks or caveats that leak into user-facing tools.

## Pipeline (ordered)

`compare_source_and_reversed()` runs an ordered pass pipeline. Ordering is intentional because later passes depend on earlier canonicalization decisions.

| Order | Pass | Flag | Module | Sidecar / Metadata |
|---|---|---|---|---|
| 0 | Entitlement augmentation | `augment_entitlements` | `passes/entitlement.py` | `entitlement_augmented`, `entitlement_keys` |
| 1 | Source normalization | always | `source.py` | |
| 2 | Reversed normalization + reconstruction | always | `reversed.py`, `passes/reconstruction.py` | `has_reconstruction` |
| 3 | Wildcard collapse | `collapse_wildcards` | `passes/wildcard_collapse.py` | `wildcard_collapse_info` |
| 4 | Guided deny denormalization | `denormalize_require_not` | `passes/deny_denormalize.py` | `deny_denormalized`, `ops_denormalized` |
| 5 | Structural diff | always | `policy.py` | |
| 6 | Import filtering | `filter_imports` | `baseline/imports.py` | `ImportExpansion`, `import_filtered_count` |
| 7 | Baseline filtering | `filter_baseline` | `baseline/predicates.py` | `baseline_filtered_count` |
| 8 | Predicate dropout filtering | `filter_predicate_dropout` | `passes/predicate_dropout.py` | `PredicateDropoutInfo`, `predicate_dropout_filtered` |
| 9 | Predicate merge filtering | `filter_predicate_merge` | `passes/predicate_filter.py` | `PredicateFilterInfo`, `predicate_merge_info` |

When semantic mode is enabled, passes 5-9 are re-run on semantic diff output.
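
The flag-gated, ordered dispatch described above can be sketched as follows. This is an illustrative model, not the pawl API: `PassResult`, `run_pipeline`, and the toy passes are hypothetical names invented for the example.

```python
# Hypothetical sketch of an ordered, flag-gated pass pipeline. Later passes
# see the output of earlier ones, and each pass can contribute sidecar
# metadata. None of these names come from pawl itself.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class PassResult:
    policy: Any                                   # transformed policy
    sidecar: dict = field(default_factory=dict)   # provenance metadata

def run_pipeline(policy, passes, flags):
    """Run (name, fn, flag) passes in order.

    A pass with flag=None always runs; otherwise it runs only when
    flags[flag] is truthy. Sidecars accumulate into one metadata dict.
    """
    metadata = {}
    for name, fn, flag in passes:
        if flag is not None and not flags.get(flag, False):
            continue  # pass disabled by its flag
        result = fn(policy)
        policy = result.policy
        metadata.update(result.sidecar)
    return policy, metadata

# Toy passes: one unconditional, one gated behind a collapse_wildcards flag.
lower = lambda p: PassResult(p.lower())
collapse = lambda p: PassResult(p.replace("file-read-data", "file-read*"),
                                {"wildcard_collapse_info": True})
policy, meta = run_pipeline(
    "FILE-READ-DATA",
    [("source", lower, None), ("wildcards", collapse, "collapse_wildcards")],
    {"collapse_wildcards": True},
)
# policy == "file-read*", meta == {"wildcard_collapse_info": True}
```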

## Sidecar pattern

Normalize passes that remove or transform differences return sidecar metadata alongside transformed output. Sidecars capture provenance for decisions so callers can distinguish:

- differences that truly disappeared
- differences that were filtered because a specific normalize pass applied

Examples:

- `passes/wildcard_collapse.py` returns `WildcardCollapseInfo`
- `passes/predicate_dropout.py` returns `PredicateDropoutInfo`
- `passes/predicate_filter.py` returns `PredicateFilterInfo`
- `baseline/imports.py` returns `ImportExpansion`

Sidecars are threaded into compare_source_and_reversed() result metadata when their pass is enabled.
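
The pattern can be sketched with a toy filtering pass. This is a minimal illustration of the idea, not pawl's implementation: `BaselineFilterInfo` and this `filter_baseline` signature are hypothetical, loosely modeled on the sidecar types named above.

```python
# Hypothetical sketch of the sidecar pattern: a pass that removes baseline
# rules from a diff and records exactly what it removed, so callers can
# tell "no difference" apart from "difference filtered by this pass".
from dataclasses import dataclass

@dataclass(frozen=True)
class BaselineFilterInfo:      # illustrative sidecar, not the real type
    filtered: tuple            # rules this pass removed
    reason: str = "baseline"

def filter_baseline(diff_rules, baseline_rules):
    baseline = set(baseline_rules)
    kept = [r for r in diff_rules if r not in baseline]
    removed = tuple(r for r in diff_rules if r in baseline)
    return kept, BaselineFilterInfo(filtered=removed)

kept, info = filter_baseline(
    ["(allow mach-lookup)", "(allow file-read* /etc)"],
    ["(allow mach-lookup)"],
)
# kept == ["(allow file-read* /etc)"]; info.filtered records the removal
```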

## What normalization handles

| Transformation | Example | Module |
|---|---|---|
| Whitespace, ordering | `(subpath "/a") (subpath "/b")` vs `(subpath "/b") (subpath "/a")` | `source.py` |
| Predicate grouping | `(require-any (pred-a) (pred-b))` vs separate rules | `source.py` |
| Deny-as-negation | `(deny op X)` -> `(allow op (require-not X))` | `passes/deny_denormalize.py` |
| Wildcard family expansion/compression | `file-read-data` + `file-read-metadata` + `file-read-xattr` <-> `file-read*` | `passes/wildcard_collapse.py`, `operations.py` |
| Import flattening | Imported rules appear inline in reversed output | `baseline/imports.py` |
| Baseline predicates | Compiler-added mach-lookup/file predicates | `baseline/predicates.py` |
| Entitlement blocks | `(let ((x (entitlement ...))) ...)` compiles to nothing | `passes/entitlement.py` |
| Param substitution | `(param "HOME")` -> literal path | `source.py` |
| Disconnected filters | Filters compiled but not connected to ops | `passes/reconstruction.py` |
| Regex equivalence | Different bytecode, same match behavior | `baseline/compiler_model.py` |
| Predicate dropout | Many bare allows cause predicates to be dropped | `passes/predicate_dropout.py` |
| Predicate simplification | `(path-regex #"^/path$")` -> `(literal "/path")` | `passes/predicate_dropout.py` |
| Predicate merge contamination | Node-sharing causes cross-operation predicate misattribution | `passes/predicate_filter.py` + `integration/ir/mappings/predicate_collapse.json` |
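
As one concrete case from the table, the deny-as-negation round trip can be sketched in a few lines. The rule representation here (tuples of action, operation, predicate) is invented for illustration; it is not pawl's internal shape.

```python
# Hypothetical sketch of guided deny denormalization: the compiler encodes
# (deny op X) as (allow op (require-not X)), and this pass rewrites the
# reversed form back into an explicit deny rule.
def denormalize_require_not(rule):
    """(allow, op, (require-not, pred)) -> (deny, op, pred); otherwise unchanged."""
    action, op, pred = rule
    if action == "allow" and isinstance(pred, tuple) and pred[0] == "require-not":
        return ("deny", op, pred[1])
    return rule

rule = ("allow", "file-write*", ("require-not", ("subpath", "/tmp")))
# denormalize_require_not(rule) == ("deny", "file-write*", ("subpath", "/tmp"))
```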

## Operation equivalence

Wildcards and their children are semantically equivalent in compiled blobs. The compiler consolidates child operations into wildcards when they share the same op-table entry.

Comparison supports both directions:

- `collapse_wildcards=True`: collapse reversed child ops into wildcard form
- `wildcard_equivalence=True`: tolerate a reversed wildcard matching source children

See `operations.py:WILDCARD_CHILDREN` for wildcard family membership.
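
Both directions can be sketched over a toy family table. The family membership below is abbreviated to the three child ops named earlier, and `wildcard_equivalent` is a hypothetical helper; consult `operations.py:WILDCARD_CHILDREN` for the real mapping.

```python
# Hypothetical sketch of wildcard-family equivalence in both directions.
WILDCARD_CHILDREN = {  # abbreviated family table, illustrative only
    "file-read*": {"file-read-data", "file-read-metadata", "file-read-xattr"},
}

def collapse_wildcards(ops):
    """collapse_wildcards=True direction: a complete child set -> wildcard."""
    ops = set(ops)
    for wildcard, children in WILDCARD_CHILDREN.items():
        if children <= ops:
            ops = (ops - children) | {wildcard}
    return ops

def wildcard_equivalent(source_op, reversed_op):
    """wildcard_equivalence=True direction: a reversed wildcard may stand
    in for one of its source child operations."""
    return (source_op == reversed_op
            or source_op in WILDCARD_CHILDREN.get(reversed_op, ()))
```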

## Core types

```python
from pawl.normalize import (
    CanonicalPolicy,
    CanonicalRule,
    CanonicalPredicate,
    normalize_source,
    normalize_reversed,
)
```

## Usage

```python
from pawl.normalize.reversed import compare_source_and_reversed

result = compare_source_and_reversed(
    source_sbpl,
    reversed_sbpl,
    disconnected_filters=metadata.get("disconnected_filters"),
    param_bindings={"HOME": "/Users/alice"},
    filter_imports=True,
    collapse_wildcards=True,
    search_paths=[source_dir, system_profiles_dir],
)

if result["equivalent"]:
    print("Policies match at canonical level")
else:
    print("Diff:", result["diff"])
```

## Comparison modes

- **Structural equivalence**: exact canonical form match after normalization.
- **Semantic equivalence**: ignores `require-any`/`require-all` grouping differences.
- **Relaxed equivalence** (default): every source rule must appear in the reversed output; the reversed output may include additional compiler-added rules.
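
Structural and relaxed modes reduce to set relations over canonical rules. The sketch below is a simplified model with invented function names; the real comparison operates on richer canonical structures, and semantic mode additionally canonicalizes grouping before diffing.

```python
# Hypothetical sketch: structural equivalence as set equality, relaxed
# equivalence as subset (source rules must all appear; reversed output
# may carry extra compiler-added rules).
def structural_equal(source_rules, reversed_rules):
    return set(source_rules) == set(reversed_rules)

def relaxed_equal(source_rules, reversed_rules):
    return set(source_rules) <= set(reversed_rules)

src = {("allow", "file-read*")}
rev = {("allow", "file-read*"), ("allow", "mach-lookup")}
# structural_equal(src, rev) is False; relaxed_equal(src, rev) is True
```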

## Adding new normalizations

When a structural difference is discovered between source and reversed:

  1. Characterize it: compiler behavior, reverser limitation, or genuine semantic difference.
  2. Add it here if it is a known compiler transformation.
  3. Add tests in integration/tests/pawl/normalize/ and contract coverage where needed.
  4. Keep ownership boundaries clear: oracle facts in pawl/normalize/predicate_merge, comparison policy in pawl/normalize, render-time behavior in pawl/reverse.

## Modules

| Module | Purpose |
|---|---|
| `__init__.py` | Public normalize exports |
| `policy.py` | `CanonicalPolicy`, `CanonicalRule`, `CanonicalPredicate` and diff logic |
| `source.py` | `normalize_source()` and source canonicalization helpers |
| `reversed.py` | `normalize_reversed()` and `compare_source_and_reversed()` |
| `passes/reconstruction.py` | Decode disconnected filters into canonical predicates |
| `passes/wildcard_collapse.py` | Collapse wildcard-family child operations with sidecar provenance |
| `operations.py` | Operation normalization rules and wildcard family mapping |
| `passes/deny_denormalize.py` | Convert `require-not` structures back to explicit deny rules |
| `baseline/imports.py` | Import expansion simulation and imported-op filtering |
| `baseline/predicates.py` | Baseline predicate and op filtering |
| `passes/predicate_dropout.py` | Bare-allow predicate-dropout analysis and filtering |
| `passes/predicate_filter.py` | Mapping-backed predicate contamination filtering sidecar |
| `boolean_canonicalizer.py` | Shared S-expression simplification utilities |
| `baseline/compiler_model.py` | Compiler behavior model helpers |
| `passes/entitlement.py` | Entitlement let-block extraction/injection for comparison |
| `ir.py` | Canonical IR normalization surface |
| `NORMALIZE.md` | Conceptual architecture: compiler information loss, recovery model, ownership boundaries |
| `predicate_merge/` | Predicate merge oracle: admissibility facts, collapse rules, validation gates |

## Relation to other modules

- `pawl/reverse/`: produces reversed SBPL. It does not own comparison normalization.
- `pawl/structure/`: provides decoded IR and compile metadata used by normalization.
- `integration/ir/profile/five_point_harness.py`: consumes `compare_source_and_reversed()` for roundtrip validation.