Intermediate Representation

This document describes PAWL's single-profile intermediate representation. The IR is two-layer in purpose but is carried through a top-level envelope: the evidence layer preserves byte-faithful structure, the policy layer interprets that structure as a typed decision DAG, and the envelope binds both to coordinates and hashes. PAWL treats durability as a validation property, not just a schema property: the IR is useful because it keeps unknowns explicit and because it survives roundtrip checking in the five-point harness. Bundle-level IR is a neighboring surface and is out of scope here.

The IR Stack

PAWL has four distinct policy-facing shapes for single-profile work. They sit at different levels of interpretation and serve different consumers:

SbplIR from integration/ir/decode_evidence.py: evidence receipt from the decode-evidence lane, built directly from a compiled blob and its decoded records before higher-level classification exists
typed Policy DAG from pawl.contract.policy_dag: the semantic center, a typed decision graph shared across lanes
pawl.profile.ir.v1 from integration/ir/profile/ir_builder.py: comparison-oriented classification view for profile roundtrip work
IREnvelope from pawl/reverse/api.py plus pawl.contract.*: contract packaging of a typed policy plus coordinates and hashes

This stack exists because the information available at each stage is different. SbplIR records what byte-level decoding can see before the mapping artifacts and reverse engine exist. The typed Policy DAG is what compile-decode or reverse can produce once header hints or full graph recovery are available. pawl.profile.ir.v1 is the view that makes roundtrip comparison tractable by knowing about slot classes, node graph structure, and op-table surfaces. And IREnvelope is what the contract layer packages for durable downstream consumption.

The mapping artifacts that connect SbplIR to higher layers are built by build_op_slots and build_filter_arg_specs, which consume only SbplIR.evidence. Those artifacts must exist before load_reverse_context can produce pawl.profile.ir.v1.

Evidence Layer

The evidence layer records what a compiled profile looks like before we decide what it means. Its job is structural fidelity: section framing, op-table targets, node records, literal-oriented records, and any opaque corridors that must be preserved even when they are not yet fully interpreted.

For modern profiles, the primary shape is a header plus a profiles/op-table region, any pre-node tables, a node stream, and a literal-oriented region. That is the main case this document describes. But PAWL's evidence layer is format-variant-aware rather than hardcoded to one permanently fixed layout: modern profiles, message-filter profiles, and legacy shapes do not all expose the same clean framing. Section slicing is therefore treated as heuristic but bounded, and the evidence layer records framing witnesses instead of pretending the layout is solved once and for all.

This is also why the literal-oriented region should not be described as "just a pool of NUL-terminated UTF-8 strings." In the owner contract it is represented as deterministic records with offsets and classes, while printable string extraction remains an auxiliary heuristic rather than the evidence format itself. Likewise, unknown or opaque regions are preserved rather than guessed away. Two artifacts can still be compared structurally even when some regions remain unknown, because uncertainty is represented explicitly instead of being collapsed into omission.

Policy Layer

The policy layer is the typed decision graph built on top of evidence. It is not a generic "semantic graph" with informal node labels. A Policy maps operation names to OperationDAG values, and each OperationDAG carries an entry node plus a node map for that operation's reachable decision structure.

The node vocabulary is the contract vocabulary:

NodeTest
NodeAllow
NodeDeny
NodeJumpOp
NodeUnknown
NodeUnknownInlinePolicyRef

EntryPoint is not a node kind in this model; operation entry is carried by OperationDAG.entry. The explicit Unknown forms matter because they let PAWL keep partial understanding inside the IR rather than forcing premature reclassification or dropping unresolved structure on the floor.
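As a rough illustration of why the explicit Unknown forms matter, the sketch below walks one operation's graph from its entry to a terminal node. The classes are simplified stand-ins, not the definitions in pawl.contract.policy_dag: the field layout, the string-valued atom, and the `walk` helper are all assumptions, and NodeJumpOp and NodeUnknownInlinePolicyRef are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


# Hypothetical stand-ins for the contract node kinds (fields are assumed).
@dataclass(frozen=True)
class NodeTest:
    atom: str      # stand-in for a FilterAtom
    match: str     # node id followed when the test holds
    unmatch: str   # node id followed when it does not


@dataclass(frozen=True)
class NodeAllow:
    pass


@dataclass(frozen=True)
class NodeDeny:
    pass


@dataclass(frozen=True)
class NodeUnknown:
    pass


Node = Union[NodeTest, NodeAllow, NodeDeny, NodeUnknown]


@dataclass(frozen=True)
class OperationDAG:
    op: str
    entry: str
    nodes: Dict[str, Node]


def walk(dag: OperationDAG, holds: Callable[[str], bool]) -> str:
    """Follow edges from dag.entry until a terminal node is reached."""
    node_id = dag.entry
    # An acyclic path can visit each node at most once.
    for _ in range(len(dag.nodes)):
        node = dag.nodes[node_id]
        if isinstance(node, NodeAllow):
            return "allow"
        if isinstance(node, NodeDeny):
            return "deny"
        if isinstance(node, NodeUnknown):
            # Surface the gap instead of guessing a decision.
            return "unknown"
        node_id = node.match if holds(node.atom) else node.unmatch
    raise ValueError("graph is not acyclic")


dag = OperationDAG(
    op="file-read-data",
    entry="n5",
    nodes={
        "n5": NodeTest(atom="subpath /tmp", match="n6", unmatch="n3"),
        "n6": NodeAllow(),
        "n3": NodeUnknown(),
    },
)
```

Here the unmatched branch ends in an unknown node, so a consumer sees "unknown" for that path rather than a fabricated allow or deny.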

Predicate tests are carried by FilterAtom(name, args, provenance), and the arguments are typed values rather than loose payloads. The value vocabulary is:

ValueString
ValueRegex
ValueBool
ValueNumber
ValueSymbol
ValueParamRef

That type system is part of what makes the policy layer renderable back to SBPL in a disciplined way. A regex literal, a string literal, a boolean, and a parameter reference are not interchangeable payloads; they are different values with different renderings and different comparison behavior.
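A minimal sketch of that distinction follows. The class names mirror the contract vocabulary, but the field names and the `render_value` helper are assumptions made for illustration, not the definitions in pawl.contract.values; ValueNumber and ValueSymbol are omitted. The rendered forms follow common SBPL surface syntax, and regex patterns are emitted verbatim because escaping rules are not modeled here.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins: field names are assumed, not taken from the contract.
@dataclass(frozen=True)
class ValueString:
    text: str


@dataclass(frozen=True)
class ValueRegex:
    pattern: str


@dataclass(frozen=True)
class ValueBool:
    flag: bool


@dataclass(frozen=True)
class ValueParamRef:
    name: str


Value = Union[ValueString, ValueRegex, ValueBool, ValueParamRef]


def _quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_value(value: Value) -> str:
    """Render one typed argument; each type gets its own surface form."""
    if isinstance(value, ValueString):
        return _quote(value.text)
    if isinstance(value, ValueRegex):
        # Pattern is passed through unchanged (escaping is not modeled).
        return '#"' + value.pattern + '"'
    if isinstance(value, ValueBool):
        return "#t" if value.flag else "#f"
    if isinstance(value, ValueParamRef):
        return f"(param {_quote(value.name)})"
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```

Because the stand-ins are distinct frozen dataclasses, `ValueString("/tmp")` and `ValueRegex("/tmp")` compare unequal even though they carry the same text, which is the comparison behavior the paragraph above describes.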

Provenance is attached directly to policy objects. Nodes carry NodeProvenance, including origin_offsets, origin_spans, and optional origin metadata, and FilterAtom can also carry provenance. This is how the policy layer remains operationally tied to the underlying bytes and source spans: a semantic claim is not just "what we think the policy means," but "what we think it means, and where that claim came from."

Envelope, Coordinates, and Hash Boundaries

At the contract level, PAWL packages single-profile IR in an IREnvelope carrying world_id, policy_ir, evidence_ir, coordinates, and hashes.

The coordinate bridge matters because provenance is not always one-to-one. CoordinateIndex maps policy nodes back to blob locations, and a NodeCoordinate can carry multiple node indices and byte offsets because the compiler can share nodes across operations. Missing blob coordinates are also explicit: instead of inventing offsets, the contract uses absence_reason.
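The many-to-one case and the explicit-absence case can be sketched as follows. Only absence_reason is named in the contract description above; the other field names, the key format, and the `is_located` helper are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeCoordinate:
    # Hypothetical shape: field names other than absence_reason are assumed.
    node_indices: List[int] = field(default_factory=list)
    byte_offsets: List[int] = field(default_factory=list)
    absence_reason: Optional[str] = None

    def is_located(self) -> bool:
        """True only when real blob offsets exist and no absence is recorded."""
        return bool(self.byte_offsets) and self.absence_reason is None


# Illustrative index keyed by "operation:node id" (key format is assumed).
coordinates: Dict[str, NodeCoordinate] = {
    # A policy node that corresponds to more than one blob location.
    "file-read-data:n5": NodeCoordinate(node_indices=[5, 9], byte_offsets=[40, 72]),
    # No blob exists, so the absence is stated rather than invented.
    "file-write-data:n2": NodeCoordinate(absence_reason="source_evaluate_no_blob"),
}

located = [key for key, coord in coordinates.items() if coord.is_located()]
```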

Hash boundaries are represented at three levels:

profile_hash for the full compiled blob
section_hashes for op_table, node_stream, and literal_pool
subgraph_hashes for per-operation reachable decision subgraphs

These hash modes are not all available under the same conditions. In a blob-backed path, PAWL can populate all three levels. In a source-evaluate path, there is no compiled blob, so profile_hash and section_hashes are null while policy-derived subgraph_hashes can still exist. In that same source-only mode, coordinates can still be materialized with absence_reason="source_evaluate_no_blob" so that downstream consumers see an honest absence rather than a fake location.
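The two availability modes can be sketched like this. The `build_hashes` helper, the use of SHA-256, and the canonical JSON form are assumptions for illustration; the actual hashing is owned by the contract code.

```python
import hashlib
import json
from typing import Dict, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_bytes(obj: object) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_hashes(
    subgraphs: Dict[str, dict],
    blob: Optional[bytes] = None,
    sections: Optional[Dict[str, bytes]] = None,
) -> dict:
    """Populate the three hash levels, leaving blob-derived ones null without a blob."""
    subgraph_hashes = {
        op: sha256_hex(canonical_bytes(graph)) for op, graph in sorted(subgraphs.items())
    }
    if blob is None:
        # Source-evaluate path: only policy-derived hashes can exist.
        return {
            "profile_hash": None,
            "section_hashes": None,
            "subgraph_hashes": subgraph_hashes,
        }
    return {
        "profile_hash": sha256_hex(blob),
        "section_hashes": {name: sha256_hex(data) for name, data in (sections or {}).items()},
        "subgraph_hashes": subgraph_hashes,
    }


subgraphs = {"file-read-data": {"entry": "n5", "nodes": {"n5": "test", "n6": "allow", "n3": "deny"}}}
source_only = build_hashes(subgraphs)
blob_backed = build_hashes(
    subgraphs,
    blob=b"\x00\x01",
    sections={"op_table": b"\x00", "node_stream": b"\x01", "literal_pool": b""},
)
```

In this sketch the subgraph hashes are identical in both modes because they depend only on the policy, which is what lets a source-only result be compared against a blob-backed one at the subgraph level.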

The evidence hash boundary is intentionally narrow. Structural hashes are meant to move when structure moves, not when convenience annotations, previews, or other non-structural views improve. This is what lets PAWL distinguish structure drift from interpretation drift.

`SbplIR` Schema

SbplIR is the evidence receipt emitted by integration/ir/decode_evidence.py:build_ir_from_decode(). It is a JSON-like object with four top-level fields:

format: fixed string "sbpl-ir@1"
evidence: required structural evidence
policy_receipt: required today, but only as an evidence-derived policy receipt (not the typed semantic policy object)
annotations: required today, excluded from the evidence hash boundary
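A consumer might guard on this top-level shape before reading further. The `check_sbpl_ir_shape` helper below is a hypothetical sketch, not part of decode_evidence.py; it checks only the four fields and the format string listed above.

```python
from typing import List

REQUIRED_FIELDS = ("format", "evidence", "policy_receipt", "annotations")
EXPECTED_FORMAT = "sbpl-ir@1"


def check_sbpl_ir_shape(doc: dict) -> List[str]:
    """Return a list of problems; an empty list means the top level looks right."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if "format" in doc and doc["format"] != EXPECTED_FORMAT:
        problems.append(f"unexpected format: {doc['format']!r}")
    return problems


# Field contents are elided here; only the top-level shape is being checked.
skeleton = {
    "format": "sbpl-ir@1",
    "evidence": {},
    "policy_receipt": {},
    "annotations": {},
}
```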

`evidence.header`

This records the parsed header plus slicing and framing witnesses:

format_variant: one of the ingestion variants such as "modern-struct-16", "modern-heuristic", or "legacy-decision-tree"
type_flags: raw profile type flags when available
node_count
op_table_width
vars_count
states_count
regex_count
entitlements_count
num_profiles
instructions_count
node_stride_bytes
nodes_start
nodes_end
literal_start
literal_end
nodes_to_literal_delta
op_table_mode: currently "raw-index" or "tagged-byte-offset" when the decoder provides that sanity classification

preamble_words_full is not part of evidence.header; it lives under annotations.

`evidence.tables`

This is populated for modern-struct-16 profiles and is otherwise null.

When present, it records the pre-node tables as explicit arrays rather than burying them in padding:

profiles: list of profile records, each with name_offset_u16, reserved_u16, and op_table
entitlements_u16
regex_u16
vars_u16
states_u16
alignment_padding_len

For non-bundle blobs, profiles still appears as a one-element list.

`evidence.op_table`

This is ordered by slot index. Each entry preserves both the raw u16 and the decoded target view:

slot
raw_u16
tag_bits
target:
- node_index
- byte_offset
- absolute_offset

`evidence.nodes`

Nodes stay in blob order. Each node record is a byte-faithful evidence record with two structured helper subobjects:

index
tag
kind
record_size
u16
raw_hex
layout:
- edge_indices
- payload_indices
- u16_role
- layout_provenance
annotations:
- filter_vocab_ref
- filter_kind_ref
- filter_arg_raw
- filter_out_of_vocab
- literal_refs
- literal_refs_provenance
- literal_ref_matches

The nested annotations subobject is intentionally outside the evidence hash boundary.

`evidence.literal_pool`

The literal region is represented as deterministic records, not as a bag of printable strings. The top-level fields are:

raw_sha256
content_digest
layout_digest
records

Each record has:

record_index
literal_id
rel_offset
len_u16
payload_sha256
optional payload_preview_ascii
class

The currently emitted class values are:

len_prefixed
trailer_bytes
align_pad
remainder

payload_preview_ascii is a convenience field and is excluded from the evidence hash boundary.

`evidence.opaque_regions`

This carries message-filter and opaque-corridor information when present:

message_filter
masked_blob_sha256
opaque_region

For message-filter profiles this lets comparison distinguish whole-blob drift from drift outside the opaque corridor.

Evidence Hash Boundary

Evidence hashing is gated solely by sanitize_evidence_for_hash() in integration/ir/decode_evidence.py. Treat edits to that filter as breaking changes for evidence-level comparisons.

The current sanitizer strips:

payload_preview_ascii
notes
preamble_words_full
nested annotations

The contract is:

structural evidence lives under evidence
convenience views and heuristics live under annotations or nested annotations subkeys
improving annotations must not silently move the structural hash boundary
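The effect of that contract can be sketched with a small recursive filter. This is not the real sanitize_evidence_for_hash(): the helper names, the recursive key match, and the SHA-256 over canonical JSON are assumptions chosen to show the idea that non-structural keys are removed before hashing.

```python
import hashlib
import json

# Keys named in the list above; matching them at every nesting level is an assumption.
STRIPPED_KEYS = {"payload_preview_ascii", "notes", "preamble_words_full", "annotations"}


def strip_non_structural(obj):
    """Return a copy of obj with non-structural keys removed at every depth."""
    if isinstance(obj, dict):
        return {k: strip_non_structural(v) for k, v in obj.items() if k not in STRIPPED_KEYS}
    if isinstance(obj, list):
        return [strip_non_structural(v) for v in obj]
    return obj


def evidence_digest(evidence: dict) -> str:
    """Hash the sanitized evidence in an assumed canonical JSON form."""
    canonical = json.dumps(strip_non_structural(evidence), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


evidence = {
    "nodes": [{"index": 0, "raw_hex": "0001", "annotations": {"literal_refs": [0]}}],
    "literal_pool": {"records": [{"record_index": 0, "payload_preview_ascii": "/tmp"}]},
}

# Improving an annotation leaves the digest alone; changing raw bytes moves it.
better_annotations = json.loads(json.dumps(evidence))
better_annotations["nodes"][0]["annotations"]["literal_refs"] = [0, 1]
changed_structure = json.loads(json.dumps(evidence))
changed_structure["nodes"][0]["raw_hex"] = "0002"
```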

Policy Receipt

The policy_receipt field on SbplIR is not the project-level typed Policy DAG. It is the evidence-derived receipt emitted by _build_policy_receipt_from_evidence() in integration/ir/decode_evidence.py.

That receipt is not evidence-only in the strict sense: it also loads the op-slot and filter-binding mapping artifacts that let the decode-evidence lane name operations and attach literal-bound rule payloads. But it exists to record what the decode-evidence lane could name and bind without pretending to be the semantic policy center. It is useful for:

mapping progress and partial-recovery status under annotations.policy_status
localization and debugging through evidence_refs, node_span, derivation tags, witness ids, and confidence labels
a minimal receipt-lane fallback render when no typed Policy DAG has been built alongside the receipt

Its shape is:

default_decision: currently "deny"
operations: ordered list of reconstructed operations

Each operation record has:

slot
name
confidence
derivation
rules

Each derivation object currently carries:

tag
mapping_artifact_sha256
witness_case_ids

Each rule currently carries:

decision
predicate
confidence
derivation
evidence_refs
node_span

Current emitted predicate objects are atom-shaped:

op: filter or synthetic predicate name
args: list of evidence-oriented argument payloads

This is narrower than the typed predicate/value model in pawl.contract.policy_dag and pawl.contract.values.

Predicate argument payloads

SbplIR.policy_receipt does not currently emit the typed ValueString / ValueRegex / ValueParamRef union used by the project-level Policy DAG. Instead it emits evidence-oriented payloads keyed around literal identity.

The current shapes are:

{"lit_id": <int>} for literal-bound rule arguments
catalog entries shaped like {"lit_id", "evidence_refs", "confidence", "derivation"} inside the synthetic literal-catalog operation

Evidence references

Rules can point back to the evidence they were derived from. The currently emitted evidence_refs shapes include entries such as:

literal refs with kind="literal"
table refs with kind="table"
node refs with kind="node"

node_span is emitted separately for quick localization and currently carries:

nodes
edges
op_slot

Synthetic literal catalog

When literal-table evidence exists, the receipt may emit a synthetic operation with:

slot = -1
name = "literal-catalog"
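Putting the receipt fields described above together, a receipt might look like the following. Every value here is made up for illustration: the slot number, confidence labels, derivation tags, digest, and the contents of each evidence ref beyond its kind are assumptions, and the placement of catalog entries inside the synthetic operation is not modeled. The `bound_literal_ids` helper is likewise hypothetical.

```python
from typing import List

receipt = {
    "default_decision": "deny",
    "operations": [
        {
            "slot": 12,  # hypothetical slot number
            "name": "file-read-data",
            "confidence": "example-label",  # real label vocabulary not shown here
            "derivation": {
                "tag": "example-tag",
                "mapping_artifact_sha256": "0" * 64,  # placeholder digest
                "witness_case_ids": [],
            },
            "rules": [
                {
                    "decision": "allow",
                    "predicate": {"op": "subpath", "args": [{"lit_id": 0}]},
                    "confidence": "example-label",
                    "derivation": {
                        "tag": "example-tag",
                        "mapping_artifact_sha256": "0" * 64,
                        "witness_case_ids": [],
                    },
                    "evidence_refs": [{"kind": "literal"}, {"kind": "node"}],
                    "node_span": {"nodes": [5, 6, 3], "edges": [[5, 6], [5, 3]], "op_slot": 12},
                }
            ],
        },
        {
            "slot": -1,
            "name": "literal-catalog",
            "confidence": "example-label",
            "derivation": {"tag": "example-tag", "mapping_artifact_sha256": "0" * 64, "witness_case_ids": []},
            "rules": [],
        },
    ],
}


def bound_literal_ids(doc: dict) -> List[int]:
    """Collect lit_id values bound by rules of real (non-synthetic) operations."""
    ids = set()
    for operation in doc["operations"]:
        if operation["slot"] < 0:
            continue  # skip the synthetic literal-catalog operation
        for rule in operation["rules"]:
            for arg in rule["predicate"]["args"]:
                if "lit_id" in arg:
                    ids.add(arg["lit_id"])
    return sorted(ids)
```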

Annotations

annotations is required today, but it is not part of the evidence hash boundary. The current emitted keys are:

preamble_words_full
decoded_literal_strings
literal_strings
literal_printable_runs
policy_status
literal_catalog_sources

The current policy_status values emitted by build_ir_from_decode() are:

none
literal_catalog
ops_mapped
filters_bound
rules_emitted

Consumer-side concepts such as probe_ready belong to integration/ir/probes.py and should not be documented here as if they were emitted SbplIR values.

Rendering Boundaries

The policy layer is renderable, but not all renderability means the same thing. PAWL has three rendering surfaces at different levels:

Semantic renderer. render_policy(...) works from the typed Policy DAG alone and produces compilable SBPL. That is the common case for policy-layer consumers. The four_point_loop() in decode_evidence.py prefers this renderer when a typed DAG has been built from blob + SBPL hints.

Roundtrip renderer. render_profile_sbpl(...) in the reverse lane has a stricter job: produce SBPL that can roundtrip back to a structurally equivalent blob. That path depends on metadata the typed DAG does not carry by itself, including fidelity controls such as baseline suppression, message-filter reconstruction, pool-order recovery, and other reverse-lane mechanics. This boundary is important because it explains both the strength and the limit of the policy layer: the DAG is sufficient for semantic rendering, but not a complete container for every blob-faithful reverse heuristic.

Receipt fallback renderer. render_sbpl_from_ir() in decode_evidence.py is a split surface: it prefers the typed Policy DAG renderer when a DAG is available, and otherwise falls back to a minimal receipt-based render. The four_point_loop() caller keeps the receipt fallback as an escape hatch when a typed-DAG render is not accepted by the compile lane. Today the receipt fallback emits (version 1), deterministic literal definitions when annotations.literal_strings supports them, and the default line from policy_receipt.default_decision. It does not yet render reconstructed per-operation rules back into full SBPL.
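To make the scope of the receipt fallback concrete, here is a minimal sketch of a render that emits only the version line and the default line. It is not render_sbpl_from_ir(): the helper name is hypothetical, the `(deny default)` form is assumed from ordinary SBPL syntax, and literal definitions are omitted because their emitted form is not described here.

```python
def render_receipt_minimal(policy_receipt: dict) -> str:
    """Render the version line plus the default decision, and nothing else."""
    decision = policy_receipt.get("default_decision", "deny")
    if decision not in ("allow", "deny"):
        raise ValueError(f"unsupported default decision: {decision!r}")
    lines = [
        "(version 1)",
        f"({decision} default)",
    ]
    # Per-operation rules are deliberately not rendered in this sketch.
    return "\n".join(lines) + "\n"


rendered = render_receipt_minimal({"default_decision": "deny", "operations": []})
```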

Validation Context

PAWL does not treat the IR as self-authenticating. The five-point harness validates it through roundtrip comparison, checking source and roundtrip artifacts against one another across surface, structural, semantic, and, where available, behavioral checks. That validation context is what makes the IR durable under format churn: when something drifts, the project can localize whether the problem sits in structural extraction, policy interpretation, rendering, normalization, or runtime behavior. Runtime evidence is confirmatory rather than universal, but the harness keeps the IR from becoming a static schema with no external pressure.

Example

Consider a profile that allows reads under /tmp:

(allow file-read-data
    (subpath "/tmp"))

In the primary modern case, the evidence layer records this as:

op_table["file-read-data"] -> node index 5
node[5] -> test record, edge →6 / →3, payload ref 0
literal record[0] -> "/tmp"

The corresponding Policy IR sketch uses the real contract vocabulary:

policy = Policy(
    schema_version="...",
    sbpl_version=1,
    default_decision="deny",
    operations={
        "file-read-data": OperationDAG(
            op="file-read-data",
            entry="n5",
            nodes={
                "n5": NodeTest(
                    atom=FilterAtom(
                        name="subpath",
                        args=[ValueString("/tmp")],
                    ),
                    match="n6",
                    unmatch="n3",
                    provenance=NodeProvenance(origin_offsets=[5]),
                ),
                "n6": NodeAllow(),
                "n3": NodeDeny(),
            },
        ),
    },
)

A blob-backed IREnvelope would then bind this policy to coordinates and hashes, for example mapping file-read-data:n5 back to node index 5 and its byte offset while carrying the full profile_hash, section_hashes, and the file-read-data subgraph hash. In source-evaluate mode, the same policy can be assembled without blob-backed hashes, with absence_reason="source_evaluate_no_blob" instead of invented byte coordinates.

Owner Code

The living single-profile contract spans:

pawl/contract/policy_dag.py — typed Policy DAG
pawl/contract/values.py — Value union types
pawl/contract/envelope.py — IREnvelope, assemble_ir_envelope
pawl/contract/coordinates.py — CoordinateIndex, NODE_BYTE_SIZE
pawl/contract/compare.py — lane-neutral typed DAG comparison
pawl/contract/render.py — lane-neutral typed DAG renderer
integration/ir/decode_evidence.py — SbplIR producer, four-point loop
integration/ir/profile/ir_builder.py — pawl.profile.ir.v1 producer
integration/FIVE_POINT_HARNESS.md — consolidated five-point harness spec
