This document describes PAWL's single-profile intermediate representation. The IR is two-layer in purpose but is carried through a top-level envelope: the evidence layer preserves byte-faithful structure, the policy layer interprets that structure as a typed decision DAG, and the envelope binds both to coordinates and hashes. PAWL treats durability as a validation property, not just a schema property: the IR is useful because it keeps unknowns explicit and because it survives roundtrip checking in the five-point harness. Bundle-level IR is a neighboring surface and is out of scope here.
PAWL has four distinct policy-facing shapes for single-profile work. They sit at different levels of interpretation and serve different consumers:
- SbplIR from integration/ir/decode_evidence.py: evidence receipt from the decode-evidence lane, built directly from a compiled blob and its decoded records before higher-level classification exists
- typed Policy DAG from pawl.contract.policy_dag: the semantic center, a typed decision graph shared across lanes
- pawl.profile.ir.v1 from integration/ir/profile/ir_builder.py: comparison-oriented classification view for profile roundtrip work
- IREnvelope from pawl/reverse/api.py plus pawl.contract.*: contract packaging of a typed policy plus coordinates and hashes
This stack exists because the information available at each stage is different.
SbplIR records what byte-level decoding can see before the mapping artifacts
and reverse engine exist. The typed Policy DAG is what compile-decode or
reverse can produce once header hints or full graph recovery are available.
pawl.profile.ir.v1 is the view that makes roundtrip comparison tractable by
knowing about slot classes, node graph structure, and op-table surfaces. And
IREnvelope is what the contract layer packages for durable downstream
consumption.
The mapping artifacts that connect SbplIR to higher layers are built by
build_op_slots and build_filter_arg_specs, which consume only
SbplIR.evidence. Those artifacts must exist before load_reverse_context can
produce pawl.profile.ir.v1.
The evidence layer records what a compiled profile looks like before we decide what it means. Its job is structural fidelity: section framing, op-table targets, node records, literal-oriented records, and any opaque corridors that must be preserved even when they are not yet fully interpreted.
For modern profiles, the primary shape is a header plus a profiles/op-table region, any pre-node tables, a node stream, and a literal-oriented region. That is the main case this document describes. But PAWL's evidence layer is format-variant-aware rather than hardcoded to one permanently fixed layout: modern profiles, message-filter profiles, and legacy shapes do not all expose the same clean framing. Section slicing is therefore treated as heuristic but bounded, and the evidence layer records framing witnesses instead of pretending the layout is solved once and for all.
This is also why the literal-oriented region should not be described as "just a pool of NUL-terminated UTF-8 strings." In the owner contract it is represented as deterministic records with offsets and classes, while printable string extraction remains an auxiliary heuristic rather than the evidence format itself. Likewise, unknown or opaque regions are preserved rather than guessed away. Two artifacts can still be compared structurally even when some regions remain unknown, because uncertainty is represented explicitly instead of being collapsed into omission.
The policy layer is the typed decision graph built on top of evidence. It is
not a generic "semantic graph" with informal node labels. A Policy maps
operation names to OperationDAG values, and each OperationDAG carries an
entry node plus a node map for that operation's reachable decision structure.
The node vocabulary is the contract vocabulary:
- NodeTest
- NodeAllow
- NodeDeny
- NodeJumpOp
- NodeUnknown
- NodeUnknownInline
- PolicyRef
EntryPoint is not a node kind in this model; operation entry is carried by
OperationDAG.entry. The explicit Unknown forms matter because they let PAWL
keep partial understanding inside the IR rather than forcing premature
reclassification or dropping unresolved structure on the floor.
Predicate tests are carried by FilterAtom(name, args, provenance), and the
arguments are typed values rather than loose payloads. The value vocabulary is:
- ValueString
- ValueRegex
- ValueBool
- ValueNumber
- ValueSymbol
- ValueParamRef
That type system is part of what makes the policy layer renderable back to SBPL in a disciplined way. A regex literal, a string literal, a boolean, and a parameter reference are not interchangeable payloads; they are different values with different renderings and different comparison behavior.
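A minimal sketch of why the value types are not interchangeable. The dataclasses and render_value_sketch below are hypothetical stand-ins, not the real types in pawl.contract.values, and the quoting rules are deliberately simplified:

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for part of the Value union; field names are assumptions.
@dataclass(frozen=True)
class ValueString:
    text: str


@dataclass(frozen=True)
class ValueRegex:
    pattern: str


@dataclass(frozen=True)
class ValueBool:
    flag: bool


@dataclass(frozen=True)
class ValueParamRef:
    name: str


Value = Union[ValueString, ValueRegex, ValueBool, ValueParamRef]


def render_value_sketch(value: Value) -> str:
    """Render one value as an SBPL token (simplified quoting, no full escaping)."""
    if isinstance(value, ValueString):
        return '"' + value.text.replace('"', '\\"') + '"'
    if isinstance(value, ValueRegex):
        # SBPL writes regex literals with a leading '#'.
        return '#"' + value.pattern + '"'
    if isinstance(value, ValueBool):
        return "#t" if value.flag else "#f"
    if isinstance(value, ValueParamRef):
        return '(param "' + value.name + '")'
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```

Because each kind is its own type, a string "/tmp" and a regex "/tmp" compare unequal and render differently, which is the behavior the comparison and rendering lanes rely on.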
Provenance is attached directly to policy objects. Nodes carry
NodeProvenance, including origin_offsets, origin_spans, and optional
origin metadata, and FilterAtom can also carry provenance. This is how the
policy layer remains operationally tied to the underlying bytes and source
spans: a semantic claim is not just "what we think the policy means," but "what
we think it means, and where that claim came from."
At the contract level, PAWL packages single-profile IR in an IREnvelope
carrying world_id, policy_ir, evidence_ir, coordinates, and hashes.
The coordinate bridge matters because provenance is not always one-to-one.
CoordinateIndex maps policy nodes back to blob locations, and a
NodeCoordinate can carry multiple node indices and byte offsets because the
compiler can share nodes across operations. Missing blob coordinates are also
explicit: instead of inventing offsets, the contract uses absence_reason.
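The coordinate rules above can be sketched as follows. The class below is a hypothetical stand-in for NodeCoordinate, and nodes_start and node_size are illustrative parameters (the real contract exposes NODE_BYTE_SIZE, whose value is not assumed here):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeCoordinateSketch:
    # One policy node may map to several blob nodes when the compiler shares them.
    node_indices: List[int] = field(default_factory=list)
    byte_offsets: List[int] = field(default_factory=list)
    # Set instead of offsets when no blob location exists.
    absence_reason: Optional[str] = None


def blob_coordinate(node_indices: List[int], nodes_start: int, node_size: int) -> NodeCoordinateSketch:
    """Derive byte offsets from node indices for a blob-backed path."""
    offsets = [nodes_start + index * node_size for index in node_indices]
    return NodeCoordinateSketch(node_indices=list(node_indices), byte_offsets=offsets)


def absent_coordinate(reason: str) -> NodeCoordinateSketch:
    """Record an explicit absence rather than inventing an offset."""
    return NodeCoordinateSketch(absence_reason=reason)


# A tiny index keyed by "operation:node_id" (key format taken from the example later on).
index = {
    "file-read-data:n5": blob_coordinate([5, 12], nodes_start=100, node_size=8),
    "file-write-data:n2": absent_coordinate("source_evaluate_no_blob"),
}
```

A consumer can then branch on absence_reason being set instead of treating an empty offset list as "offset zero".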
Hash boundaries are represented at three levels:
- profile_hash for the full compiled blob
- section_hashes for op_table, node_stream, and literal_pool
- subgraph_hashes for per-operation reachable decision subgraphs
These hash modes are not all available under the same conditions. In a
blob-backed path, PAWL can populate all three levels. In a source-evaluate
path, there is no compiled blob, so profile_hash and section_hashes are
null while policy-derived subgraph_hashes can still exist. In that same
source-only mode, coordinates can still be materialized with
absence_reason="source_evaluate_no_blob" so that downstream consumers see an
honest absence rather than a fake location.
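A rough sketch of how the three hash levels behave in the two modes. The node encoding, the canonical-JSON hashing, and the function names are assumptions made for illustration, not the contract's actual hashing scheme:

```python
import hashlib
import json
from typing import Dict, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def subgraph_hash(entry: str, nodes: Dict[str, dict]) -> str:
    """Hash only the nodes reachable from `entry`, in a canonical order."""
    reachable, stack = set(), [entry]
    while stack:
        node_id = stack.pop()
        if node_id in reachable:
            continue
        reachable.add(node_id)
        node = nodes[node_id]
        for target in (node.get("match"), node.get("unmatch")):
            if target is not None:
                stack.append(target)
    canonical = json.dumps(
        {node_id: nodes[node_id] for node_id in sorted(reachable)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return sha256_hex(canonical.encode("utf-8"))


def build_hashes(
    operations: Dict[str, dict],
    blob: Optional[bytes] = None,
    sections: Optional[Dict[str, bytes]] = None,
) -> dict:
    """Blob-backed mode fills all three levels; source-only mode leaves two null."""
    subgraphs = {
        op: subgraph_hash(dag["entry"], dag["nodes"]) for op, dag in operations.items()
    }
    if blob is None:
        return {"profile_hash": None, "section_hashes": None, "subgraph_hashes": subgraphs}
    return {
        "profile_hash": sha256_hex(blob),
        "section_hashes": {name: sha256_hex(data) for name, data in (sections or {}).items()},
        "subgraph_hashes": subgraphs,
    }
```

Hashing only reachable nodes means an unreferenced node in the map does not move an operation's subgraph hash, and the same policy yields the same subgraph hashes whether or not a blob is present.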
The evidence hash boundary is intentionally narrow. Structural hashes are meant to move when structure moves, not when convenience annotations, previews, or other non-structural views improve. This is what lets PAWL distinguish structure drift from interpretation drift.
SbplIR is the evidence receipt emitted by
integration/ir/decode_evidence.py:build_ir_from_decode(). It is a JSON-like
object with four top-level fields:
- format: fixed string "sbpl-ir@1"
- evidence: required structural evidence
- policy_receipt: required today, but only as an evidence-derived policy receipt (not the typed semantic policy object)
- annotations: required today, excluded from the evidence hash boundary
evidence.header records the parsed header plus slicing and framing witnesses:
- format_variant: one of the ingestion variants such as "modern-struct-16", "modern-heuristic", or "legacy-decision-tree"
- type_flags: raw profile type flags when available
- node_count
- op_table_width
- vars_count
- states_count
- regex_count
- entitlements_count
- num_profiles
- instructions_count
- node_stride_bytes
- nodes_start
- nodes_end
- literal_start
- literal_end
- nodes_to_literal_delta
- op_table_mode: currently "raw-index" or "tagged-byte-offset" when the decoder provides that sanity classification
preamble_words_full is not part of evidence.header; it lives under
annotations.
The pre-node tables field is populated for modern-struct-16 profiles and is otherwise null.
When present, it records those tables as explicit arrays rather than burying them in padding:
- profiles: list of profile records, each with name_offset_u16, reserved_u16, and op_table
- entitlements_u16
- regex_u16
- vars_u16
- states_u16
- alignment_padding_len
For non-bundle blobs, profiles still appears as a one-element list.
The op-table evidence is ordered by slot index. Each entry preserves both the raw u16 and the
decoded target view:
- slot
- raw_u16
- tag_bits
- target:
  - node_index
  - byte_offset
  - absolute_offset
Nodes stay in blob order. Each node record is a byte-faithful evidence record with two structured helper subobjects:
- index
- tag
- kind
- record_size
- u16
- raw_hex
- layout:
  - edge_indices
  - payload_indices
  - u16_role
  - layout_provenance
- annotations:
  - filter_vocab_ref
  - filter_kind_ref
  - filter_arg_raw
  - filter_out_of_vocab
  - literal_refs
  - literal_refs_provenance
  - literal_ref_matches
The nested annotations subobject is intentionally outside the evidence hash
boundary.
The literal region is represented as deterministic records, not as a bag of printable strings. The top-level fields are:
- raw_sha256
- content_digest
- layout_digest
- records
Each record has:
- record_index
- literal_id
- rel_offset
- len_u16
- payload_sha256
- optional payload_preview_ascii
- class
The currently emitted class values are:
- len_prefixed
- trailer_bytes
- align_pad
- remainder
payload_preview_ascii is a convenience field and is excluded from the
evidence hash boundary.
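To make "deterministic records, not a bag of strings" concrete, here is a small sketch that splits a byte region into offset-addressed records. It assumes little-endian u16 length prefixes and only emits the len_prefixed and remainder classes; both the layout and the function name are illustrative assumptions, not the decoder's actual rules:

```python
import hashlib
import struct
from typing import List


def split_literal_region_sketch(region: bytes) -> List[dict]:
    """Split a region into length-prefixed records plus an explicit remainder."""
    records: List[dict] = []
    pos = 0
    while pos + 2 <= len(region):
        (length,) = struct.unpack_from("<H", region, pos)  # assumed little-endian u16
        end = pos + 2 + length
        if length == 0 or end > len(region):
            break  # stop rather than guess; the tail is kept as a remainder record
        payload = region[pos + 2:end]
        records.append({
            "record_index": len(records),
            "rel_offset": pos,
            "len_u16": length,
            "payload_sha256": hashlib.sha256(payload).hexdigest(),
            "class": "len_prefixed",
        })
        pos = end
    if pos < len(region):
        tail = region[pos:]
        records.append({
            "record_index": len(records),
            "rel_offset": pos,
            "len_u16": None,  # no length prefix was parsed for this span
            "payload_sha256": hashlib.sha256(tail).hexdigest(),
            "class": "remainder",
        })
    return records
```

Bytes that do not parse cleanly are still accounted for by offset and hash, so two regions can be compared record by record even when part of the content is not understood.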
A further evidence block carries message-filter and opaque-corridor information when present:
- message_filter
- masked_blob_sha256
- opaque_region
For message-filter profiles this lets comparison distinguish whole-blob drift from drift outside the opaque corridor.
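One way to picture the distinction is a hash taken with the opaque corridor masked out. The zero-fill masking and the (start, end) region shape below are assumptions for illustration; the source does not specify how masked_blob_sha256 is computed:

```python
import hashlib


def masked_blob_sha256_sketch(blob: bytes, start: int, end: int) -> str:
    """Hash the blob with bytes in [start, end) replaced by zeros."""
    if not 0 <= start <= end <= len(blob):
        raise ValueError("opaque region must lie inside the blob")
    masked = blob[:start] + b"\x00" * (end - start) + blob[end:]
    return hashlib.sha256(masked).hexdigest()


def classify_drift(a: bytes, b: bytes, start: int, end: int) -> str:
    """Label drift between two same-layout blobs sharing one opaque region."""
    if hashlib.sha256(a).digest() == hashlib.sha256(b).digest():
        return "identical"
    if masked_blob_sha256_sketch(a, start, end) == masked_blob_sha256_sketch(b, start, end):
        return "opaque_only"
    return "outside_opaque"
```

With both the whole-blob hash and the masked hash available, a comparison can report "only the corridor changed" separately from "interpreted structure changed".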
Evidence hashing is gated solely by sanitize_evidence_for_hash() in
integration/ir/decode_evidence.py. Treat edits to that filter as breaking
changes for evidence-level comparisons.
The current sanitizer strips:
- payload_preview_ascii
- notes
- preamble_words_full
- nested annotations
The contract is:
- structural evidence lives under evidence
- convenience views and heuristics live under annotations or nested annotations subkeys
- improving annotations must not silently move the structural hash boundary
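The contract can be illustrated with a small sketch of a key-stripping sanitizer followed by a canonical hash. This is not the real sanitize_evidence_for_hash(); the recursive walk, the canonical-JSON encoding, and the function names are assumptions, and only the stripped key names come from the list above:

```python
import hashlib
import json

# Keys named in this document as stripped before hashing.
STRIPPED_KEYS = {"payload_preview_ascii", "notes", "preamble_words_full", "annotations"}


def sanitize_for_hash_sketch(value):
    """Return a copy of `value` with non-structural keys removed at every depth."""
    if isinstance(value, dict):
        return {
            key: sanitize_for_hash_sketch(child)
            for key, child in value.items()
            if key not in STRIPPED_KEYS
        }
    if isinstance(value, list):
        return [sanitize_for_hash_sketch(child) for child in value]
    return value


def evidence_hash_sketch(evidence: dict) -> str:
    """Hash the sanitized evidence using a canonical JSON encoding."""
    canonical = json.dumps(
        sanitize_for_hash_sketch(evidence), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this sketch, improving a preview string or a nested annotation leaves the hash unchanged, while editing raw_hex on a node record moves it.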
The policy_receipt field on SbplIR is not the project-level typed Policy
DAG. It is the evidence-derived receipt emitted by
_build_policy_receipt_from_evidence() in
integration/ir/decode_evidence.py.
That receipt is not evidence-only in the strict sense: it also loads the op-slot and filter-binding mapping artifacts that let the decode-evidence lane name operations and attach literal-bound rule payloads. But it exists to record what the decode-evidence lane could name and bind without pretending to be the semantic policy center. It is useful for:
- mapping progress and partial-recovery status under annotations.policy_status
- localization and debugging through evidence_refs, node_span, derivation tags, witness ids, and confidence labels
- a minimal receipt-lane fallback render when no typed Policy DAG has been built alongside the receipt
Its shape is:
- default_decision: currently "deny"
- operations: ordered list of reconstructed operations
Each operation record has:
- slot
- name
- confidence
- derivation
- rules
Each derivation object currently carries:
- tag
- mapping_artifact_sha256
- witness_case_ids
Each rule currently carries:
- decision
- predicate
- confidence
- derivation
- evidence_refs
- node_span
Current emitted predicate objects are atom-shaped:
- op: filter or synthetic predicate name
- args: list of evidence-oriented argument payloads
This is narrower than the typed predicate/value model in
pawl.contract.policy_dag and pawl.contract.values.
SbplIR.policy_receipt does not currently emit the typed ValueString /
ValueRegex / ValueParamRef union used by the project-level Policy DAG.
Instead it emits evidence-oriented payloads keyed around literal identity.
The current shapes are:
- {"lit_id": <int>} for literal-bound rule arguments
- catalog entries shaped like {"lit_id", "evidence_refs", "confidence", "derivation"} inside the synthetic literal-catalog operation
Rules can point back to the evidence they were derived from. The currently
emitted evidence_refs shapes include entries such as:
- literal refs with kind="literal"
- table refs with kind="table"
- node refs with kind="node"
node_span is emitted separately for quick localization and currently carries:
- nodes
- edges
- op_slot
When literal-table evidence exists, the receipt may emit a synthetic operation with:
- slot = -1
- name = "literal-catalog"
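As an illustration of consuming this shape, the sketch below walks a hand-written receipt. The field names follow the lists above; the concrete values (slot numbers, literal ids, empty edges) are invented for the example, confidence and derivation are omitted for brevity, and both helper functions are hypothetical:

```python
from typing import List

SYNTHETIC_SLOT = -1  # slot used by the synthetic "literal-catalog" operation

# Hand-written example receipt; values are illustrative, not decoder output.
receipt = {
    "default_decision": "deny",
    "operations": [
        {
            "slot": 0,
            "name": "file-read-data",
            "rules": [
                {
                    "decision": "allow",
                    "predicate": {"op": "subpath", "args": [{"lit_id": 0}]},
                    "evidence_refs": [{"kind": "node"}, {"kind": "literal"}],
                    "node_span": {"nodes": [5], "edges": [], "op_slot": 0},
                }
            ],
        },
        {"slot": SYNTHETIC_SLOT, "name": "literal-catalog", "rules": []},
    ],
}


def mapped_operation_names(doc: dict) -> List[str]:
    """Names of reconstructed operations, skipping the synthetic catalog entry."""
    return [op["name"] for op in doc["operations"] if op["slot"] != SYNTHETIC_SLOT]


def literal_ids_used(doc: dict) -> List[int]:
    """Sorted literal ids referenced by rule predicates."""
    ids = set()
    for op in doc["operations"]:
        for rule in op.get("rules", []):
            for arg in rule["predicate"]["args"]:
                if "lit_id" in arg:
                    ids.add(arg["lit_id"])
    return sorted(ids)
```

Note that rule arguments stay keyed by literal identity here; resolving lit_id to text is a separate lookup against the literal evidence, which is the gap between this receipt and the typed value model.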
annotations is required today, but it is not part of the evidence hash
boundary. The current emitted keys are:
- preamble_words_full
- decoded_literal_strings
- literal_strings
- literal_printable_runs
- policy_status
- literal_catalog_sources
The current policy_status values emitted by build_ir_from_decode() are:
- none
- literal_catalog
- ops_mapped
- filters_bound
- rules_emitted
Consumer-side concepts such as probe_ready belong to integration/ir/probes.py
and should not be documented here as if they were emitted SbplIR values.
The policy layer is renderable, but not all renderability means the same thing. PAWL has three rendering surfaces at different levels:
Semantic renderer. render_policy(...) works from the typed Policy DAG
alone and produces compilable SBPL. That is the common case for policy-layer
consumers. The four_point_loop() in decode_evidence.py prefers this
renderer when a typed DAG has been built from blob + SBPL hints.
Roundtrip renderer. render_profile_sbpl(...) in the reverse lane has a
stricter job: produce SBPL that can roundtrip back to a structurally equivalent
blob. That path depends on metadata the typed DAG does not carry by itself,
including fidelity controls such as baseline suppression, message-filter
reconstruction, pool-order recovery, and other reverse-lane mechanics. This
boundary is important because it explains both the strength and the limit of
the policy layer: the DAG is sufficient for semantic rendering, but not a
complete container for every blob-faithful reverse heuristic.
Receipt fallback renderer. render_sbpl_from_ir() in decode_evidence.py
is a split surface: it prefers the typed Policy DAG renderer when a DAG is
available, and otherwise falls back to a minimal receipt-based render. The
four_point_loop() caller keeps the receipt fallback as an escape hatch when a
typed-DAG render is not accepted by the compile lane. Today the receipt
fallback emits (version 1), deterministic literal definitions when
annotations.literal_strings supports them, and the default line from
policy_receipt.default_decision. It does not yet render reconstructed
per-operation rules back into full SBPL.
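The preference order described above can be sketched as a small selection function. The names render_with_fallback, render_typed, and compiles are hypothetical, the two behaviors (prefer the typed render, fall back when it is absent or rejected) are merged into one function for brevity, and the fallback here omits literal definitions because their emitted form is not specified in this document:

```python
from typing import Callable, Optional, Tuple


def render_receipt_fallback_sketch(receipt: dict) -> str:
    """Minimal receipt render: version line plus the default decision line."""
    decision = receipt.get("default_decision", "deny")
    return "(version 1)\n" + f"({decision} default)\n"


def render_with_fallback(
    policy: Optional[object],
    receipt: dict,
    render_typed: Callable[[object], str],
    compiles: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (sbpl_text, lane), preferring the typed render when it is usable."""
    if policy is not None:
        text = render_typed(policy)
        if compiles(text):
            return text, "typed"
    # No typed DAG, or the compile lane rejected its render: use the receipt.
    return render_receipt_fallback_sketch(receipt), "receipt"
```

Returning the lane alongside the text keeps it visible downstream which renderer produced a given artifact, which matters because the fallback output does not carry per-operation rules.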
PAWL does not treat the IR as self-authenticating. The five-point harness validates it through roundtrip comparison, forcing source and roundtrip artifacts against one another across surface, structural, semantic, and, where available, behavioral checks. That validation context is what makes the IR durable under format churn: when something drifts, the project can localize whether the problem sits in structural extraction, policy interpretation, rendering, normalization, or runtime behavior. Runtime evidence is confirmatory rather than universal, but the harness keeps the IR from becoming a static schema with no external pressure.
Consider a profile that allows reads under /tmp:

    (allow file-read-data
      (subpath "/tmp"))

In the primary modern case, the evidence layer records this as:

    op_table["file-read-data"] -> node index 5
    node[5] -> test record, edge →6 / →3, payload ref 0
    literal record[0] -> "/tmp"

The corresponding Policy IR sketch uses the real contract vocabulary:

    policy = Policy(
        schema_version="...",
        sbpl_version=1,
        default_decision="deny",
        operations={
            "file-read-data": OperationDAG(
                op="file-read-data",
                entry="n5",
                nodes={
                    "n5": NodeTest(
                        atom=FilterAtom(
                            name="subpath",
                            args=[ValueString("/tmp")],
                        ),
                        match="n6",
                        unmatch="n3",
                        provenance=NodeProvenance(origin_offsets=[5]),
                    ),
                    "n6": NodeAllow(),
                    "n3": NodeDeny(),
                },
            ),
        },
    )

A blob-backed IREnvelope would then bind this policy to coordinates and
hashes, for example mapping file-read-data:n5 back to node index 5 and its
byte offset while carrying the full profile_hash, section_hashes, and the
file-read-data subgraph hash. In source-evaluate mode, the same policy can be
assembled without blob-backed hashes, with
absence_reason="source_evaluate_no_blob" instead of invented byte
coordinates.
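To show how such a DAG reads as a decision procedure, here is a self-contained walk over the same three-node shape. The classes are hypothetical stand-ins for NodeTest and the terminal nodes, the subpath matcher is a simplified approximation, and nothing here is the contract's evaluator:

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class SketchTest:
    filter_name: str
    arg: str
    match: str
    unmatch: str


@dataclass(frozen=True)
class SketchTerminal:
    decision: str  # "allow" or "deny"


SketchNode = Union[SketchTest, SketchTerminal]


def subpath_matches(path: str, root: str) -> bool:
    """Simplified subpath check: the root itself or anything beneath it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


def decide(nodes: Dict[str, SketchNode], entry: str, path: str, max_steps: int = 1000) -> str:
    """Follow match/unmatch edges from `entry` until a terminal node is reached."""
    current = entry
    for _ in range(max_steps):
        node = nodes[current]
        if isinstance(node, SketchTerminal):
            return node.decision
        if node.filter_name != "subpath":
            return "unknown"  # keep unresolved filters explicit instead of guessing
        current = node.match if subpath_matches(path, node.arg) else node.unmatch
    raise RuntimeError("step limit exceeded; graph may contain a cycle")


nodes = {
    "n5": SketchTest("subpath", "/tmp", match="n6", unmatch="n3"),
    "n6": SketchTerminal("allow"),
    "n3": SketchTerminal("deny"),
}
```

For example, decide(nodes, "n5", "/tmp/x") follows the match edge to n6, while "/tmpfoo" fails the subpath test and lands on n3.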
The living single-profile contract spans:
- pawl/contract/policy_dag.py: typed Policy DAG
- pawl/contract/values.py: Value union types
- pawl/contract/envelope.py: IREnvelope, assemble_ir_envelope
- pawl/contract/coordinates.py: CoordinateIndex, NODE_BYTE_SIZE
- pawl/contract/compare.py: lane-neutral typed DAG comparison
- pawl/contract/render.py: lane-neutral typed DAG renderer
- integration/ir/decode_evidence.py: SbplIR producer, four-point loop
- integration/ir/profile/ir_builder.py: pawl.profile.ir.v1 producer
- integration/FIVE_POINT_HARNESS.md: consolidated five-point harness spec