Specification version: 2.0.0
Schema identifier: aiir/commit_receipt.v2
Status: Stable
Date: 2026-03-12
Author: Invariant Systems, Inc.
License: Apache-2.0
This document is the normative specification for the AIIR commit receipt
format (aiir/commit_receipt.v2). It defines the receipt structure, field
semantics, canonical JSON encoding, content-addressing algorithm, verification
procedure, and extension mechanism.
An independent implementation that follows this specification MUST produce
identical content_hash and receipt_id values for the same input data, and
MUST accept or reject receipts identically to the reference implementation.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119.
The reference implementation is the aiir Python package:
- Repository: https://github.com/invariant-systems-ai/aiir
- PyPI: https://pypi.org/project/aiir/
- License: Apache-2.0
A JSON Schema (draft 2020-12) is provided at:
- schemas/commit_receipt.v2.schema.json (current)
- schemas/commit_receipt.v1.schema.json (still accepted)
A CDDL grammar (RFC 8610) covering both the JSON and CBOR wire formats is provided at:
The CDDL grammar is normative for the CBOR canonical envelope and informative for the JSON encoding (which is fully specified by §6).
A receipt is a JSON object with the following top-level fields:
| Field | Type | In CORE | Description |
|---|---|---|---|
| type | string (const) | ✅ | "aiir.commit_receipt" |
| schema | string (const) | ✅ | "aiir/commit_receipt.v2" |
| version | string | ✅ | SemVer version of the generating tool |
| commit | object | ✅ | Git commit metadata (§3) |
| ai_attestation | object | ✅ | AI authorship detection results (§4) |
| provenance | object | ✅ | Tool and repository metadata (§5) |
| receipt_id | string | ❌ | Content-addressed identifier (§7) |
| content_hash | string | ❌ | SHA-256 of canonical core (§7) |
| timestamp | string | ❌ | RFC 3339 UTC generation time |
| extensions | object | ❌ | Extension point (§8) |
The core of a receipt is the subset of fields that contribute to the content hash. The core keys are exactly:
CORE_KEYS = {"type", "schema", "version", "commit", "ai_attestation", "provenance"}
These six keys — and ONLY these six — form the hash input. All other fields
(receipt_id, content_hash, timestamp, extensions) are derived or
non-content and MUST NOT affect the hash.
Forward-compatibility: New top-level fields added in future schema versions MUST NOT be included in CORE_KEYS unless the schema identifier changes (e.g.,
aiir/commit_receipt.v3). The CORE_KEYS set is an explicit allowlist, not a denylist.
The commit field is a JSON object with the following structure:
| Field | Type | Required | Description |
|---|---|---|---|
| sha | string | ✅ | Full hex SHA of the git commit (40 or 64 chars) |
| tree_sha | string | ✅ | Full hex SHA of the commit's tree object (40 or 64 chars). Binds the receipt to the directory state. |
| parent_shas | string[] | ✅ | Ordered parent commit SHAs. Empty for root commits. Binds the receipt to the DAG position. |
| author | GitIdentity | ✅ | Author identity |
| committer | GitIdentity | ✅ | Committer identity |
| subject | string | ✅ | First line of commit message |
| message_hash | string | ✅ | "sha256:" + SHA-256(message_body) |
| diff_hash | string | ✅ | "sha256:" + SHA-256(full_diff) |
| files_changed | integer | ✅ | Number of files changed |
| files | string[] | Conditional | File paths (max 100). Present when not redacted. |
| files_redacted | true | Conditional | Present when --redact-files active. Mutually exclusive with files. |
| files_capped | true | Optional | Present when file list was truncated at 100. |
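The interplay of files, files_redacted, and files_capped can be sketched as follows. This is a minimal illustration rather than the reference implementation: the helper name build_file_fields is hypothetical, and it assumes that files_changed reports the full count even when the list is capped or redacted, and that files_capped is only emitted alongside files.

```python
MAX_FILES = 100  # cap stated in the table above


def build_file_fields(paths, redact=False):
    """Return the file-related commit fields for a list of changed paths.

    Hypothetical helper. Assumes files_changed is the untruncated count.
    """
    fields = {"files_changed": len(paths)}
    if redact:
        # files_redacted and files are mutually exclusive.
        fields["files_redacted"] = True
        return fields
    fields["files"] = list(paths[:MAX_FILES])
    if len(paths) > MAX_FILES:
        # Signal that the list was truncated at the cap.
        fields["files_capped"] = True
    return fields
```

Because all of these fields live under commit, they are part of the core and therefore influence content_hash.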
{
"name": "string",
"email": "string",
"date": "string (RFC 3339 or git default format)"
}

All three fields are REQUIRED. No additional properties are permitted.
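A validator for the GitIdentity shape might look like the sketch below. The function name validate_git_identity and the wording of the problem strings are illustrative assumptions; only the three field names and the "no additional properties" rule come from this specification.

```python
GIT_IDENTITY_KEYS = {"name", "email", "date"}


def validate_git_identity(identity):
    """Return a list of problems; an empty list means the object is acceptable."""
    if not isinstance(identity, dict):
        return ["GitIdentity must be a JSON object"]
    present = set(identity)
    problems = []
    for key in sorted(GIT_IDENTITY_KEYS - present):
        problems.append(f"missing required field: {key}")
    for key in sorted(present - GIT_IDENTITY_KEYS):
        problems.append(f"additional property not permitted: {key}")
    for key in sorted(GIT_IDENTITY_KEYS & present):
        if not isinstance(identity[key], str):
            problems.append(f"field must be a string: {key}")
    return problems
```

The sketch does not parse the date value, since the text above allows either RFC 3339 or git's default format.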
- message_hash: "sha256:" + hex(SHA-256(body.encode("utf-8"))), where body is the full commit message body (everything after the subject line, including trailers).
- diff_hash: "sha256:" + hex(SHA-256(diff_bytes)), where diff_bytes is the raw output of git diff <parent> <sha> with flags:
  - --no-ext-diff (prevent custom diff drivers)
  - --no-textconv (prevent text conversion filters)
  - --no-optional-locks (prevent lock contention)

For root commits (no parent), the diff is computed against the empty tree.
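The two hash fields can be sketched with the standard library as follows. The function names are hypothetical, and the exact argument order used by the reference implementation is an assumption; note that --no-optional-locks is a top-level git option, so the sketch places it before the diff subcommand.

```python
import hashlib


def message_hash(body: str) -> str:
    """Hash the commit message body (everything after the subject line)."""
    return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()


def diff_hash(diff_bytes: bytes) -> str:
    """Hash the raw diff bytes exactly as emitted by git."""
    return "sha256:" + hashlib.sha256(diff_bytes).hexdigest()


def diff_command(parent: str, sha: str) -> list:
    """Build an argv list for the diff (assumed layout, not confirmed by the spec)."""
    return [
        "git", "--no-optional-locks",
        "diff", "--no-ext-diff", "--no-textconv",
        parent, sha,
    ]
```

For a root commit the caller would pass the object ID of the empty tree as parent. The diff output should be captured as bytes (for example via subprocess.run(..., capture_output=True).stdout) so that no text decoding alters it before hashing.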
| Field | Type | Required | Description |
|---|---|---|---|
| is_ai_authored | boolean | ✅ | True if any AI signal detected |
| signals_detected | string[] | ✅ | List of detected AI signals |
| signal_count | integer | ✅ | len(signals_detected) |
| is_bot_authored | boolean | Optional | True if bot pattern matched |
| bot_signals_detected | string[] | Optional | Bot signal list |
| bot_signal_count | integer | Optional | len(bot_signals_detected) |
| authorship_class | string | Optional | One of: "human", "ai_assisted", "bot", "ai+bot" |
| detection_method | string | ✅ | Algorithm identifier (e.g., "heuristic_v2") |
The detection_method field identifies the algorithm used. The reference
implementation uses "heuristic_v2", which performs:
- NFKC normalization of all text fields
- Confusable character resolution (Cyrillic/Greek → Latin)
- Unicode category Cf (format) and Mn/Me (mark) stripping
- Case-insensitive pattern matching against known AI tool signatures
- Bot author/committer pattern matching
Third-party implementations MAY use different detection methods and SHOULD
identify them with a distinct detection_method value.
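The text-normalization steps listed above could be approximated as in the sketch below. It is not the reference heuristic: the CONFUSABLES table is a tiny illustrative subset chosen for this example, the function name is hypothetical, and casefold() stands in for case-insensitive matching.

```python
import unicodedata

# Illustrative subset only; a real table would cover far more characters.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u03bf": "o",  # Greek omicron
}

STRIPPED_CATEGORIES = {"Cf", "Mn", "Me"}


def normalize_for_matching(text: str) -> str:
    """Normalize text before comparing it against known signatures."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(CONFUSABLES.get(ch, ch) for ch in text)
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch) not in STRIPPED_CATEGORIES
    )
    return text.casefold()
```

With this normalization, a trailer containing a Cyrillic "о" or a zero-width space inside a tool name compares equal to its plain ASCII spelling, which is the evasion the steps above are meant to defeat.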
Non-normative. Heuristic detection relies on declared markers (Co-authored-by trailers, bot author patterns, commit-message keywords). Agent-mode tools (Copilot Chat, Claude Code, ChatGPT Canvas, Cursor Agent) do not currently emit these markers, so commits produced in agent sessions will classify as human. This is a known limitation of any metadata-heuristic approach. See README § Detection scope and THREAT_MODEL.md § 9.2 for discussion.
| Class | Criteria |
|---|---|
| human | No AI or bot signals detected |
| ai_assisted | AI signals detected, human author |
| bot | Bot signals detected, no AI signals |
| ai+bot | AI signals detected AND bot author |
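The classification table can be read as a small decision function. The sketch below assumes that "bot author" is equivalent to a non-empty bot signal list; both function names are hypothetical.

```python
def authorship_class(ai_signals, bot_signals):
    """Map detected signal lists to one of the four authorship classes."""
    has_ai, has_bot = bool(ai_signals), bool(bot_signals)
    if has_ai and has_bot:
        return "ai+bot"
    if has_ai:
        return "ai_assisted"
    if has_bot:
        return "bot"
    return "human"


def build_attestation(ai_signals, bot_signals, detection_method):
    """Assemble an ai_attestation object with counts derived from the lists."""
    return {
        "is_ai_authored": bool(ai_signals),
        "signals_detected": list(ai_signals),
        "signal_count": len(ai_signals),
        "is_bot_authored": bool(bot_signals),
        "bot_signals_detected": list(bot_signals),
        "bot_signal_count": len(bot_signals),
        "authorship_class": authorship_class(ai_signals, bot_signals),
        "detection_method": detection_method,
    }
```

Deriving the counts and the class from the signal lists keeps the redundant fields consistent with each other.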
Note:
ai_generatedappeared in schema versions ≤ 1.0.13 but was never emitted by the reference implementation. It is accepted by the validator for backward compatibility but SHOULD NOT be produced. Useai+botfor commits with both AI and bot signals.
| Field | Type | Required | Description |
|---|---|---|---|
| repository | string \| null | ✅ | Git remote URL (credentials stripped). Null if no remote. |
| tool | string | ✅ | URI identifier: "https://github.com/invariant-systems-ai/aiir@{version}" |
| generator | string | ✅ | Generator ID: "aiir.cli", "aiir.action", "aiir.mcp", or custom |
The repository field MUST have all credentials removed:
- Userinfo component of the URL (e.g., https://token@github.com/... → https://github.com/...)
- Query parameters (may contain tokens)
- URL fragments
Implementations MUST strip credentials before including the URL in the receipt.
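One way to apply the three stripping rules to URL-form remotes is sketched below with urllib.parse. The function name strip_credentials is hypothetical, and the sketch makes no claim about how the reference implementation handles scp-style remotes such as git@host:org/repo.git, which urlsplit does not parse as URLs and which pass through this function unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Drop userinfo, query, and fragment from a remote URL (None stays None)."""
    if url is None:
        return None
    parts = urlsplit(url)
    # Everything before the last "@" in the authority is userinfo.
    netloc = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))
```

A caller would run this on the remote URL before placing it in provenance.repository.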
The canonical JSON encoding is the sole deterministic serialization used for content addressing. All implementations MUST produce byte-identical output for the same input object.
canonical_json(obj) → UTF-8 string
- Serialize obj to JSON with:
  - Sorted keys: All object keys are sorted lexicographically (by Unicode code point)
  - No whitespace: No spaces after : or , separators
  - ASCII-safe: All non-ASCII characters are escaped as \uXXXX
  - No NaN/Infinity: NaN and Infinity are rejected (not valid JSON per RFC 8259)
- The separators MUST be exactly (",", ":")
- Key sorting MUST be recursive (all nested objects are also sorted)
In Python's json module:
json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True, allow_nan=False)

Implementations MUST reject JSON structures exceeding 64 levels of nesting to prevent stack overflow attacks. The depth check SHOULD be iterative (stack-based), not recursive.
Input:
{"b": 1, "a": {"d": 2, "c": 3}}
Canonical output:
{"a":{"c":3,"d":2},"b":1}
The canonical JSON encoding for aiir/commit_receipt.v2 is fully defined
by the following parameters:
| Parameter | Value | Effect |
|---|---|---|
| Key ordering | lexicographic by Unicode code point | Matches RFC 8785 (JCS) §3.2.3 |
| Separators | (",", ":") | No structural whitespace |
| ASCII safety | all non-ASCII escaped as \uXXXX | Eliminates Unicode normalization divergence across runtimes |
| NaN / Infinity | rejected | Per RFC 8259 §6 |
| Integer format | decimal, no leading zeros, no exponent | Matches JCS §3.2.2.1 |
| Depth limit | 64 levels | Stack-overflow prevention |
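The parameters in the table above map directly onto Python's json.dumps, combined with an iterative depth check. In the sketch below the helper name check_depth is hypothetical, and the way levels are counted (the outermost container is level 1, scalars are not counted) is an assumption, since this document does not define it.

```python
import json

MAX_DEPTH = 64


def check_depth(obj, max_depth=MAX_DEPTH):
    """Reject structures nested deeper than max_depth, without recursion."""
    stack = [(obj, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, (list, tuple)):
            children = node
        else:
            continue  # scalars do not add a nesting level
        if depth > max_depth:
            raise ValueError(f"JSON nesting exceeds {max_depth} levels")
        stack.extend((child, depth + 1) for child in children)


def canonical_json(obj):
    """Serialize obj with sorted keys, no whitespace, and ASCII-only output."""
    check_depth(obj)
    return json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
        allow_nan=False,  # NaN and Infinity raise ValueError
    )
```

Running the depth check before serialization means an over-deep input is rejected before the encoder ever descends into it.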
Conformance note for SDK authors:
The current receipt schema uses ONLY the following JSON value types:
strings, integers, booleans, null, arrays of strings, and nested objects.
No floating-point values appear in CORE_KEYS. For these types, the
algorithm above produces output that is byte-identical to RFC 8785
(JCS). This equivalence is verified by the regression test suite
(tests/test_canonicalization.py).
The one area where this algorithm diverges from JCS is IEEE-754
floating-point serialization (Python json.dumps uses repr()-style
formatting; JCS mandates ECMAScript Number::toString rules). This
divergence is irrelevant today because the schema contains no floats.
Migration trigger: If a future schema version introduces a floating-point field in CORE_KEYS (e.g., a confidence score), this algorithm MUST be replaced with a full RFC 8785 implementation to preserve cross-language determinism. The ensure_ascii=True flag SHOULD be retained during the transition to avoid breaking existing verifiers, with a documented migration window for the switch to literal UTF-8 output.
Implementations that handle ONLY the types listed above (strings, integers, booleans, null, arrays, objects) MAY use this simplified algorithm. Implementations that aim for full JCS compliance SHOULD use a validated RFC 8785 library and MUST produce byte-identical output for the current receipt schema.
content_hash = "sha256:" + hex(SHA-256(canonical_json(core)))
Where core is the receipt object filtered to only CORE_KEYS:
core = {k: v for k, v in receipt.items() if k in CORE_KEYS}

The SHA-256 is computed over the UTF-8 encoding of the canonical JSON string.
receipt_id = "g1-" + hex(SHA-256(canonical_json(core)))[:32]
The g1- prefix denotes generation 1 of the ID scheme. Future ID
schemes (e.g., using different hash functions or encoding) would use
g2-, g3-, etc.
The receipt ID is a truncated hash (128 bits / 16 bytes). It is
sufficient for uniqueness within practical receipt sets but MUST NOT be
used alone for tamper detection — use content_hash for integrity
verification.
Given identical CORE_KEYS values, any conforming implementation MUST produce
the same content_hash and receipt_id. This is the fundamental
interoperability guarantee.
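Put together, the two identifiers can be derived as in the sketch below. The function name content_address is hypothetical and the depth check is omitted for brevity; the formulas themselves are the ones given above.

```python
import hashlib
import json

CORE_KEYS = {"type", "schema", "version", "commit", "ai_attestation", "provenance"}


def canonical_json(obj):
    """Canonical encoding as specified in this document (depth check omitted)."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"),
        ensure_ascii=True, allow_nan=False,
    )


def content_address(receipt):
    """Return (content_hash, receipt_id) computed from the core fields only."""
    core = {k: v for k, v in receipt.items() if k in CORE_KEYS}
    digest = hashlib.sha256(canonical_json(core).encode("utf-8")).hexdigest()
    return "sha256:" + digest, "g1-" + digest[:32]
```

Because the filter is an allowlist, adding or changing timestamp, extensions, or any unknown top-level field leaves both values unchanged.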
The extensions field is a JSON object that MAY contain arbitrary keys.
It is NOT part of CORE_KEYS and does NOT affect content_hash or
receipt_id.
⚠️ Normative: Extension data is annotation-only metadata. Extensions (including agent_attestation, instance_id, etc.) are NOT covered by the content-addressing scheme. A receipt's content_hash and receipt_id remain unchanged if extension fields are added, modified, or removed.

If extension data is policy-significant (e.g., agent identity, model name, or context window), the receipt MUST be wrapped in a signed envelope (Sigstore per §11, or in-toto Statement per §14.2) to bind extension content cryptographically. Implementations MUST NOT treat unsigned extension fields as audit-grade evidence.
Known extension keys:
| Key | Type | Description |
|---|---|---|
| instance_id | string | User-supplied instance identifier |
| namespace | string | Organizational grouping label |
| agent_attestation | object | Agent identity, model, and context metadata (annotation-only; see warning above) |
Implementations SHOULD preserve unknown extension keys when round-tripping receipts.
To verify a receipt, an implementation MUST:
1. Schema validation: Verify the receipt is a JSON object containing all required fields with correct types (see JSON Schema).
2. Type check: Verify type == "aiir.commit_receipt".
3. Schema version check: Verify schema starts with "aiir/".
4. Version format check: Verify version matches ^[0-9]+\.[0-9]+\.[0-9]+([.+\-][0-9a-zA-Z.+\-]*)?$ (SemVer).
5. Core extraction: Extract core = {k: v for k, v in receipt.items() if k in CORE_KEYS}.
6. Canonical encoding: Compute core_json = canonical_json(core).
7. Hash computation: Compute:
   expected_hash = "sha256:" + hex(SHA-256(core_json.encode("utf-8")))
   expected_id = "g1-" + hex(SHA-256(core_json.encode("utf-8")))[:32]
8. Constant-time comparison: Compare expected_hash against receipt["content_hash"] and expected_id against receipt["receipt_id"] using constant-time comparison (e.g., hmac.compare_digest).
9. Result: The receipt is valid if and only if BOTH comparisons succeed.
On failure, implementations SHOULD report which check failed:
- "content hash mismatch": core fields were modified after generation
- "receipt_id mismatch": receipt ID was modified
- "unknown receipt type": not an AIIR receipt
- "unknown schema": unrecognized schema version
- "invalid version format": version string contains prohibited characters
- No expected hash on failure: On verification failure, implementations MUST NOT expose the expected hash values. Doing so would create a forgery oracle: an attacker could extract the correct hash and splice it in.
- Constant-time comparison: String comparison MUST use constant-time algorithms to prevent timing side-channel attacks.
- Depth limit: The canonical JSON encoder MUST enforce a depth limit (64 levels) to prevent stack overflow from crafted receipts.
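The verification steps and security requirements above can be sketched as one function. This is an illustration under stated assumptions, not the reference verifier: schema validation is reduced to a required-key check, the depth-counting convention is assumed, and the names verify_receipt and check_depth are hypothetical.

```python
import hashlib
import hmac
import json
import re

CORE_KEYS = {"type", "schema", "version", "commit", "ai_attestation", "provenance"}
REQUIRED_KEYS = CORE_KEYS | {"receipt_id", "content_hash"}
VERSION_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+([.+\-][0-9a-zA-Z.+\-]*)?$")


def check_depth(obj, max_depth=64):
    """Iterative nesting check (outermost container counted as level 1)."""
    stack = [(obj, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue
        if depth > max_depth:
            raise ValueError("nesting too deep")
        stack.extend((child, depth + 1) for child in children)


def verify_receipt(receipt):
    """Return (valid, errors). Error strings never contain the expected hashes."""
    if not isinstance(receipt, dict):
        return False, ["receipt is not a JSON object"]
    if REQUIRED_KEYS - set(receipt):
        return False, ["missing required fields"]
    if receipt["type"] != "aiir.commit_receipt":
        return False, ["unknown receipt type"]
    schema, version = receipt["schema"], receipt["version"]
    if not (isinstance(schema, str) and schema.startswith("aiir/")):
        return False, ["unknown schema"]
    if not (isinstance(version, str) and VERSION_RE.match(version)):
        return False, ["invalid version format"]

    core = {k: v for k, v in receipt.items() if k in CORE_KEYS}
    try:
        check_depth(core)
        core_json = json.dumps(core, sort_keys=True, separators=(",", ":"),
                               ensure_ascii=True, allow_nan=False)
    except ValueError:
        return False, ["core is not canonically encodable"]
    digest = hashlib.sha256(core_json.encode("utf-8")).hexdigest()

    errors = []
    # compare_digest on bytes runs in time independent of where values differ.
    if not hmac.compare_digest(("sha256:" + digest).encode("utf-8"),
                               str(receipt["content_hash"]).encode("utf-8")):
        errors.append("content hash mismatch")
    if not hmac.compare_digest(("g1-" + digest[:32]).encode("utf-8"),
                               str(receipt["receipt_id"]).encode("utf-8")):
        errors.append("receipt_id mismatch")
    return not errors, errors
```

The function reports which check failed but never returns the digest it computed, which keeps it from acting as the forgery oracle described above.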
When verifying a receipt from a file, implementations MUST:
- Reject symlinks (prevent filesystem probing)
- Reject files larger than 50 MB (prevent memory exhaustion)
- Parse as UTF-8 JSON
- Handle both single receipt (JSON object) and batch (JSON array)
- Cap array length at 1000 entries (prevent quadratic DoS)
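A file loader that applies these five rules might look like the following sketch. The name load_receipts is hypothetical, and interpreting 50 MB as 50 × 1024 × 1024 bytes is an assumption, since this document does not say whether the limit is decimal or binary.

```python
import json
import os

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumed binary interpretation of "50 MB"
MAX_BATCH = 1000


def load_receipts(path):
    """Load one receipt or a batch from a file, returning a list of receipts."""
    if os.path.islink(path):
        raise ValueError("refusing to read a symlink")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("file exceeds the size limit")
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        return [data]  # single receipt
    if isinstance(data, list):
        if len(data) > MAX_BATCH:
            raise ValueError("batch exceeds the entry limit")
        return data
    raise ValueError("expected a JSON object or array")
```

The sketch checks the path and then opens it, which leaves a small window between the two operations. A hardened loader would open the file once and inspect the resulting descriptor instead.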
Receipts MAY be cryptographically signed using Sigstore keyless signing. Signing adds:
- Non-repudiation (proves who generated the receipt)
- Tamper-proof integrity (beyond self-attested content addressing)
- Transparency log entry (public audit trail via Rekor)
Without signing, receipts are tamper-evident (modification is detectable) but not tamper-proof (a new valid receipt can be fabricated by anyone who can run the tool on the same commit).
Signing is RECOMMENDED for compliance-critical workflows.
Receipts exist at one of three trust tiers. Consumers MUST understand which tier applies before making policy decisions:
| Tier | Configuration | Integrity | Authenticity | Suitable for |
|---|---|---|---|---|
| Tier 1: Unsigned | sign: false | Tamper-evident (content hash) | None: anyone who can run aiir on the same commit can produce an identical receipt | Developer convenience, local audit trails, smoke testing |
| Tier 2: Signed | sign: true (default in GitHub Action) | Tamper-proof (Sigstore bundle) | Signer identity bound via OIDC + transparency log | CI/CD compliance, SOC 2 evidence, audit-grade provenance |
| Tier 3: Enveloped | in-toto Statement wrapper (§14.2) | Tamper-proof + envelope integrity | Predicate + subject bound to signer | Supply-chain attestation, SLSA provenance, cross-system verification |
Extensions at each tier:
- Tier 1: Extensions are mutable annotation. They MAY change without affecting receipt_id. Do NOT use for policy decisions.
- Tier 2: The Sigstore signature covers the entire receipt JSON (including extensions). Modification of any field, core or extension, invalidates the signature.
- Tier 3: The in-toto envelope binds the full predicate (receipt) to a subject and signer. Extensions are cryptographically bound.
Guideline: For regulatory compliance (EU AI Act, SOC 2, ISO 27001), use Tier 2 or Tier 3. Tier 1 receipts are useful for development workflows but SHOULD NOT be cited as tamper-proof evidence.
The AIIR tool operates across multiple security surfaces with different threat models:
| Surface | Read restriction | Write restriction | Rationale |
|---|---|---|---|
| CLI --verify | None (user-explicit path) | N/A (read-only) | User typed the path. No oracle risk. |
| CLI --output | N/A | CWD boundary enforced | Prevent writes outside project. |
| MCP aiir_verify | CWD boundary enforced | N/A (read-only) | AI assistant could be tricked into using verify as a filesystem oracle (F4-02). |
| MCP aiir_generate | N/A | CWD boundary enforced | Prevent AI-directed writes outside project. |
| GitHub Action | Runner workspace | Runner workspace | Sandboxed by Actions runtime. |
See THREAT_MODEL.md for the full STRIDE analysis (153 controls).
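This document does not describe how the CWD boundary is enforced. One common approach, shown here purely as an assumption, is to resolve the candidate path (following symlinks) and confirm it still lies under the resolved working directory. The function name is_within_cwd is hypothetical.

```python
from pathlib import Path


def is_within_cwd(candidate, cwd=None):
    """Return True if candidate resolves to a location inside cwd."""
    root = Path(cwd if cwd is not None else Path.cwd()).resolve()
    # Joining with an absolute candidate yields the candidate itself.
    target = (root / candidate).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return True
```

Resolving before comparing matters: a purely textual prefix check would accept paths such as out/../../secret or a symlink that points outside the project.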
Machine-readable test vectors are provided at:
- schemas/test_vectors.json: verification vectors (valid/invalid receipts)
- schemas/test-vectors/encoder_interop_vectors.json: encoder interop vectors (canonical JSON, content hash, receipt ID)
- tests/adversarial/: adversarial corpus (32 fixtures across injection, tampering, parsing, bypass categories)
Each verification test vector includes:
- A complete receipt JSON
- Expected verification result (valid: true/false)
- Expected error messages (if invalid)
- A human-readable description
Each encoder interop vector includes:
- Input core fields (input_core)
- Expected canonical JSON byte string (canonical_json)
- Expected content_hash and receipt_id
- Optional CBOR hex encoding (cbor_hex)
- A full receipt for round-trip verification
Conforming implementations MUST pass all verification and encoder interop test vectors.
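An implementer could check a single encoder interop vector roughly as follows. The field names input_core, canonical_json, content_hash, and receipt_id come from the list above; the assumption that canonical_json is stored as a JSON string, and the function name check_encoder_vector, are illustrative. The overall layout of the vector file is not described here, so iterating over it is left to the caller.

```python
import hashlib
import json


def check_encoder_vector(vector):
    """Return the names of the expected values that this encoder fails to match."""
    canonical = json.dumps(
        vector["input_core"], sort_keys=True, separators=(",", ":"),
        ensure_ascii=True, allow_nan=False,
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    mismatches = []
    if canonical != vector["canonical_json"]:
        mismatches.append("canonical_json")
    if "sha256:" + digest != vector["content_hash"]:
        mismatches.append("content_hash")
    if "g1-" + digest[:32] != vector["receipt_id"]:
        mismatches.append("receipt_id")
    return mismatches
```

An empty result means the encoder agrees with the vector. A canonical_json mismatch usually explains the other two, so it is worth fixing first.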
The following media type registration is prepared per RFC 6838 §3.2 (Vendor Tree):
| Field | Value |
|---|---|
| Type name | application |
| Subtype name | vnd.aiir.commit-receipt+json |
| Required parameters | None |
| Optional parameters | schema — the receipt schema identifier (e.g., aiir/commit_receipt.v2). When absent, consumers SHOULD inspect the schema field in the JSON body. |
| Encoding considerations | 8bit. The content is UTF-8 encoded JSON per RFC 8259. |
| Security considerations | See §12 and THREAT_MODEL.md. Receipts contain git commit metadata including author names, email addresses, and commit subjects. Consumers MUST validate receipts using the verification algorithm (§9) before trusting content. Receipt fields may contain user-controlled text — display layers MUST sanitize to prevent injection (XSS, terminal escape, log injection). The content_hash field MUST be verified via constant-time comparison (§9.2) to prevent timing side-channel attacks. |
| Interoperability considerations | All conforming implementations MUST produce identical content_hash and receipt_id values for the same input data (§7.3). Interoperability is verified via published test vectors (§13). Two independent implementations (Python, TypeScript) currently pass all vectors. |
| Published specification | This document (SPEC.md). Machine-readable schema: commit_receipt.v2.schema.json. |
| Applications which use this media type | AIIR CLI, AIIR GitHub Action, AIIR MCP Server, CI/CD pipelines, compliance audit systems, supply-chain attestation frameworks (via in-toto envelope). |
| Fragment identifier considerations | None. |
| Restrictions on usage | None. |
| Additional information | File extension: .json, .jsonl (when stored as append-only ledger). Macintosh file type code: None. Deprecated alias names: None. Magic number: The first bytes of a receipt will match {"type":"aiir.commit_receipt" after whitespace normalization. |
| Person & email address to contact | Noah — noah@invariantsystems.io |
| Intended usage | COMMON |
| Change controller | Invariant Systems, Inc. |
The in-toto predicate type URI is:
https://invariantsystems.io/predicates/aiir/commit_receipt/v1
This URI identifies AIIR commit receipt predicates within in-toto Statement v1 envelopes. It resolves to a human-readable description of the predicate format with links to the specification and schema.
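For orientation, wrapping a receipt as the predicate of an in-toto Statement v1 could look like the sketch below. The Statement field names follow the in-toto Statement v1 layout; the choice of subject (the repository URL as name, the commit SHA under a gitCommit digest) is an assumption, because the subject binding rules of §14.2 are not reproduced in this section. The function name is hypothetical.

```python
IN_TOTO_STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
AIIR_PREDICATE_TYPE = "https://invariantsystems.io/predicates/aiir/commit_receipt/v1"


def wrap_in_statement(receipt):
    """Build an unsigned in-toto Statement with the receipt as predicate."""
    sha = receipt["commit"]["sha"]
    # Assumed subject naming: fall back to the SHA when there is no remote.
    name = receipt.get("provenance", {}).get("repository") or sha
    return {
        "_type": IN_TOTO_STATEMENT_TYPE,
        "subject": [{"name": name, "digest": {"gitCommit": sha}}],
        "predicateType": AIIR_PREDICATE_TYPE,
        "predicate": receipt,
    }
```

The Statement on its own is unsigned. It only binds extensions cryptographically once it is placed in a signed envelope, which this sketch does not attempt.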
The JSON Schema $id URI is:
https://invariantsystems.io/schemas/aiir/commit_receipt.v2.schema.json
This URI resolves to the machine-readable JSON Schema (draft 2020-12)
for the aiir/commit_receipt.v2 receipt format.
This specification is maintained by Invariant Systems, Inc. The
canonical source is the SPEC.md file in the
aiir repository.
The specification uses Semantic Versioning (SemVer):
- Patch (1.0.x): Clarifications, typo fixes, additional test vectors. No behavioral changes. Existing implementations remain conformant.
- Minor (1.x.0): Backward-compatible additions — new optional fields, new extension points, additional security surfaces. Existing receipts remain valid. Existing implementations remain conformant for the fields they support.
- Major (x.0.0): Breaking changes — new CORE_KEYS, changed canonical
JSON rules, new schema identifier (e.g.,
aiir/commit_receipt.v3). Existing implementations MUST update to remain conformant. The previous schema version will be documented as deprecated but MUST continue to verify correctly for receipts generated under that version.
- Proposal: Open a GitHub issue or pull request against SPEC.md with a clear description of the proposed change and its rationale.
- Discussion: Changes are discussed publicly on the pull request. Security-sensitive changes may be discussed privately per SECURITY.md and disclosed when fixed.
- Reference implementation: All spec changes MUST ship with a corresponding update to the reference implementation and test vectors. A spec change without tests will not be merged.
- Release: Spec version bumps are released alongside the reference implementation version that implements them.
- A receipt generated by aiir/commit_receipt.v1 or aiir/commit_receipt.v2 MUST verify correctly against any future implementation of the same major version.
- The CORE_KEYS set (type, schema, version, commit, ai_attestation, provenance) MUST NOT change within a major version.
- The canonical JSON algorithm MUST NOT change within a major version.
- Test vectors from previous spec versions MUST continue to pass.
Third-party implementations claiming AIIR compatibility SHOULD:
- Reference the spec version they conform to
- Pass all published test vectors for that version
- Follow the badge usage guidelines in TRADEMARK.md
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-03-09 | Initial specification extracted from reference implementation |
| 1.0.1 | 2026-03-09 | §14: IANA media type registration template, predicate type URI, schema URI resolution |
| 1.1.0 | 2026-03-10 | §6.5: Normative canonicalization contract — explicit type inventory, RFC 8785 equivalence statement, migration trigger for floats |