hyperpolymath · hyperpolymath · May 21, 2026 · May 21, 2026
diff --git a/PROOF-NEEDS.md b/PROOF-NEEDS.md
@@ -25,11 +25,11 @@ Classification key: **(J)** genuinely unavoidable, documented as an axiom;
 
 | # | Site | Function | Type | Class | Rationale |
 |---|------|----------|------|-------|-----------|
-| 1 | `SafetyLemmas.idr:53` | `charEqSound` | `(c1,c2:Char) -> c1 == c2 = True -> c1 = c2` | **J** | `Char` is an opaque primitive; `==` is `prim__eqChar` (foreign `Bool`). Idris2 0.8.0 has no in-language soundness principle for primitive equality. Standard, well-understood axiom. |
-| 2 | `SafetyLemmas.idr:60` | `charEqSym` | `(x,y:Char) -> (x == y) = (y == x)` | **J** | Symmetry of `prim__eqChar`. Same reason as #1 — opaque primitive, no decision procedure to recurse on. |
-| 3 | `SafetyLemmas.idr:211` | `unpackLength` | `length (unpack s) = length s` | **J** | `unpack` = `prim__strToCharList` (foreign). `String` is opaque with no induction principle; the relation between primitive `String` length and `List Char` length is not reducible in-language. |
-| 4 | `SafetyLemmas.idr:219` | `appendLengthSum` | `length (s ++ t) = length s + length t` | **J** | `++` on `String` = `prim__strAppend` (foreign). Length additivity is a backend-semantics guarantee, not type-level reducible. |
-| 5 | `SafetyLemmas.idr:226` | `substrLengthBound` | `LTE (length (substr start len s)) len` | **J** | `substr` = `prim__strSubstr` (foreign). The "result no longer than `len`" bound is a primitive-semantics guarantee with no in-language proof. |
+| 1 | `SafetyLemmas.idr:53` | `charEqSound` | `(c1,c2:Char) -> c1 == c2 = True -> c1 = c2` | **J ✓** | `Char` is an opaque primitive; `==` is `prim__eqChar` (foreign `Bool`). Idris2 0.8.0 has no in-language soundness principle for primitive equality. Standard, well-understood axiom. **Externally validated** via backend-assurance harness (`docs/backend-assurance/prim__eqChar.md` + BEAM property test). |
+| 2 | `SafetyLemmas.idr:60` | `charEqSym` | `(x,y:Char) -> (x == y) = (y == x)` | **J ✓** | Symmetry of `prim__eqChar`. Same reason as #1 — opaque primitive, no decision procedure to recurse on. **Externally validated** (same harness as #1). |
+| 3 | `SafetyLemmas.idr:211` | `unpackLength` | `length (unpack s) = length s` | **J** | `unpack` = `prim__strToCharList` (foreign). `String` is opaque with no induction principle; the relation between primitive `String` length and `List Char` length is not reducible in-language. _Backend-assurance harness pending._ |
+| 4 | `SafetyLemmas.idr:219` | `appendLengthSum` | `length (s ++ t) = length s + length t` | **J ✓** | `++` on `String` = `prim__strAppend` (foreign). Length additivity is a backend-semantics guarantee, not type-level reducible. **Externally validated** via backend-assurance harness (`docs/backend-assurance/prim__strAppend.md` + BEAM property test). |
+| 5 | `SafetyLemmas.idr:226` | `substrLengthBound` | `LTE (length (substr start len s)) len` | **J ✓** | `substr` = `prim__strSubstr` (foreign). The "result no longer than `len`" bound is a primitive-semantics guarantee with no in-language proof. **Externally validated** via backend-assurance harness (`docs/backend-assurance/prim__strSubstr.md` + BEAM property test). |
 
 **Verdict: 5/5 are class (J).** All five reduce to the same root cause:
 Idris2 treats `Char` and `String` as opaque primitive types whose
@@ -39,6 +39,15 @@ principle. There is no constructive in-language proof for any of them; a
 short of changing the trusted computing base. They are correctly marked
 `%unsafe`, individually documented, and isolated in one module.
 
+The **J ✓** marker indicates the axiom is class (J) **and**
+externally validated by the backend-assurance harness (see
+`docs/backend-assurance/`). The validation does not change the
+in-language proof — the `believe_me` sites stay in source. It shrinks
+the trusted base from "we trust the backend" to "we have read the
+backend lowering and randomly tested the operation against the
+claimed property". A bare **J** indicates a class-(J) axiom whose
+harness has not yet landed.
+
 No **(R)** or **(S)** sites were found. The audit's "9" was a raw text
 count conflating 5 real axioms with 4 comment mentions.
 
@@ -74,11 +83,11 @@ All five are class **(J)** — genuinely unavoidable in Idris2 0.8.0
 
 | Axiom | Site | Justification | Backend-assurance evidence |
 |-------|------|---------------|----------------------------|
-| `charEqSound` | `SafetyLemmas.idr:53` | Soundness of `prim__eqChar` — backend primitive correctness | `docs/backend-assurance/prim__eqChar.md` + `elixir/test/backend_assurance/prim_eq_char_test.exs` |
-| `charEqSym` | `SafetyLemmas.idr:60` | Symmetry of `prim__eqChar` — backend primitive correctness | `docs/backend-assurance/prim__eqChar.md` + `elixir/test/backend_assurance/prim_eq_char_test.exs` |
+| `charEqSound` | `SafetyLemmas.idr:53` | Soundness of `prim__eqChar` — backend primitive correctness, externally validated | `docs/backend-assurance/prim__eqChar.md` + `elixir/test/backend_assurance/prim_eq_char_test.exs` |
+| `charEqSym` | `SafetyLemmas.idr:60` | Symmetry of `prim__eqChar` — backend primitive correctness, externally validated | `docs/backend-assurance/prim__eqChar.md` + `elixir/test/backend_assurance/prim_eq_char_test.exs` |
 | `unpackLength` | `SafetyLemmas.idr:211` | `prim__strToCharList` preserves length — backend primitive correctness | _pending_ |
-| `appendLengthSum` | `SafetyLemmas.idr:219` | `prim__strAppend` length semantics — not reducible at type level | _pending_ |
-| `substrLengthBound` | `SafetyLemmas.idr:226` | `prim__strSubstr` length bound — not reducible at type level | _pending_ |
+| `appendLengthSum` | `SafetyLemmas.idr:219` | `prim__strAppend` length semantics — not reducible at type level, externally validated | `docs/backend-assurance/prim__strAppend.md` + `elixir/test/backend_assurance/prim_str_append_test.exs` |
+| `substrLengthBound` | `SafetyLemmas.idr:226` | `prim__strSubstr` length bound — not reducible at type level, externally validated | `docs/backend-assurance/prim__strSubstr.md` + `elixir/test/backend_assurance/prim_str_substr_test.exs` |
 
 Note: `logSafeBounded` in SafeAPIKey.idr no longer uses `believe_me` directly;
 it calls the documented SafetyLemmas axioms via structural proof.
@@ -105,10 +114,11 @@ sites stay in `SafetyLemmas.idr`. The harness shrinks the trusted base
 from "we trust the backend" to "we read the lowering and randomly
 tested the operation".
 
-First primitive landed: `prim__eqChar` (covering both `charEqSound`
-and `charEqSym`). Remaining three (`prim__strToCharList`,
-`prim__strAppend`, `prim__strSubstr`) tracked under epic #87 Tier C
-backend-assurance campaign — one PR per primitive.
+Primitives validated so far: `prim__eqChar` (covering `charEqSound`
+and `charEqSym`), `prim__strAppend` (covering `appendLengthSum`),
+`prim__strSubstr` (covering `substrLengthBound`). Remaining:
+`prim__strToCharList` (covering `unpackLength`). Tracked under epic
+#87 Tier C backend-assurance campaign.
 
 ## Priority Going Forward
 

diff --git a/docs/backend-assurance/README.md b/docs/backend-assurance/README.md
@@ -28,8 +28,8 @@ companion property-test harness lives under
 |--------------------|--------------------------------------|----------------------------------|----------------------------------------------------------------|
 | `prim__eqChar`     | `charEqSound`, `charEqSym`           | `prim__eqChar.md`                | `elixir/test/backend_assurance/prim_eq_char_test.exs`          |
 | `prim__strToCharList` | `unpackLength`                    | _pending_                        | _pending_                                                      |
-| `prim__strAppend`  | `appendLengthSum`                    | _pending_                        | _pending_                                                      |
-| `prim__strSubstr`  | `substrLengthBound`                  | _pending_                        | _pending_                                                      |
+| `prim__strAppend`  | `appendLengthSum`                    | `prim__strAppend.md`             | `elixir/test/backend_assurance/prim_str_append_test.exs`       |
+| `prim__strSubstr`  | `substrLengthBound`                  | `prim__strSubstr.md`             | `elixir/test/backend_assurance/prim_str_substr_test.exs`       |
 
 Each row is delivered as one PR per primitive. `prim__eqChar` is the
 first; the other three follow the same shape.

diff --git a/docs/backend-assurance/prim__strAppend.md b/docs/backend-assurance/prim__strAppend.md
@@ -0,0 +1,132 @@
+# Backend-Assurance: `prim__strAppend`
+
+Trusted-extraction validation for the class-(J) axiom over Idris2's
+`prim__strAppend` primitive:
+
+- `appendLengthSum : (s, t : String) -> length (s ++ t) = length s + length t`
+  (`src/abi/Boj/SafetyLemmas.idr:226`)
+
+Declared `%unsafe` with `believe_me ()` in Idris2 0.8.0 because
+`String` is an opaque primitive type with no constructors and no
+in-language induction principle. This document argues — by inspecting
+the backend lowerings that BoJ actually ships against — that the
+length-additivity property holds.
+
+The companion property test
+(`elixir/test/backend_assurance/prim_str_append_test.exs`) exercises
+the BEAM half of this argument over the codepoint space.
+
+## What `prim__strAppend` is
+
+`prim__strAppend : String -> String -> String` is a primitive
+arithmetic operation declared in Idris2's `Core.Primitives`. The
+`++` operator on `String` is the `Semigroup String` instance, which
+is `prim__strAppend` directly. Length is `prim__strLength` (the
+`length : String -> Nat` definition in `Data.String`).
+
+The question reduces to: does the operation `prim__strAppend` followed
+by `prim__strLength` satisfy `|prim__strAppend(s, t)| = |s| + |t|` on
+each shipping backend? Both `length` and `++` agree on a single notion
+of "character count" — what that notion is depends on the backend.
+
+## Chez Scheme backend (Idris2 default codegen)
+
+In `Compiler.Scheme.Chez`, `prim__strAppend` lowers to the R6RS
+procedure `string-append`, and `prim__strLength` lowers to
+`string-length`. On Chez 9.x:
+
+- **String model.** R6RS §6.7 specifies that a Scheme string is a
+  sequence of Unicode characters (codepoints), not a byte sequence.
+  `string-length` returns the number of characters.
+- **`string-append` semantics.** R6RS §11.12 specifies `string-append`
+  returns a newly-allocated string whose characters are the
+  concatenation, in order, of the characters of the argument strings.
+- **Length additivity.** Combining the two: the number of characters
+  in `(string-append s t)` equals the number of characters in `s`
+  plus the number of characters in `t`. This is part of the Scheme
+  standard, not an implementation detail.
+
+No further evidence needed beyond citing the standard.
+
+## BEAM backend (Erlang / Elixir, where BoJ runs)
+
+BoJ's REST surface is Elixir on the BEAM. The Idris2 proofs are
+compile-time-only artefacts; the runtime strings flowing through the
+system are BEAM UTF-8 binaries. On BEAM:
+
+- **String model.** Elixir strings are UTF-8 encoded binaries.
+  `String.length/1` returns the **codepoint count** (not the byte
+  count) by scanning the UTF-8 boundaries. This matches Idris2's
+  `length`-on-`String` semantics on Chez (codepoint count).
+- **Concatenation lowering.** Elixir's `<>` on binaries is the
+  built-in `bif erlang:'++'/2` for iolists, ultimately compiling to
+  a byte-level binary append. UTF-8 is prefix-free: appending two
+  valid UTF-8 byte sequences yields a valid UTF-8 byte sequence whose
+  codepoint boundaries are exactly the boundaries of the operands.
+- **Length additivity.** Because UTF-8 is prefix-free, the codepoint
+  boundaries of `s <> t` are exactly the boundaries of `s` followed
+  by the boundaries of `t`. Hence
+  `String.length(s <> t) == String.length(s) + String.length(t)`.
+
+The property test exercises this over random strings sampled from the
+legal codepoint range (excluding surrogates), plus explicit boundary
+strings spanning all four UTF-8 encoding widths (1/2/3/4 bytes) and
+the empty-string identity case.
+
+## Why this isn't circular
+
+The harness does not call `prim__strAppend`. It calls Elixir `<>` and
+`String.length/1` directly on UTF-8 binaries. The argument is: *the
+operation that Idris2 lowers `prim__strAppend` + `prim__strLength` to
+on the BEAM is `<>` + `String.length/1` on UTF-8 binaries*, so
+demonstrating those operations satisfy the property is sufficient.
+The trusted-extraction step is reading the lowering; the
+property-test step is verifying the operation behaves as the lowering
+claims.
+
+For Chez, we do not run a Scheme harness — R6RS is sufficient
+documentary evidence. If BoJ ever ships a backend whose string model
+is not Unicode codepoints (e.g. a byte-oriented C backend without
+UTF-8 awareness), this document gets a new section and a matching
+property test, **and the axiom may need to be restated in terms of
+byte length** — see the *Honest framing* clause in
+`docs/backend-assurance/README.md`.
+
+## Edge cases considered
+
+- **Empty strings.** `s <> "" = s` and `"" <> s = s` are tested
+  explicitly. Length additivity reduces to `|s| + 0 = |s|` and
+  `0 + |s| = |s|` respectively — corner cases of the main property
+  but worth pinning to catch a backend that allocates a sentinel byte
+  on empty append.
+- **Multi-byte codepoints.** Tested via boundary strings covering
+  all four UTF-8 widths: ASCII (1 byte), Latin-1 supplement (2 bytes,
+  e.g. `café`), CJK (3 bytes, e.g. `日本語`), and astral plane
+  (4 bytes, e.g. `🦀`). Each width has a different number of bytes
+  per codepoint, but the codepoint count is invariant.
+- **Surrogates** (`0xD800..0xDFFF`): excluded from the codepoint
+  generator. These are illegal as standalone codepoints in
+  well-formed Unicode; their presence would indicate a system-under-
+  test bug, not a `prim__strAppend` failure.
+- **Normalisation.** Out of scope. `prim__strAppend` is byte-level
+  (per UTF-8 prefix-free property) and does not compose canonical
+  decompositions. A grapheme that is `e` + combining acute (two
+  codepoints) appended to nothing remains two codepoints, regardless
+  of whether the precomposed `é` (one codepoint) would canonically
+  equal it.
+- **Three-way associativity.** Not in the axiom but cheap to assert.
+  Length of `(s <> t) <> u` equals length of `s <> (t <> u)` equals
+  `|s| + |t| + |u|`. Catches a backend whose concatenation
+  associates differently on length than on byte order.
+
+## References
+
+- Idris2 0.8.0 `src/Core/Primitives.idr` — primitive operation table.
+- Idris2 0.8.0 `src/Compiler/Scheme/Chez.idr` — Chez codegen lowerings
+  for `prim__strAppend` and `prim__strLength`.
+- R6RS §6.7, §11.12 — Scheme string model and `string-append`
+  specification.
+- Elixir `String` module documentation — UTF-8 codepoint counting via
+  `String.length/1`.
+- `PROOF-NEEDS.md` — axiom audit (2026-05-18) and class-(J) framing.
+- `src/abi/Boj/SafetyLemmas.idr` — axiom declaration (line 226).