@@ -24,6 +24,7 @@ iterations.
 ## Build and Tooling

 - `cargo build -p iscc-jni` must run before `mvn test` (native library prerequisite)
+- Maven POM is at `crates/iscc-jni/java/pom.xml`; run `mvn test` from `crates/iscc-jni/java/`
 - CI workflow at `.github/workflows/ci.yml` has 9 jobs: version-check, rust, python, nodejs, wasm,
   c-ffi, java, go, bench. The `bench` job runs `cargo bench --no-run` (compile-only, no execution)
 - `version-check` job: lightweight (checkout + setup-python only), runs
@@ -40,66 +41,57 @@ iterations.
 - wasm-opt release flags: `[package.metadata.wasm-pack.profile.release]` with
   `wasm-opt = ["-O", "--enable-bulk-memory", "--enable-nontrapping-float-to-int"]`

-## Go Pure Go Rewrite
-
-- Pure Go codec: `packages/go/codec.go`: type enums (`MainType`, `SubType`, `Version` with `iota`),
-  varnibble header encoding/decoding, base32/base64, `EncodeComponent`, `IsccDecompose`,
-  `IsccDecode`. Zero external dependencies
-- Go type naming: `MTMeta`..`MTFlake`, `STNone`..`STWide`, `STText = STNone`, `VSV0 Version = 0`
-- Internal helpers are unexported (lowercase): `encodeHeader`, `decodeHeader`, etc.
-- `IsccDecode` uses `DecodeResult` struct defined in `codec.go`
-- Base32: `base32.StdEncoding.WithPadding(base32.NoPadding)`. Base64: `base64.RawURLEncoding`
-- Pure Go text utils: `TextClean` (NFKC + control-char + empty-line collapse), `TextCollapse` (NFD +
-  lowercase + filter C/M/P + NFKC), `TextTrim` (UTF-8 byte-boundary), `TextRemoveNewlines`
-  (strings.Fields join). Uses `golang.org/x/text/unicode/norm`
-- CDC: `cdcGear` table is `var`, not `const` (Go has no const arrays). The `min()` builtin needs Go 1.21+
-- MinHash: `minhashFn` naming (avoids a name conflict). `maxi64`/`mprime`/`maxH` are `var`, not `const`
-- SimHash: `AlgSimhash` returns `([]byte, error)`, `SlidingWindow` returns `([]string, error)`. Uses
-  `[]rune` for Unicode-correct SlidingWindow
-- CDC integer ceiling: `(minSize + 1) / 2` computes ceil(minSize / 2) (Go has no `div_ceil` method)
-- DCT: `algDct` (unexported) + `dctRecursive` helper. Only uses `math` stdlib. Nayuki recursive
-  divide-and-conquer. Input must be a power of 2, checked via `n > 0 && n&(n-1) == 0`
-- WTA-Hash: `AlgWtahash` (exported) + `wtaVideoIdPermutations` `[256][2]int` table. No external deps
-- Gen functions: `code_content_text.go` (GenTextCodeV0 + softHashTextV0), `code_meta.go`
-  (GenMetaCodeV0 + metaNameSimhash + softHashMetaV0 + softHashMetaV0WithBytes + interleaveDigests
-  \+ slidingWindowBytes + decodeDataURL + parseMetaJSON + jsonHasContext + buildMetaDataURL +
-  multiHashBlake3), `code_data.go` (GenDataCodeV0 + DataHasher with Push/Finalize),
-  `code_instance.go` (GenInstanceCodeV0 + InstanceHasher with Push/Finalize),
-  `code_content_image.go` (GenImageCodeV0 + softHashImageV0 + transposeMatrix + flatten8x8 +
-  computeMedian), `code_content_audio.go` (GenAudioCodeV0 + softHashAudioV0 + arraySplit[T]).
-  Result types: `TextCodeResult`, `MetaCodeResult`, `DataCodeResult`, `InstanceCodeResult`,
-  `ImageCodeResult`, `AudioCodeResult`, `VideoCodeResult`, `MixedCodeResult`, `IsccCodeResult`
-- xxh32: `xxh32.go`, a standalone xxHash32 impl (~80 lines). Used by softHashTextV0 for n-gram
-  feature hashing. Unexported: `xxh32(data, seed)`, `xxh32Round`, `rotl32`, `readU32LE`
-- JCS canonicalization: uses Go stdlib `json.Marshal` (sorts map keys, compact format). Works for
-  string/null values in conformance vectors. Full RFC 8785 compliance would need a dedicated
-  library (float formatting; `json.Marshal` also escapes `<`, `>`, `&` by default, which JCS does not)
-- BLAKE3 dependency: `github.com/zeebo/blake3` (SIMD-optimized). `blake3.Sum256(data)` returns
-  `[32]byte`
-- Test naming for gen functions: `TestPureGo*` prefix (historical; could be renamed to `Test*` in
-  future cleanup)
-- Go docs: `packages/go/README.md` and `docs/howto/go.md` describe pure Go API (no WASM/wazero).
-  Examples use `iscc.Function(...)` pattern with typed result structs (`*MetaCodeResult`, etc.)
-- Image-Code helpers: `transposeMatrix`, `flatten8x8`, `computeMedian` are unexported in
-  `code_content_image.go`. `bitsToBytes` reused from `codec.go`
-- Audio-Code: `arraySplit[T any]` is generic (Go 1.18+), used for splitting digests into quarters/
-  thirds. `AlgSimhash` on 4-byte digests returns 4 bytes (output = input digest length)
-- `sort.Slice` for int32: `func(i, j int) bool { return s[i] < s[j] }` (the `sort` package has no
-  int32 helper; Go 1.21+ `slices.Sort` also works)
-- Video-Code: `SoftHashVideoV0` exported (matching Rust `pub fn`). Dedup via
-  `fmt.Sprintf("%v", sig)` string keys in `map[string][]int32`. Column-wise int64 sums →
-  `AlgWtahash`
-- Mixed-Code: `softHashCodesV0` unexported (matching Rust non-pub). Preserves first header byte for
-  type info in SimHash entries. Uses `decodeHeader`/`decodeLength` to validate Content MainType
-  and bit length. `AlgSimhash` error safely discarded (all entries identical length)
-- Go module dependencies: `github.com/zeebo/blake3` (BLAKE3, SIMD), `golang.org/x/text` (Unicode).
-  No wazero or WASM dependencies. `github.com/klauspost/cpuid/v2` indirect (blake3 SIMD detection)
-- Test naming: `TestCodec*`, `TestUtils*`, `TestCdc*`, `TestMinhash*`, `TestSimhash*`,
-  `TestAlgDct*`, `TestAlgWtahash*`, `TestPermutation*`
-- Conformance tests (per-function): `os.ReadFile("../../crates/iscc-lib/tests/data.json")`
-- Conformance selftest: `//go:embed testdata/data.json` in conformance.go.
-  `ConformanceSelftest() (bool, error)` is a package-level function (no receiver). Uses
-  `vectorEntry` struct + 9 `run*Tests` section runners. `decodeStream` shared helper for
-  Data/Instance hex decoding
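
The `xxh32` helpers named in the removed notes can be sketched from the public XXH32 specification. This is a minimal, self-contained version, not the contents of `packages/go/xxh32.go`: it assumes the standard four-lane XXH32 algorithm and uses `math/bits` and `encoding/binary` in place of hand-written rotate and load helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// XXH32 prime constants from the xxHash specification.
const (
	prime1 uint32 = 0x9E3779B1
	prime2 uint32 = 0x85EBCA77
	prime3 uint32 = 0xC2B2AE3D
	prime4 uint32 = 0x27D4EB2F
	prime5 uint32 = 0x165667B1
)

// rotl32 rotates x left by r bits.
func rotl32(x uint32, r int) uint32 { return bits.RotateLeft32(x, r) }

// readU32LE reads a little-endian uint32 from the first 4 bytes of b.
func readU32LE(b []byte) uint32 { return binary.LittleEndian.Uint32(b) }

// xxh32Round mixes one 4-byte lane into an accumulator.
func xxh32Round(acc, input uint32) uint32 {
	acc += input * prime2
	acc = rotl32(acc, 13)
	return acc * prime1
}

// xxh32 computes the 32-bit xxHash of data with the given seed.
// uint32 arithmetic wraps, matching the reference algorithm.
func xxh32(data []byte, seed uint32) uint32 {
	n := len(data)
	p := data
	var h uint32
	if n >= 16 {
		// Four accumulators consume 16-byte stripes.
		v1 := seed + prime1 + prime2
		v2 := seed + prime2
		v3 := seed
		v4 := seed - prime1
		for len(p) >= 16 {
			v1 = xxh32Round(v1, readU32LE(p[0:]))
			v2 = xxh32Round(v2, readU32LE(p[4:]))
			v3 = xxh32Round(v3, readU32LE(p[8:]))
			v4 = xxh32Round(v4, readU32LE(p[12:]))
			p = p[16:]
		}
		h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18)
	} else {
		h = seed + prime5
	}
	h += uint32(n)
	// Remaining 4-byte words.
	for len(p) >= 4 {
		h += readU32LE(p) * prime3
		h = rotl32(h, 17) * prime4
		p = p[4:]
	}
	// Remaining single bytes.
	for _, b := range p {
		h += uint32(b) * prime5
		h = rotl32(h, 11) * prime1
	}
	// Final avalanche.
	h ^= h >> 15
	h *= prime2
	h ^= h >> 13
	h *= prime3
	h ^= h >> 16
	return h
}

func main() {
	for _, s := range []string{"", "abc", "Nobody inspects the spammish repetition"} {
		fmt.Printf("%08x  %q\n", xxh32([]byte(s), 0), s)
	}
}
```

Writing the seed first in `seed + prime1 + prime2` matters in Go: the sum of the two typed constants alone would overflow `uint32` at compile time, while the runtime expression wraps as intended.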
+## Go Pure Go Rewrite (Summary)
+
+- Pure Go in `packages/go/`: all 10 gen functions + codec + algorithms. Zero WASM deps
+- Dependencies: `github.com/zeebo/blake3`, `golang.org/x/text`. Indirect: `cpuid/v2`
+- Go idioms: unexported helpers (lowercase), `var` for arrays/large uint64 (Go const limitations),
+  `[]rune` for Unicode SlidingWindow, generics for `arraySplit[T]`
+- Conformance: `//go:embed testdata/data.json`, per-function tests use
+  `os.ReadFile("../../crates/iscc-lib/tests/data.json")`
+- 151 Go tests total. CI: 4 steps (checkout, setup-go, test, vet); no Rust deps
+
+## gen_sum_code_v0
+
+- `gen_sum_code_v0(path: &Path, bits: u32, wide: bool) -> IsccResult<SumCodeResult>` in `lib.rs`
+- Single-pass file I/O: opens file, reads in `IO_READ_SIZE` chunks, feeds both `DataHasher` and
+  `InstanceHasher`, composes ISCC-CODE via `gen_iscc_code_v0`
+- `SumCodeResult { iscc, datahash, filesize }` in `types.rs`, same `#[non_exhaustive]` pattern
+- File I/O errors mapped to `IsccError::InvalidInput("Cannot open/read file: {e}")`
+- `units: Vec<String>` field deferred (not in scope for initial core implementation)
+- 32nd and final Tier 1 symbol for Rust core; all 32 symbols now implemented
+- Python binding: PyO3 wrapper in `crates/iscc-py/src/lib.rs` accepts `&str` path, `SumCodeResult`
+  class in `__init__.py`, public wrapper accepts `str | os.PathLike` via `os.fspath()`, 6 tests in
+  `tests/test_smoke.py`
+- Node.js binding: `NapiSumCodeResult` struct (`#[napi(object)]`) + `gen_sum_code_v0` napi fn in
+  `crates/iscc-napi/src/lib.rs`. Uses `i64` for `filesize` (napi-rs has no u64 support). 6 tests in
+  `__tests__/functions.test.mjs`
+- WASM binding: `WasmSumCodeResult` struct (`#[wasm_bindgen(getter_with_clone)]`) +
+  `gen_sum_code_v0` fn in `crates/iscc-wasm/src/lib.rs`. Accepts `&[u8]` (no filesystem in WASM).
+  Uses `f64` for `filesize` (wasm-bindgen `u64` maps to `BigInt`, awkward for JS). Composes
+  internally via `DataHasher` + `InstanceHasher` + `gen_iscc_code_v0`. 6 tests in `tests/unit.rs`,
+  75 total WASM tests (9 conformance + 66 unit; 1 behind `conformance` feature gate)
+- C FFI binding: `IsccSumCodeResult` repr(C) struct with `ok: bool`, `iscc: *mut c_char`,
+  `datahash: *mut c_char`, `filesize: u64`. `iscc_gen_sum_code_v0(path, bits, wide)` extern "C"
+  function + `iscc_free_sum_code_result` free function in `crates/iscc-ffi/src/lib.rs`. Follows
+  `IsccDecodeResult` struct-return pattern. 4 Rust tests + 3 C tests. 82 total Rust tests, 57
+  total C test assertions
+- JNI binding: `SumCodeResult.java` (immutable, `String iscc`, `String datahash`, `long filesize`)
+- `Java_io_iscc_iscc_1lib_IsccLib_genSumCodeV0` in `crates/iscc-jni/src/lib.rs`. Returns `jobject`
+  via `env.find_class("io/iscc/iscc_lib/SumCodeResult")` + `env.new_object()` with signature
+  `(Ljava/lang/String;Ljava/lang/String;J)V`. 4 Maven tests. 62 total Maven tests
+- Go binding: `packages/go/code_sum.go`: `SumCodeResult` struct (`Iscc`, `Datahash`, `Filesize`) +
+  `GenSumCodeV0(path string, bits uint32, wide bool)`. Single-pass file I/O with `os.Open` +
+  `DataHasher` + `InstanceHasher` + `GenIsccCodeV0`. 4 tests in `code_sum_test.go`. 151 total Go
+  tests. ALL 7 bindings complete for issue #15
+
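
The single-pass read described for `gen_sum_code_v0` and `GenSumCodeV0` comes down to one read loop that hands each chunk to every hasher. The sketch below assumes a hypothetical `pusher` interface and uses SHA-256 and a byte counter as stand-ins for `DataHasher` and `InstanceHasher`; `chunkSize` is an assumed value, not the library's `IO_READ_SIZE`, and the final ISCC-CODE composition step is omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"hash"
	"io"
	"os"
)

// pusher is a hypothetical stand-in for the Push side of DataHasher/InstanceHasher.
type pusher interface{ Push(data []byte) }

// digestPusher wraps a hash.Hash (SHA-256 here, purely as a stand-in).
type digestPusher struct{ h hash.Hash }

func (d *digestPusher) Push(data []byte) { d.h.Write(data) }

// sizePusher counts bytes, standing in for filesize tracking.
type sizePusher struct{ n uint64 }

func (s *sizePusher) Push(data []byte) { s.n += uint64(len(data)) }

// chunkSize is an assumed value for this sketch, not the real IO_READ_SIZE.
const chunkSize = 64 * 1024

// feedFile opens path once and pushes every chunk to all hashers, so the
// file is read a single time regardless of how many hashers consume it.
func feedFile(path string, hashers ...pusher) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("cannot open file: %w", err)
	}
	defer f.Close()
	buf := make([]byte, chunkSize)
	for {
		n, err := f.Read(buf)
		if n > 0 {
			// buf is reused by the next Read, so Push must consume the
			// slice (or copy it) before returning.
			for _, h := range hashers {
				h.Push(buf[:n])
			}
		}
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return fmt.Errorf("cannot read file: %w", err)
		}
	}
}

// demo writes content to a temp file, feeds it through both stand-in
// hashers, and returns the hex digest and byte count.
func demo(content []byte) (string, uint64, error) {
	tmp, err := os.CreateTemp("", "sumcode-*.bin")
	if err != nil {
		return "", 0, err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(content); err != nil {
		tmp.Close()
		return "", 0, err
	}
	if err := tmp.Close(); err != nil {
		return "", 0, err
	}
	digest := &digestPusher{h: sha256.New()}
	size := &sizePusher{}
	if err := feedFile(tmp.Name(), digest, size); err != nil {
		return "", 0, err
	}
	return hex.EncodeToString(digest.h.Sum(nil)), size.n, nil
}

func main() {
	sum, n, err := demo([]byte("hello world"))
	fmt.Println(sum, n, err)
}
```

Reading once and fanning out keeps I/O cost independent of the number of hashers; the trade-off is that each `Push` must be done with the buffer before the next read overwrites it.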
+## Benchmarks
+
+- `crates/iscc-lib/benches/benchmarks.rs`: all 10 `gen_*_v0` + DataHasher streaming + CDC chunks
+- `bench_sum_code` uses `tempfile::NamedTempFile` since `gen_sum_code_v0` takes `&Path` (not
+  `&[u8]`)
+- Temp files created outside bench closure (setup cost excluded from measurement)
+- `tempfile` is a dev-dependency only (workspace dep `tempfile = "3"`)

 ## Codec Internals

@@ -148,9 +140,37 @@ iterations.

 - All 4 Reference pages complete: Rust API, Python API, C FFI, Java API

+## Binding Constant Export Patterns
+
+- NAPI: `#[napi(js_name = "CONST_NAME")] pub const CONST_NAME: u32 = iscc_lib::CONST_NAME as u32;`
+- WASM: `#[wasm_bindgen(js_name = "CONST_NAME")] pub fn const_name() -> u32 { ... }` (getter fn, not
+  const; wasm-bindgen limitation)
+- C FFI: `#[unsafe(no_mangle)] pub extern "C" fn iscc_const_name() -> u32 { ... }` + inline
+  `#[test]` in same file. cbindgen auto-generates the C header
+- NAPI JS tests: `describe('CONST_NAME', () => { it('equals X'); it('is a number'); })`
+- WASM tests: `#[wasm_bindgen_test]` in `tests/unit.rs` (requires wasm-pack to run)
+- C tests: `ASSERT_EQ(iscc_const_name(), value, "label")` in `tests/test_iscc.c`
+- 5 constants currently exported: META_TRIM_NAME, META_TRIM_DESCRIPTION, META_TRIM_META,
+  IO_READ_SIZE, TEXT_NGRAM_SIZE
+
+## Documentation Sweep Patterns
+
+- "N gen" count references exist in: READMEs (9 files), docs/ (14 files), howto/ (6 files), crate
+  CLAUDE.md files (5), notes/ (2), source comments (.rs, .py, .mjs, .pyi), benchmarks/ (2)
+- The Edit tool requires a full Read call (not offset/limit) before the first edit per file
+- mdformat auto-reformats after edits; always run `mise run format` twice after doc changes
+- iscc-core-ts is external and may have different function counts than iscc-lib
+
 ## Gotchas

 - JNI package underscore encoding: `iscc_lib` → `iscc_1lib` in function names
 - mdformat auto-formats markdown; keep backtick expressions short to avoid wrapping crashes
 - `from __future__ import annotations` in `__init__.py`: use `|` union syntax, not `Union`
-- Python `__all__` has 45 entries (30 API + 10 result types + `__version__` + MT, ST, VS, core_opts)
+- Python `__all__` has 48 entries (32 API + 11 result types + `__version__` + MT, ST, VS, core_opts)
+- `gen_sum_code_v0` wide mode only differs from normal when `bits >= 128` (wide requires 128-bit+
+  codes)
+- After adding new symbols to `crates/iscc-py/src/lib.rs`, MUST rebuild the `.so` with
+  `uv run maturin develop -m crates/iscc-py/Cargo.toml` before `pytest` will work
+- JSON `{"x":""}` overhead is 8 bytes (not 7); relevant for boundary tests on META_TRIM_META
+- META_TRIM_META validation: pre-decode check uses `META_TRIM_META * 4/3 + 256` (base64 inflation +
+  media type header), post-decode check uses `META_TRIM_META` directly
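
The two-stage META_TRIM_META check can be sketched in Go. Assumptions: `metaTrimMeta` is a deliberately tiny hypothetical limit (not the library constant), the pre-decode bound is applied to the whole data URL, and `checkMetaDataURL`/`makeDataURL` are illustrative names. The 8-byte count for `{"x":""}` is `{` + `"x"` (3) + `:` + `""` (2) + `}`.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// metaTrimMeta is a hypothetical, deliberately tiny limit for illustration;
// it is not the library's META_TRIM_META value.
const metaTrimMeta = 10

// makeDataURL is a hypothetical helper wrapping bytes as a base64 JSON data URL.
func makeDataURL(b []byte) string {
	return "data:application/json;base64," + base64.StdEncoding.EncodeToString(b)
}

// checkMetaDataURL sketches a two-stage size check on a base64 data URL.
func checkMetaDataURL(dataURL string, limit int) ([]byte, error) {
	// Pre-decode: allow base64 inflation (4/3) plus 256 bytes for the media
	// type header. Multiplying before dividing avoids truncating limit/3.
	if len(dataURL) > limit*4/3+256 {
		return nil, errors.New("encoded metadata too large")
	}
	_, payload, ok := strings.Cut(dataURL, ";base64,")
	if !ok {
		return nil, errors.New("not a base64 data URL")
	}
	raw, err := base64.StdEncoding.DecodeString(payload)
	if err != nil {
		return nil, fmt.Errorf("invalid base64: %w", err)
	}
	// Post-decode: compare the decoded payload against the limit directly.
	if len(raw) > limit {
		return nil, errors.New("decoded metadata too large")
	}
	return raw, nil
}

func main() {
	for _, doc := range []string{`{"x":""}`, `{"x":"abc"}`} {
		raw, err := checkMetaDataURL(makeDataURL([]byte(doc)), metaTrimMeta)
		fmt.Println(len(doc), len(raw), err)
	}
}
```

The cheap pre-decode check rejects grossly oversized input before allocating a decode buffer; only the post-decode check enforces the exact limit, which is why boundary tests must count JSON bytes precisely.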