Commit 1f01f92

Merge pull request #9 from iscc/develop
CID loop improvements and iscc-sdk compatibility
2 parents 6ea6da9 + 4c40c40 commit 1f01f92

40 files changed (+4134, -364 lines)
.claude/agent-memory/advance/MEMORY.md

Lines changed: 131 additions & 4 deletions
@@ -53,6 +53,17 @@ iterations.
5353
- JNI Java-side Javadoc (`IsccLib.java`) still says `@throws IllegalArgumentException` for hasher
5454
update/finalize methods — the Rust side throws `IllegalStateException` but Java declarations are
5555
cosmetically mismatched (tests verify correct runtime behavior)
56+
- JNI `isccDecode` returns `jobject` (not `jstring`): construct Java object via `env.find_class` +
57+
`env.new_object` with constructor signature `(IIII[B)V`. The `JValue::Object` takes a reference
58+
to `JByteArray` (which derefs to `JObject`). Class path uses `/` separators
59+
- JNI `encodeComponent` validates jint ranges (0-255 for mtype/stype/version, ≥0 for bitLength)
60+
before casting to u8/u32, using `throw_and_default` for out-of-range values
61+
- JNI constants are `public static final int` in Java (no JNI function needed — compile-time
62+
literals). Placed at top of `IsccLib.java` before static initializer block
63+
- `IsccDecodeResult.java`: separate file in same package (`io.iscc.iscc_lib`), public final fields,
64+
single constructor `(int, int, int, int, byte[])`. Auto-compiled by Maven (no pom.xml changes)
65+
- JNI `extern "system"` count verification: `grep -c 'extern "system"'` returns N+1 because of doc
66+
comment on line 3 mentioning the string. Actual function count = grep result - 1
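
The range check described above for `encodeComponent` (0-255 for mtype/stype/version, non-negative bitLength, validated before narrowing) can be sketched in Python. This is only an illustration of the check order, not the JNI code itself; `checked_u8` and `validate_component_args` are hypothetical names, and raising `ValueError` stands in for `throw_and_default`.

```python
def checked_u8(name: str, value: int) -> int:
    """Reject values that would not survive a narrowing cast to u8."""
    if not 0 <= value <= 255:
        raise ValueError(f"{name} out of range for u8: {value}")
    return value


def validate_component_args(mtype: int, stype: int, version: int, bit_length: int):
    """Validate all four integers before any of them is narrowed."""
    if bit_length < 0:
        raise ValueError(f"bit_length must be >= 0, got {bit_length}")
    return (
        checked_u8("mtype", mtype),
        checked_u8("stype", stype),
        checked_u8("version", version),
        bit_length,
    )
```

Validating before the cast matters because a plain narrowing cast would silently wrap an out-of-range value instead of reporting it.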
5667

5768
## WASM/WASI
5869

@@ -66,6 +77,10 @@ iterations.
6677
Cargo.toml changes needed. The cdylib target produces the `.wasm` file
6778
- `iscc_alloc`/`iscc_dealloc` are the WASM host memory management pair — host allocates via
6879
`iscc_alloc`, writes data, calls FFI functions, then frees via `iscc_dealloc`
80+
- WASM binary in `packages/go/iscc_ffi.wasm` must be rebuilt and copied whenever new FFI functions
81+
are added to `crates/iscc-ffi/src/lib.rs`. Build:
82+
`cargo build -p iscc-ffi --target wasm32-wasip1 --release`
83+
`cp target/wasm32-wasip1/release/iscc_ffi.wasm packages/go/`
6984
- Debug WASM binary is ~10.5MB; release + wasm-opt reduces significantly
7085
- wasm-opt release config in `crates/iscc-wasm/Cargo.toml`:
7186
`[package.metadata.wasm-pack.profile.release]` with
@@ -109,10 +124,17 @@ iterations.
109124
reads a null-terminated array of u32 pointers from WASM32 memory (4 bytes each, little-endian),
110125
calls `readString` for each non-zero pointer, then `iscc_free_string_array` to free the entire
111126
array. Pattern mirrors `callStringResult` for single strings
112-
- Go Runtime has 45 methods total: 24 public (Close, ConformanceSelftest, TextClean,
127+
- Go Runtime has 48 methods total: 27 public (Close, ConformanceSelftest, TextClean,
113128
TextRemoveNewlines, TextCollapse, TextTrim, EncodeBase64, SlidingWindow, IsccDecompose,
114129
AlgSimhash, AlgMinhash256, AlgCdcChunks, SoftHashVideoV0, 9 gen\_\*\_v0, NewDataHasher,
115-
NewInstanceHasher) + 21 private helpers
130+
NewInstanceHasher, JsonToDataUrl, EncodeComponent, IsccDecode) + 21 private helpers
131+
- Go `DecodeResult` struct: public struct with `Maintype`, `Subtype`, `Version`, `Length` (all
132+
`uint8`) and `Digest` (`[]byte`). Returned as `*DecodeResult` (pointer) from `IsccDecode`
133+
- Go `IsccDecode` uses sret ABI: 16-byte `IsccDecodeResult` struct. Layout: ok(1B) + maintype(1B) +
134+
subtype(1B) + version(1B) + length(1B) + padding(3B) + digest.data(4B) + digest.len(4B).
135+
`iscc_free_decode_result` takes sret pointer (single i32 param) on wasm32
136+
- Go constants: `MetaTrimName`, `MetaTrimDescription`, `IoReadSize`, `TextNgramSize` are
137+
package-level `const` (idiomatic Go). No enum types — use plain `int`/`uint8`
116138
- Go streaming hasher pattern: `DataHasher`/`InstanceHasher` structs hold `rt *Runtime` +
117139
`ptr uint32` (opaque WASM pointer). Factory methods on Runtime call `iscc_*_hasher_new()` and
118140
check for NULL. `Update` writes bytes via `writeBytes`, calls `iscc_*_hasher_update` (returns
@@ -206,8 +228,13 @@ iterations.
206228
- zensical.toml nav: How-to Guides order is Rust → Python → Node.js → WebAssembly → Go → Java
207229
- Go and Java guides include algorithm primitives section (SlidingWindow, AlgMinhash256,
208230
AlgCdcChunks, AlgSimhash) not present in Python/Node.js guides
209-
- All 6 how-to guides complete: Rust (356 lines), Python (353), Node.js (281), WASM (338), Go (388),
210-
Java (321)
231+
- All 6 how-to guides have Codec operations + Constants sections.
232+
Python uniquely documents `core_opts` SimpleNamespace and IntEnum
233+
return types from `iscc_decode`. WASM constants are exported as uppercase getter functions
234+
(`META_TRIM_NAME()` etc.) via `js_name` attributes. Node.js Codec section uses `require()` style
235+
imports per next.md spec
236+
- All 6 how-to guides complete: Rust (356 lines), Python (~420), Node.js (~350), WASM (~410), Go
237+
(463), Java (~390)
211238
- `docs/architecture.md` and `docs/development.md` include all 6 binding crates (Python, Node.js,
212239
WASM, C FFI, JNI, Go) in diagrams, layout trees, and tables. Go uses dotted arrow (`-.->`) in
213240
Mermaid to indicate indirect WASM dependency via `iscc-ffi`
@@ -221,6 +248,90 @@ iterations.
221248
npm package names against `docs/index.md` and `crates/*/README.md` — the wasm-pack howto
222249
originally had `@iscc/iscc-wasm` (wrong)
223250

251+
## Node.js Binding — Tier 1 Propagation
252+
253+
- napi-rs `#[napi]` on `pub const` works directly (no getter function fallback needed). `usize` to
254+
`u32` cast is safe for all 4 algorithm constants (all fit within u32 range)
255+
- `IsccDecodeResult` uses `#[napi(object)]` struct with named fields (`maintype`, `subtype`,
256+
`version`, `length`, `digest`) — JavaScript has no tuples, so return an object instead
257+
- `iscc_decode` napi wrapper destructures the Rust tuple `(u8, u8, u8, u8, Vec<u8>)` into
258+
`IsccDecodeResult` struct fields, converting `Vec<u8>` to `Buffer` via `.into()`
259+
- napi-rs `#[napi(js_name = "...")]` on constants uses the original SCREAMING_SNAKE_CASE name to
260+
prevent napi-rs auto-conversion to camelCase
261+
- Total Node.js test count after 7 new symbols: 124 (103 existing + 21 new across 7 describe blocks)
262+
263+
## WASM Binding — Tier 1 Propagation
264+
265+
- wasm-bindgen does NOT support `#[wasm_bindgen]` on `pub const` — use getter functions with
266+
`#[wasm_bindgen(js_name = "SCREAMING_CASE")]` instead. Safe `as u32` cast (all values fit)
267+
- `IsccDecodeResult` WASM struct uses `#[wasm_bindgen(getter_with_clone)]` because `Vec<u8>` is not
268+
`Copy`. The `digest` field maps to `Uint8Array` in JS
269+
- wasm-bindgen accepts `&str` and `&[u8]` directly (like PyO3, unlike napi-rs which needs owned
270+
`String`/`Buffer`). No `.as_deref()` or `.as_ref()` conversion needed for these types
271+
- Total WASM test count after 7 new symbols: 59 unit + 1 conformance_selftest (with feature) + 9
272+
conformance, from 40 unit previously
273+
- `#[wasm_bindgen` annotation count in lib.rs: 35 (was 25, +10 for 7 functions + 2 impl blocks + 1
274+
struct)
275+
276+
## C FFI Binding — Tier 1 Propagation
277+
278+
- Constants exposed as `extern "C"` getter functions (not `pub static` — avoids cbindgen `usize` → C
279+
type mapping issues). All are infallible (no error handling, no `clear_last_error`)
280+
- `iscc_json_to_data_url` follows the standard string-in/string-out pattern (same as
281+
`iscc_text_clean`)
282+
- `iscc_encode_component` takes raw `*const u8` + `usize` for digest, with the standard null-check +
283+
`from_raw_parts` pattern from `iscc_gen_data_code_v0`
284+
- `IsccDecodeResult` is `#[repr(C)]` struct with `ok: bool` discriminant,
285+
`maintype/subtype/version/length: u8`, and `digest: IsccByteBuffer`. Reuses existing
286+
`IsccByteBuffer` and helpers (`null_byte_buffer`, `vec_to_byte_buffer`)
287+
- `iscc_free_decode_result` delegates to `iscc_free_byte_buffer` for digest cleanup
288+
- `ptr_to_str` in FFI crate takes `param_name: &str` arg for error messages (not just `ptr` like
289+
next.md pseudocode suggested) — all new functions use this pattern
290+
- Length index for 64-bit codes is 1 (not 0): `decode_length` uses `(length_index + 1) * 32` for
291+
standard MainTypes. Index 0 = 32-bit, index 1 = 64-bit
292+
- Generated `iscc.h` header is NOT committed — CI generates it dynamically via `cbindgen`
293+
- Total `#[unsafe(no_mangle)]` count after propagation: 44 (was 35, +9: 4 constants +
294+
json_to_data_url
295+
+ encode_component + iscc_decode + iscc_free_decode_result + the existing ones)
296+
- Total Rust unit tests: 77 (62 existing + 15 new). Total C test assertions: 49 (30 existing + 19
297+
new)
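
As a rough illustration of how a host might read the `#[repr(C)]` `IsccDecodeResult` out of linear memory, the sketch below uses the 16-byte wasm32 layout recorded in the Go notes (ok, maintype, subtype, version, length, 3 padding bytes, then a 4-byte data pointer and 4-byte length, little-endian). `read_decode_result` and the flat `memory` buffer are assumptions made for the example; they are not part of any binding.

```python
import struct

# ok(1B) + maintype(1B) + subtype(1B) + version(1B) + length(1B)
# + padding(3B) + digest.data(4B) + digest.len(4B), little-endian.
DECODE_RESULT = struct.Struct("<?BBBB3xII")


def read_decode_result(memory, sret_ptr: int):
    """Unpack the struct at sret_ptr and copy the digest out of memory.

    Returns None when the `ok` discriminant is false.
    """
    ok, maintype, subtype, version, length, data_ptr, data_len = (
        DECODE_RESULT.unpack_from(memory, sret_ptr)
    )
    if not ok:
        return None
    digest = bytes(memory[data_ptr:data_ptr + data_len])
    return maintype, subtype, version, length, digest
```

Copying the digest before releasing the struct mirrors the ownership rule in the notes: the buffer belongs to the library until `iscc_free_decode_result` is called.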
298+
299+
## Tier 1 API Surface
300+
301+
- Algorithm constants (`META_TRIM_NAME`, `META_TRIM_DESCRIPTION`, `IO_READ_SIZE`, `TEXT_NGRAM_SIZE`)
302+
are `pub const` at crate root in `lib.rs`, placed after `pub use` re-exports
303+
- Tier 1 `encode_component` wrapper in `lib.rs` takes `u8` for enum fields, validates with
304+
`TryFrom<u8>`, adds explicit digest length check (`digest.len() < bit_length / 8`), then
305+
delegates to `codec::encode_component`. No naming conflict because `codec::encode_component` is
306+
NOT re-exported at crate root
307+
- Magic numbers 128, 4096, 13 in gen functions replaced with constants `META_TRIM_NAME`,
308+
`META_TRIM_DESCRIPTION`, `TEXT_NGRAM_SIZE` respectively
309+
- `IO_READ_SIZE` uses spec value 4_194_304 (4 MB), not Python reference value 2_097_152 (2 MB)
310+
- `iscc_decode` Tier 1 wrapper in `lib.rs` takes `&str`, returns `(u8, u8, u8, u8, Vec<u8>)`;
311+
strips "ISCC:" prefix and dashes, delegates to `codec::decode_base32` → `codec::decode_header` →
312+
`codec::decode_length`, truncates tail to exact digest bytes. Unlike Python ref which returns
313+
full tail, our API returns usable digest directly
314+
- `json_to_data_url` in `lib.rs` combines `parse_meta_json` + `build_meta_data_url` private helpers
315+
into one public API. Defined directly in `lib.rs` (not in a submodule), so no `pub use`
316+
re-export needed. Deps: `serde_json`, `serde_json_canonicalizer`, `data_encoding` — all already
317+
present. Output differs from conformance vector test_0016 in two ways: no `charset=utf-8`
318+
parameter, and payload is JCS-canonical (spaces removed)
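
The `iscc_decode` steps noted above (strip prefix and dashes, base32-decode, truncate the tail to the exact digest size) can be sketched in Python. This is a simplified illustration under stated assumptions: header parsing (`codec::decode_header`) is left out, the length formula is the one recorded for standard MainTypes, and every function name here is hypothetical.

```python
import base64


def normalize_iscc(iscc: str) -> str:
    """Strip an optional "ISCC:" prefix and any dashes."""
    if iscc.startswith("ISCC:"):
        iscc = iscc[len("ISCC:"):]
    return iscc.replace("-", "")


def b32decode_unpadded(text: str) -> bytes:
    """Decode base32 text that carries no '=' padding."""
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text.upper() + padding)


def decode_length(length_index: int) -> int:
    """Bit length for standard MainTypes: index 0 = 32, index 1 = 64."""
    return (length_index + 1) * 32


def truncate_digest(tail: bytes, bit_length: int) -> bytes:
    """Keep only the bytes that belong to the digest."""
    n = bit_length // 8
    if len(tail) < n:
        raise ValueError(f"tail has {len(tail)} bytes, need {n}")
    return tail[:n]
```

Returning the truncated digest, rather than the full tail, matches the design choice recorded in the notes.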
319+
320+
## Python Binding — Tier 1 Propagation
321+
322+
- PyO3 `iscc_decode` wrapper needs `py: Python<'_>` param to wrap `Vec<u8>` in `PyBytes::new()`.
323+
Returns `PyObject` using `.into_pyobject(py)?.into()` for the tuple
324+
- PyO3 constants registered with `m.add("NAME", value)?` in module init (not `wrap_pyfunction!`)
325+
- Python `__init__.py` `__all__` had 34 entries before Tier 1 propagation, not 35 as estimated. The
326+
count after adding 7 new symbols is 41 (34 + 7)
327+
- Constants and simple functions (encode_component, iscc_decode, json_to_data_url) are direct
328+
re-exports in `__init__.py` — no wrapper logic needed (unlike gen_data_code_v0 which adds
329+
streaming)
330+
- Type stubs (`_lowlevel.pyi`) place constants at top (before function stubs), with inline
331+
docstrings. Constants use `int` type annotation
332+
- `uv run maturin develop -m crates/iscc-py/Cargo.toml` works; bare `maturin develop` fails (command
333+
not found in devcontainer PATH — needs `uv run` prefix)
334+
224335
## Codec Internals
225336

226337
- `decode_header` and `decode_varnibble_from_bytes` operate directly on `&[u8]` with bitwise
@@ -290,3 +401,19 @@ iterations.
290401
Place it after `from __future__ import annotations` and `from importlib.metadata import version`
291402
- When `maturin develop` installs a version, it persists in the venv — if the workspace version
292403
changes in Cargo.toml, must rebuild with `maturin develop` to sync the installed version
404+
- Dict meta pattern in `gen_meta_code_v0` Python wrapper: `import json as _json` (underscore alias
405+
to avoid namespace pollution), `isinstance(meta, dict)` →
406+
`_json.dumps(meta, separators=(",", ":"), ensure_ascii=False)` → `json_to_data_url()`. The
407+
Rust `json_to_data_url` handles JCS canonicalization internally, so the Python side only needs
408+
compact JSON serialization
409+
- PIL pixel data pattern in `gen_image_code_v0` Python wrapper: widen signature to
410+
`bytes | bytearray | memoryview | Sequence[int]`, use
411+
`if not isinstance(pixels, bytes): pixels = bytes(pixels)`. The `bytes()` constructor handles
412+
bytearray, memoryview, and Sequence[int] (including PIL's ImagingCore from `Image.getdata()`)
413+
uniformly. No Rust changes needed — conversion is Python-wrapper-only. This same pattern applies
414+
to any future function that accepts `&[u8]` in Rust but needs wider input types in Python
415+
- Python IntEnum classes (`MT`, `ST`, `VS`) in `__init__.py`: pure Python, no Rust dependency. `ST`
416+
has `TEXT = 0` alias for `NONE` (IntEnum allows duplicate values as aliases — first definition
417+
wins). `iscc_decode` wrapper converts raw integers to IntEnum types. `core_opts` is a
418+
`SimpleNamespace` mapping attribute names to existing constants. Total `__all__` entries: 45 (41
419+
\+ MT, ST, VS, core_opts)
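
The three Python-wrapper patterns above can be tried in isolation with a minimal sketch. The `ST` class below is deliberately truncated to the two members the notes mention, and `compact_json` and `as_bytes` are hypothetical helper names, not functions from the package.

```python
import json
from enum import IntEnum


class ST(IntEnum):
    # Truncated for illustration: only the aliased pair from the notes.
    NONE = 0
    TEXT = 0  # duplicate value, so TEXT becomes an alias of NONE


def compact_json(meta: dict) -> str:
    """Serialize without whitespace; canonicalization is left to Rust."""
    return json.dumps(meta, separators=(",", ":"), ensure_ascii=False)


def as_bytes(pixels) -> bytes:
    """Normalize bytearray, memoryview, or a sequence of ints to bytes."""
    if not isinstance(pixels, bytes):
        pixels = bytes(pixels)
    return pixels
```

Because the first definition wins, `ST(0)` reports the name `NONE`, while `ST.TEXT` remains usable as a readable alias.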