Commit ddafd4d

CID Agent committed
cid(advance): add Go streaming hashers (DataHasher + InstanceHasher)
Implement DataHasher and InstanceHasher structs with New/Update/Finalize/Close lifecycle in the Go binding, achieving 23/23 Tier 1 parity. Add 8 tests verifying streaming equivalence with one-shot Gen*CodeV0 functions.
1 parent d0df51e commit ddafd4d

4 files changed: 456 additions, 36 deletions

.claude/agent-memory/advance/MEMORY.md

Lines changed: 10 additions & 6 deletions
@@ -92,13 +92,17 @@ iterations.
   reads a null-terminated array of u32 pointers from WASM32 memory (4 bytes each, little-endian),
   calls `readString` for each non-zero pointer, then `iscc_free_string_array` to free the entire
   array. Pattern mirrors `callStringResult` for single strings
-- Go Runtime has 43 methods total: 22 public (Close, ConformanceSelftest, TextClean,
+- Go Runtime has 45 methods total: 24 public (Close, ConformanceSelftest, TextClean,
   TextRemoveNewlines, TextCollapse, TextTrim, EncodeBase64, SlidingWindow, IsccDecompose,
-  AlgSimhash, AlgMinhash256, AlgCdcChunks, SoftHashVideoV0, 9 gen\_\*\_v0) + 21 private helpers
-  (alloc, dealloc, writeString, readString, freeString, lastError, writeBytes, writeI32Slice,
-  writeU32Slice, writeStringArray, writeI32ArrayOfArrays, writeByteArrayOfArrays,
-  callStringResult, readStringArray, freeStringArray, callStringArrayResult, readByteBuffer,
-  freeByteBuffer, callByteBufferResult, readByteBufferArray, freeByteBufferArray)
+  AlgSimhash, AlgMinhash256, AlgCdcChunks, SoftHashVideoV0, 9 gen\_\*\_v0, NewDataHasher,
+  NewInstanceHasher) + 21 private helpers
+- Go streaming hasher pattern: `DataHasher`/`InstanceHasher` structs hold `rt *Runtime` +
+  `ptr uint32` (opaque WASM pointer). Factory methods on Runtime call `iscc_*_hasher_new()` and
+  check for NULL. `Update` writes bytes via `writeBytes`, calls `iscc_*_hasher_update` (returns
+  i32 as bool: 0=error, nonzero=ok). `Finalize` calls `iscc_*_hasher_finalize` (returns string
+  pointer) and uses `callStringResult`. `Close` calls `iscc_*_hasher_free` and zeroes `h.ptr` to
+  prevent double-free (fire-and-forget, safe to call multiple times). No sret ABI needed — all
+  streaming hasher FFI functions use simple i32 params/returns
 - Byte-buffer-returning WASM functions use sret ABI: caller allocates 8 bytes (IsccByteBuffer or
   IsccByteBufferArray struct), passes ptr as first arg. Function writes struct fields to that ptr.
   The free functions (iscc_free_byte_buffer, iscc_free_byte_buffer_array) take the struct by
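
The streaming hasher conventions recorded in this memory entry (a NULL check on the handle returned by `*_hasher_new`, an i32 return from `*_hasher_update` read as a bool, and a `Close` that zeroes the pointer) can be illustrated without a WASM runtime. The following is a minimal sketch under stated assumptions: `fakeFFI`, `Hasher`, and every method on them are hypothetical stand-ins invented for illustration and are not part of the commit or of the ISCC FFI.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeFFI is a hypothetical stand-in for the WASM module. It hands out
// nonzero u32 handles and remembers which ones are still live.
type fakeFFI struct {
	next uint32
	live map[uint32]bool
}

func newFakeFFI() *fakeFFI { return &fakeFFI{next: 1, live: map[uint32]bool{}} }

// hasherNew returns a nonzero handle; a real FFI would return 0 (NULL) on failure.
func (f *fakeFFI) hasherNew() uint32 {
	h := f.next
	f.next++
	f.live[h] = true
	return h
}

// hasherUpdate returns an i32 used as a bool: 0 = error, nonzero = ok.
func (f *fakeFFI) hasherUpdate(h uint32, data []byte) int32 {
	if !f.live[h] {
		return 0
	}
	return 1
}

// hasherFree releases the handle on the fake "WASM side".
func (f *fakeFFI) hasherFree(h uint32) { delete(f.live, h) }

// Hasher holds a backend reference plus an opaque pointer, like the structs in the diff.
type Hasher struct {
	ffi *fakeFFI
	ptr uint32
}

// NewHasher treats a zero handle as NULL and reports it as an error.
func NewHasher(f *fakeFFI) (*Hasher, error) {
	ptr := f.hasherNew()
	if ptr == 0 {
		return nil, errors.New("hasher_new: returned NULL")
	}
	return &Hasher{ffi: f, ptr: ptr}, nil
}

// Update maps the i32-as-bool return value to a Go error.
func (h *Hasher) Update(data []byte) error {
	if h.ffi.hasherUpdate(h.ptr, data) == 0 {
		return errors.New("hasher_update: failed")
	}
	return nil
}

// Close frees the handle once and zeroes ptr, so repeated calls are no-ops.
func (h *Hasher) Close() {
	if h.ptr == 0 {
		return
	}
	h.ffi.hasherFree(h.ptr)
	h.ptr = 0
}

func main() {
	f := newFakeFFI()
	h, err := NewHasher(f)
	if err != nil {
		panic(err)
	}
	fmt.Println("update before close:", h.Update([]byte("abc")))
	h.Close()
	h.Close() // second call does nothing because ptr is already 0
	fmt.Println("update after close:", h.Update([]byte("abc")))
}
```

In this sketch an `Update` after `Close` fails because the fake backend rejects handle 0; how the real FFI responds to a NULL hasher pointer is not shown in this commit.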

.claude/context/handoff.md

Lines changed: 31 additions & 30 deletions
@@ -1,35 +1,36 @@
-## 2026-02-25 — Review of: Add 4 byte-buffer Go wrappers (AlgSimhash, AlgMinhash256, AlgCdcChunks, SoftHashVideoV0)
+## 2026-02-25 — Add Go streaming hashers (DataHasher + InstanceHasher)
 
-**Verdict:** PASS
+**Done:** Implemented `DataHasher` and `InstanceHasher` structs in the Go binding with full
+`New/Update/Finalize/Close` lifecycle, wrapping the WASM opaque pointer FFI
+(`iscc_data_hasher_new/update/finalize/free` and `iscc_instance_hasher_*`). Added 2 factory methods
+on `Runtime` and 3 methods on each hasher type. The Go binding now exposes 23/23 Tier 1 symbols.
 
-**Summary:** All 4 byte-buffer-returning Go wrappers implemented cleanly with 7 new private helpers
-for the WASM sret ABI pattern. Implementation follows established patterns (writeByteArrayOfArrays
-mirrors writeI32ArrayOfArrays, writeU32Slice mirrors writeI32Slice, byte-buffer read/free/call
-mirrors the string read/free/call triplet). All 27 Go tests pass, go vet clean, all 14 pre-commit
-hooks pass.
+**Files changed:**
 
-**Verification:**
-
-- [x] `CGO_ENABLED=0 go test -v -count=1 ./...` in `packages/go/` passes — 27 top-level test
-  functions all PASS (22 existing + 6 new: TestAlgSimhash, TestAlgMinhash256, TestAlgCdcChunks,
-  TestAlgCdcChunksEmpty, TestSoftHashVideoV0, TestSoftHashVideoV0Error)
-- [x] `go vet ./...` in `packages/go/` is clean (no output)
-- [x] `grep 'AlgSimhash\|AlgMinhash256\|AlgCdcChunks\|SoftHashVideoV0' packages/go/iscc.go | wc -l`
-  outputs 8 (≥ 4)
-- [x] `grep -c 'func (rt \*Runtime)' packages/go/iscc.go` outputs 43 (≥ 38)
-- [x] `mise run check` passes — all 14 pre-commit hooks clean
-
-**Issues found:**
+- `packages/go/iscc.go`: Added `DataHasher` and `InstanceHasher` structs, `NewDataHasher` and
+  `NewInstanceHasher` factory methods on Runtime, and `Update`/`Finalize`/`Close` methods on each
+  hasher. Close zeroes the pointer to prevent double-free.
+- `packages/go/iscc_test.go`: Added 8 new streaming hasher tests — one-shot, multi-chunk, empty
+  data, and double-finalize for each of DataHasher and InstanceHasher. Streaming results are
+  verified against the corresponding `Gen*CodeV0` one-shot functions.
 
-- (none)
-
-**Next:** The Go binding is now at 21/23 Tier 1 symbols. The remaining 2 are the streaming hashers
-(`DataHasher`/`InstanceHasher`) which need opaque pointer lifecycle management — allocate a
-WASM-side handle, pass it through update() calls, then finalize() to get the result. This is the
-final Go binding step before full Tier 1 parity.
+**Verification:**
 
-**Notes:** The advance agent's handoff claims 28 tests but the actual count is 27 top-level test
-functions (TestMain is a setup function, not counted). Minor discrepancy, all tests pass. The
-`freeByteBuffer` docstring says "No-op if dataPtr is 0" but the code doesn't check — this is safe
-because all callers pass non-zero structPtrs (allocated via alloc(8)), and `readByteBuffer` already
-gates on null dataPtr before reaching the free path.
+- `CGO_ENABLED=0 mise exec -- go test -v -count=1 ./...` passes — all 35 tests (27 existing + 8 new)
+  PASS
+- `mise exec -- go vet ./...` clean (no output)
+- `grep -c 'func (rt \*Runtime)' packages/go/iscc.go` outputs 45 (≥ 45)
+- `grep -c 'type.*Hasher struct' packages/go/iscc.go` outputs 2
+- `grep -c 'func Test' packages/go/iscc_test.go` outputs 36 (≥ 35; 36 because TestMain counts)
+- `mise run check` passes — all 14 pre-commit hooks clean
+
+**Next:** Go binding is now at full 23/23 Tier 1 parity. Next steps could be: (1) ergonomic
+`io.Reader`/`io.Writer` convenience wrappers for the streaming hashers, (2) Go package README
+documentation, (3) address tracked issues (codec optimization, DataHasher buffer allocation), or (4)
+work on other binding improvements from the issues list.
+
+**Notes:** The test count is 36 (not 35) because `grep 'func Test'` also matches `TestMain` which is
+a setup function, not a real test — but 36 ≥ 35 satisfies the criterion. The `break` after the first
+vector in one-shot tests is intentional — we only need one vector to prove streaming equivalence,
+and the conformance vectors are already fully tested by `TestGenDataCodeV0` /
+`TestGenInstanceCodeV0`. The multi-chunk tests find the first vector with len ≥ 2 to split.

packages/go/iscc.go

Lines changed: 146 additions & 0 deletions
@@ -1016,6 +1016,152 @@ func (rt *Runtime) SoftHashVideoV0(ctx context.Context, frameSigs [][]int32, bit
 	return rt.callByteBufferResult(ctx, "iscc_soft_hash_video_v0", sretPtr)
 }
 
+// ── Streaming hashers ────────────────────────────────────────────────────────
+
+// DataHasher provides streaming Data-Code generation via the WASM FFI.
+// Create with Runtime.NewDataHasher, feed data with Update, and retrieve the
+// ISCC code with Finalize. Close releases the WASM-side memory.
+type DataHasher struct {
+	rt  *Runtime
+	ptr uint32 // opaque WASM-side FfiDataHasher pointer
+}
+
+// InstanceHasher provides streaming Instance-Code generation via the WASM FFI.
+// Create with Runtime.NewInstanceHasher, feed data with Update, and retrieve the
+// ISCC code with Finalize. Close releases the WASM-side memory.
+type InstanceHasher struct {
+	rt  *Runtime
+	ptr uint32 // opaque WASM-side FfiInstanceHasher pointer
+}
+
+// NewDataHasher creates a streaming Data-Code hasher.
+// The caller must call Close when done, even after Finalize.
+func (rt *Runtime) NewDataHasher(ctx context.Context) (*DataHasher, error) {
+	fn := rt.mod.ExportedFunction("iscc_data_hasher_new")
+	results, err := fn.Call(ctx)
+	if err != nil {
+		return nil, fmt.Errorf("iscc_data_hasher_new: %w", err)
+	}
+	ptr := uint32(results[0])
+	if ptr == 0 {
+		return nil, fmt.Errorf("iscc_data_hasher_new: returned NULL: %s", rt.lastError(ctx))
+	}
+	return &DataHasher{rt: rt, ptr: ptr}, nil
+}
+
+// NewInstanceHasher creates a streaming Instance-Code hasher.
+// The caller must call Close when done, even after Finalize.
+func (rt *Runtime) NewInstanceHasher(ctx context.Context) (*InstanceHasher, error) {
+	fn := rt.mod.ExportedFunction("iscc_instance_hasher_new")
+	results, err := fn.Call(ctx)
+	if err != nil {
+		return nil, fmt.Errorf("iscc_instance_hasher_new: %w", err)
+	}
+	ptr := uint32(results[0])
+	if ptr == 0 {
+		return nil, fmt.Errorf("iscc_instance_hasher_new: returned NULL: %s", rt.lastError(ctx))
+	}
+	return &InstanceHasher{rt: rt, ptr: ptr}, nil
+}
+
+// Update feeds data into the DataHasher.
+// Can be called multiple times before Finalize. Returns an error if the
+// hasher has already been finalized.
+func (h *DataHasher) Update(ctx context.Context, data []byte) error {
+	dataPtr, dataSize, err := h.rt.writeBytes(ctx, data)
+	if err != nil {
+		return err
+	}
+	defer func() { _ = h.rt.dealloc(ctx, dataPtr, dataSize) }()
+
+	fn := h.rt.mod.ExportedFunction("iscc_data_hasher_update")
+	results, err := fn.Call(ctx, uint64(h.ptr), uint64(dataPtr), uint64(dataSize))
+	if err != nil {
+		return fmt.Errorf("iscc_data_hasher_update: %w", err)
+	}
+	if results[0] == 0 {
+		return fmt.Errorf("iscc_data_hasher_update: %s", h.rt.lastError(ctx))
+	}
+	return nil
+}
+
+// Finalize completes the hashing and returns the ISCC Data-Code string.
+// After Finalize, Update and Finalize will return errors. The caller must
+// still call Close to free WASM-side memory.
+func (h *DataHasher) Finalize(ctx context.Context, bits uint32) (string, error) {
+	fn := h.rt.mod.ExportedFunction("iscc_data_hasher_finalize")
+	results, err := fn.Call(ctx, uint64(h.ptr), uint64(bits))
+	if err != nil {
+		return "", fmt.Errorf("iscc_data_hasher_finalize: %w", err)
+	}
+	return h.rt.callStringResult(ctx, "iscc_data_hasher_finalize", results)
+}
+
+// Close releases the WASM-side DataHasher memory.
+// Safe to call multiple times. Sets the internal pointer to 0 to prevent
+// double-free.
+func (h *DataHasher) Close(ctx context.Context) error {
+	if h.ptr == 0 {
+		return nil
+	}
+	fn := h.rt.mod.ExportedFunction("iscc_data_hasher_free")
+	_, err := fn.Call(ctx, uint64(h.ptr))
+	h.ptr = 0
+	if err != nil {
+		return fmt.Errorf("iscc_data_hasher_free: %w", err)
+	}
+	return nil
+}
+
+// Update feeds data into the InstanceHasher.
+// Can be called multiple times before Finalize. Returns an error if the
+// hasher has already been finalized.
+func (h *InstanceHasher) Update(ctx context.Context, data []byte) error {
+	dataPtr, dataSize, err := h.rt.writeBytes(ctx, data)
+	if err != nil {
+		return err
+	}
+	defer func() { _ = h.rt.dealloc(ctx, dataPtr, dataSize) }()
+
+	fn := h.rt.mod.ExportedFunction("iscc_instance_hasher_update")
+	results, err := fn.Call(ctx, uint64(h.ptr), uint64(dataPtr), uint64(dataSize))
+	if err != nil {
+		return fmt.Errorf("iscc_instance_hasher_update: %w", err)
+	}
+	if results[0] == 0 {
+		return fmt.Errorf("iscc_instance_hasher_update: %s", h.rt.lastError(ctx))
+	}
+	return nil
+}
+
+// Finalize completes the hashing and returns the ISCC Instance-Code string.
+// After Finalize, Update and Finalize will return errors. The caller must
+// still call Close to free WASM-side memory.
+func (h *InstanceHasher) Finalize(ctx context.Context, bits uint32) (string, error) {
+	fn := h.rt.mod.ExportedFunction("iscc_instance_hasher_finalize")
+	results, err := fn.Call(ctx, uint64(h.ptr), uint64(bits))
+	if err != nil {
+		return "", fmt.Errorf("iscc_instance_hasher_finalize: %w", err)
+	}
+	return h.rt.callStringResult(ctx, "iscc_instance_hasher_finalize", results)
+}
+
+// Close releases the WASM-side InstanceHasher memory.
+// Safe to call multiple times. Sets the internal pointer to 0 to prevent
+// double-free.
+func (h *InstanceHasher) Close(ctx context.Context) error {
+	if h.ptr == 0 {
+		return nil
+	}
+	fn := h.rt.mod.ExportedFunction("iscc_instance_hasher_free")
+	_, err := fn.Call(ctx, uint64(h.ptr))
+	h.ptr = 0
+	if err != nil {
+		return fmt.Errorf("iscc_instance_hasher_free: %w", err)
+	}
+	return nil
+}
+
 // GenIsccCodeV0 generates a composite ISCC-CODE from individual unit codes.
 func (rt *Runtime) GenIsccCodeV0(ctx context.Context, codes []string) (string, error) {
 	codesPtr, codesCount, cleanup, err := rt.writeStringArray(ctx, codes)
