Commit 1924063

author
CID Agent
committed
cid(advance): Add Codec operations and Constants sections to 4 binding howto guides
1 parent 1d3ec43 commit 1924063

File tree

6 files changed

+362
-51
lines changed

.claude/agent-memory/advance/MEMORY.md

Lines changed: 6 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -228,8 +228,12 @@ iterations.
228228
- zensical.toml nav: How-to Guides order is Rust → Python → Node.js → WebAssembly → Go → Java
229229
- Go and Java guides include algorithm primitives section (SlidingWindow, AlgMinhash256,
230230
AlgCdcChunks, AlgSimhash) not present in Python/Node.js guides
231-
- All 6 how-to guides complete: Rust (356 lines), Python (353), Node.js (281), WASM (338), Go (388),
232-
Java (321)
231+
- All 6 how-to guides have Codec operations + Constants sections, including Python, Node.js, WASM,
232+
Go, and Java. Python uniquely documents the `core_opts` SimpleNamespace and IntEnum
233+
return types from `iscc_decode`. WASM constants use getter functions (`meta_trim_name()` not
234+
`META_TRIM_NAME`). Node.js Codec section uses `require()` style imports per next.md spec
235+
- All 6 how-to guides complete: Rust (356 lines), Python (~420), Node.js (~350), WASM (~410), Go
236+
(463), Java (~390)
233237
- `docs/architecture.md` and `docs/development.md` include all 6 binding crates (Python, Node.js,
234238
WASM, C FFI, JNI, Go) in diagrams, layout trees, and tables. Go uses dotted arrow (`-.->`) in
235239
Mermaid to indicate indirect WASM dependency via `iscc-ffi`

.claude/context/handoff.md

Lines changed: 36 additions & 42 deletions
@@ -1,42 +1,36 @@
1-
## 2026-02-27 — Review of: Update Go docs for 30/30 Tier 1 symbols
2-
3-
**Verdict:** PASS_WITH_NOTES
4-
5-
**Summary:** Clean, well-structured documentation update adding "Codec operations" and "Constants"
6-
sections to `docs/howto/go.md` and replacing stale "planned" text in `packages/go/README.md` with
7-
organized API tables covering all 30/30 Tier 1 symbols. One minor doc bug fixed during review
8-
(incorrect prefix claim in decompose example).
9-
10-
**Verification:**
11-
12-
- [x] `grep -c 'EncodeComponent' docs/howto/go.md` → 1
13-
- [x] `grep -c 'IsccDecode' docs/howto/go.md` → 2
14-
- [x] `grep -c 'IsccDecompose' docs/howto/go.md` → 1
15-
- [x] `grep -c 'JsonToDataUrl' docs/howto/go.md` → 1
16-
- [x] `grep -c 'EncodeBase64' docs/howto/go.md` → 1
17-
- [x] `grep -c 'SoftHashVideoV0' docs/howto/go.md` → 1
18-
- [x] `grep -c 'MetaTrimName' docs/howto/go.md` → 1
19-
- [x] `grep 'are planned' packages/go/README.md` → no matches (stale text removed)
20-
- [x] `grep -c 'EncodeComponent' packages/go/README.md` → 1
21-
- [x] `grep -c 'DataHasher' packages/go/README.md` → 2
22-
- [x] `uv run zensical build` → exits 0
23-
- [x] `mise run check` → all 14 hooks pass
24-
25-
**Issues found:**
26-
27-
- Minor: Decompose example comment said "Each unit code with 'ISCC:' prefix" but `iscc_decompose`
28-
returns units WITHOUT the prefix (verified via Rust docstring and Go test). Fixed in review.
29-
30-
**Codex review:** Codex identified the incorrect "ISCC:" prefix claim in the decompose example
31-
comment — confirmed and fixed. No other actionable findings.
32-
33-
**Next:** All 6 language bindings have 30/30 Tier 1 symbols documented. The project can move on to
34-
other target milestones — likely the remaining issues in issues.md: dict auto-serialization for
35-
`gen_meta_code_v0` (issue #5), or adding codec/constants documentation sections to Java and Python
36-
howto guides for cross-language parity.
37-
38-
**Notes:** The `SoftHashVideoV0` function is documented under "Codec operations" rather than
39-
"Algorithm primitives" — this follows the next.md spec but is a slight taxonomic mismatch. Not worth
40-
changing now; if howto guides get a structural refresh later, it could be moved. Java and Python
41-
howto guides don't yet have codec/constants sections — Go is the first binding with full doc
42-
coverage.
1+
## 2026-02-27 — Add Codec operations and Constants sections to binding howto guides
2+
3+
**Done:** Added "Codec operations" and "Constants" documentation sections to all 4 binding howto
4+
guides (Python, Node.js, Java, WASM), achieving cross-language documentation parity with the Go
5+
guide. Each guide covers encode/decode, decompose, additional codec functions, and algorithm
6+
constants with language-idiomatic code examples.
7+
8+
**Files changed:**
9+
10+
- `docs/howto/python.md`: Added Codec operations section (encode/decode with IntEnum types,
11+
decompose, 3 bullet-point functions), Constants section (direct imports + core_opts namespace),
12+
updated front matter description and opening paragraph
13+
- `docs/howto/nodejs.md`: Added Codec operations section (encode/decode with IsccDecodeResult,
14+
decompose, 3 bullet-point functions using require() style), Constants section (destructured
15+
exports), updated front matter description and opening paragraph
16+
- `docs/howto/java.md`: Added Codec operations section (camelCase methods on IsccLib,
17+
IsccDecodeResult with int/byte[] fields, decompose, 3 bullet-point methods), Constants section
18+
(static final int on IsccLib), updated front matter description
19+
- `docs/howto/wasm.md`: Added Codec operations section (encode/decode with Uint8Array,
20+
IsccDecodeResult, decompose, 3 bullet-point functions), Constants section (getter functions),
21+
updated front matter description and opening paragraph
22+
23+
**Verification:** All 14 grep checks pass (all ≥ 1). `uv run zensical build` exits 0.
24+
`mise run check` passes all 14 hooks. mdformat auto-formatted some alignment in code comments
25+
(removed extra alignment spaces in constants sections) — changes are clean.
26+
27+
**Next:** All 6 language binding howto guides now have full documentation coverage (30/30 Tier 1
28+
symbols). The project can focus on other target milestones — the remaining issues in issues.md (dict
29+
auto-serialization for gen_meta_code_v0, or adding Algorithm primitives sections to Python, Node.js,
30+
and WASM guides).
31+
32+
**Notes:** The Python guide uniquely documents the `core_opts` SimpleNamespace and IntEnum return
33+
types from `iscc_decode` — these are Python-specific API features not present in other bindings. The
34+
Node.js guide uses `require()` import style in the Codec sections as specified in next.md, while the
35+
existing Code generation sections use ESM `import` — this mirrors the Go guide's pattern of showing
36+
different import styles in different sections.

docs/howto/java.md

Lines changed: 66 additions & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
icon: lucide/coffee
3-
description: Guide to using iscc-lib from Java — JNI setup, code generation, streaming, and utilities.
3+
description: Guide to using iscc-lib from Java — JNI setup, code generation, streaming, codec operations, constants, and utilities.
44
---
55

66
# Java
@@ -296,6 +296,71 @@ Additional primitives:
296296
- `IsccLib.softHashVideoV0(int[][] frameSigs, int bits)` — compute a similarity-preserving hash from
297297
video frame signatures, returns `byte[]`
298298

299+
## Codec operations
300+
301+
Methods for encoding, decoding, and decomposing ISCC codes. These operate on the ISCC binary format
302+
defined in ISO 24138.
303+
304+
### Encode and decode
305+
306+
Construct an ISCC unit from raw header fields and digest, then decode it back:
307+
308+
```java
309+
// Encode: maintype=0 (Meta), subtype=0, version=0, 64 bits, 8-byte digest
310+
byte[] digest = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
311+
String code = IsccLib.encodeComponent(0, 0, 0, 64, digest);
312+
System.out.println(code); // ISCC unit string (without "ISCC:" prefix)
313+
314+
// Decode: parse an ISCC unit string back into header components and digest
315+
IsccDecodeResult result = IsccLib.isccDecode(code);
316+
System.out.printf("Maintype: %d, Subtype: %d, Version: %d, Length: %d%n",
317+
result.maintype, result.subtype, result.version, result.length);
318+
System.out.printf("Digest: %s%n", java.util.HexFormat.of().formatHex(result.digest));
319+
```
320+
321+
`isccDecode` returns an `IsccDecodeResult` with `int` fields `maintype`, `subtype`, `version`,
322+
`length` (length index), and a `byte[]` field `digest`.
323+
324+
### Decompose
325+
326+
Split a composite ISCC-CODE into its individual unit codes:
327+
328+
```java
329+
byte[] data = "Hello World".repeat(1000).getBytes(java.nio.charset.StandardCharsets.UTF_8);
330+
String dataCode = IsccLib.genDataCodeV0(data, 64);
331+
String instanceCode = IsccLib.genInstanceCodeV0(data, 64);
332+
String isccCode = IsccLib.genIsccCodeV0(
333+
new String[]{dataCode, instanceCode}, false
334+
);
335+
336+
// Decompose into individual units
337+
String[] units = IsccLib.isccDecompose(isccCode);
338+
for (String unit : units) {
339+
System.out.println(unit); // Each unit code (without "ISCC:" prefix)
340+
}
341+
```
342+
343+
### Other codec methods
344+
345+
- `IsccLib.encodeBase64(byte[] data)` — encode bytes to base64 string
346+
- `IsccLib.jsonToDataUrl(String json)` — convert a JSON string to a
347+
`data:application/json;base64,...` URL
348+
- `IsccLib.softHashVideoV0(int[][] frameSigs, int bits)` — compute a video similarity hash from
349+
MPEG-7 frame signatures, returns `byte[]`
350+
351+
## Constants
352+
353+
Static constants on the `IsccLib` class used by the ISCC algorithms:
354+
355+
```java
356+
import io.iscc.iscc_lib.IsccLib;
357+
358+
int metaTrimName = IsccLib.META_TRIM_NAME; // 128: max byte length for name normalization
359+
int metaTrimDescription = IsccLib.META_TRIM_DESCRIPTION; // 4096: max byte length for description normalization
360+
int ioReadSize = IsccLib.IO_READ_SIZE; // 4_194_304: default read buffer size (4 MB)
361+
int textNgramSize = IsccLib.TEXT_NGRAM_SIZE; // 13: n-gram size for text similarity hashing
362+
```
363+
299364
## Conformance testing
300365

301366
Verify that the library produces correct results for all official test vectors:

docs/howto/nodejs.md

Lines changed: 81 additions & 2 deletions
@@ -1,12 +1,12 @@
11
---
22
icon: lucide/hexagon
3-
description: Guide to using iscc-lib from Node.js via the native addon.
3+
description: Guide to using iscc-lib from Node.js — code generation, streaming, codec operations, and constants.
44
---
55

66
# Node.js
77

88
A guide to using iscc-lib from Node.js via the `@iscc/lib` native addon. Covers installation, code
9-
generation, and streaming.
9+
generation, streaming, codec operations, and constants.
1010

1111
---
1212

@@ -252,6 +252,85 @@ const singleLine = text_remove_newlines("Hello\nWorld");
252252
const trimmed = text_trim("Hello World", 5);
253253
```
254254

255+
## Codec operations
256+
257+
Functions for encoding, decoding, and decomposing ISCC codes. These operate on the ISCC binary
258+
format defined in ISO 24138.
259+
260+
### Encode and decode
261+
262+
Construct an ISCC unit from raw header fields and digest, then decode it back:
263+
264+
```javascript
265+
const {
266+
encode_component,
267+
iscc_decode
268+
} = require("@iscc/lib");
269+
270+
// Encode: maintype=0 (Meta), subtype=0, version=0, 64 bits, 8-byte digest
271+
const digest = Buffer.from([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]);
272+
const code = encode_component(0, 0, 0, 64, digest);
273+
console.log(code); // ISCC unit string (without "ISCC:" prefix)
274+
275+
// Decode: parse an ISCC unit string back into header components and digest
276+
const result = iscc_decode(code);
277+
console.log(`Maintype: ${result.maintype}, Subtype: ${result.subtype}`);
278+
console.log(`Version: ${result.version}, Length: ${result.length}`);
279+
console.log(`Digest: ${Buffer.from(result.digest).toString("hex")}`);
280+
```
281+
282+
`iscc_decode` returns an `IsccDecodeResult` object with `maintype`, `subtype`, `version`, `length`
283+
(length index), and `digest` (Buffer) fields.
284+
285+
### Decompose
286+
287+
Split a composite ISCC-CODE into its individual unit codes:
288+
289+
```javascript
290+
const {
291+
gen_data_code_v0,
292+
gen_instance_code_v0,
293+
gen_iscc_code_v0,
294+
iscc_decompose
295+
} = require("@iscc/lib");
296+
297+
const data = Buffer.from("Hello World".repeat(1000));
298+
const dataCode = gen_data_code_v0(data);
299+
const instanceCode = gen_instance_code_v0(data);
300+
const isccCode = gen_iscc_code_v0([dataCode, instanceCode]);
301+
302+
// Decompose into individual units
303+
const units = iscc_decompose(isccCode);
304+
for (const unit of units) {
305+
console.log(unit); // Each unit code (without "ISCC:" prefix)
306+
}
307+
```
308+
309+
### Other codec functions
310+
311+
- `encode_base64(data)` — encode a Buffer to base64 string
312+
- `json_to_data_url(json)` — convert a JSON string to a `data:application/json;base64,...` URL
313+
- `soft_hash_video_v0(frameSigs, bits?)` — compute a video similarity hash from MPEG-7 frame
314+
signatures, returns Buffer
315+
316+
## Constants
317+
318+
Exported constants used by the ISCC algorithms:
319+
320+
```javascript
321+
const {
322+
META_TRIM_NAME,
323+
META_TRIM_DESCRIPTION,
324+
IO_READ_SIZE,
325+
TEXT_NGRAM_SIZE,
326+
} = require("@iscc/lib");
327+
328+
META_TRIM_NAME; // 128 — max byte length for name normalization
329+
META_TRIM_DESCRIPTION; // 4096 — max byte length for description normalization
330+
IO_READ_SIZE; // 4_194_304 — default read buffer size (4 MB)
331+
TEXT_NGRAM_SIZE; // 13 — n-gram size for text similarity hashing
332+
```
333+
255334
## Conformance testing
256335

257336
Verify the library against official test vectors:

docs/howto/python.md

Lines changed: 90 additions & 2 deletions
@@ -1,12 +1,12 @@
11
---
22
icon: lucide/terminal
3-
description: Guide to using iscc-lib from Python — code generation, streaming, and text utilities.
3+
description: Guide to using iscc-lib from Python — code generation, streaming, codec operations, constants, and text utilities.
44
---
55

66
# Python
77

88
A guide to using iscc-lib from Python. Covers installation, code generation, structured results,
9-
streaming, and text utilities.
9+
streaming, text utilities, codec operations, and constants.
1010

1111
---
1212

@@ -329,6 +329,94 @@ trimmed = text_trim("Hello World", 5)
329329
print(trimmed) # 'Hello'
330330
```
331331

332+
## Codec operations
333+
334+
Functions for encoding, decoding, and decomposing ISCC codes. These operate on the ISCC binary
335+
format defined in ISO 24138.
336+
337+
### Encode and decode
338+
339+
Construct an ISCC unit from raw header fields and digest, then decode it back:
340+
341+
```python
342+
from iscc_lib import encode_component, iscc_decode, MT, ST, VS
343+
344+
# Encode: maintype=0 (Meta), subtype=0, version=0, 64 bits, 8-byte digest
345+
digest = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
346+
code = encode_component(0, 0, 0, 64, digest)
347+
print(code) # ISCC unit string (without "ISCC:" prefix)
348+
349+
# Decode: parse an ISCC unit string back into header components and digest
350+
mt, st, vs, length, raw_digest = iscc_decode(code)
351+
print(f"MainType: {mt}, SubType: {st}, Version: {vs}, Length: {length}")
352+
print(f"Digest: {raw_digest.hex()}")
353+
354+
# Returned enum fields are IntEnum instances
355+
assert isinstance(mt, MT)
356+
assert isinstance(st, ST)
357+
assert isinstance(vs, VS)
358+
```
359+
360+
`iscc_decode` returns a `tuple[MT, ST, VS, int, bytes]` with `IntEnum`-typed values for the header
361+
fields.
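
To illustrate what `IntEnum`-typed header values mean in practice, here is a minimal standard-library sketch. `MainType` is a hypothetical stand-in for `MT`, not the library's enum; only the member `META = 0` is taken from the example above, and the real enums may define other names.

```python
from enum import IntEnum


class MainType(IntEnum):
    """Hypothetical stand-in for MT; only META = 0 comes from the guide."""

    META = 0


# Build the enum member from the raw integer found in a header.
mt = MainType(0)

# An IntEnum member is still an int, so integer comparisons keep working.
is_plain_int = isinstance(mt, int)
equals_zero = mt == 0

# The member also carries a readable name alongside its numeric value.
label = f"{mt.name} ({mt.value})"
print(label)
```

Because the members behave like integers, a decoded value of this kind can plausibly be passed wherever a raw integer header field is expected.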
362+
363+
### Decompose
364+
365+
Split a composite ISCC-CODE into its individual unit codes:
366+
367+
```python
368+
from iscc_lib import (
369+
gen_data_code_v0,
370+
gen_instance_code_v0,
371+
gen_iscc_code_v0,
372+
iscc_decompose,
373+
)
374+
375+
data = b"Hello World" * 1000
376+
data_result = gen_data_code_v0(data)
377+
instance_result = gen_instance_code_v0(data)
378+
iscc_code = gen_iscc_code_v0([data_result.iscc, instance_result.iscc])
379+
380+
# Decompose into individual units
381+
units = iscc_decompose(iscc_code.iscc)
382+
for unit in units:
383+
print(unit) # Each unit code (without "ISCC:" prefix)
384+
```
385+
386+
### Other codec functions
387+
388+
- `encode_base64(data: bytes) -> str` — encode bytes to base64
389+
- `json_to_data_url(json: str) -> str` — convert a JSON string to a
390+
`data:application/json;base64,...` URL
391+
- `soft_hash_video_v0(frame_sigs, bits=64) -> bytes` — compute a video similarity hash from MPEG-7
392+
frame signatures
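
The `data:application/json;base64,...` shape mentioned above can be sketched with the standard library alone. This is an assumption-laden illustration, not the `iscc_lib` implementation: the helper names `json_to_data_url_sketch` and `data_url_to_json` are hypothetical, and the real function may normalize the JSON or behave differently for some inputs.

```python
import base64
import json


def json_to_data_url_sketch(json_text: str) -> str:
    """Hypothetical stand-in: base64-encode UTF-8 JSON text into a data URL."""
    payload = base64.b64encode(json_text.encode("utf-8")).decode("ascii")
    return f"data:application/json;base64,{payload}"


def data_url_to_json(url: str):
    """Reverse the sketch: split off the header and decode the payload."""
    header, payload = url.split(",", 1)
    if header != "data:application/json;base64":
        raise ValueError(f"unexpected data URL header: {header}")
    return json.loads(base64.b64decode(payload))


url = json_to_data_url_sketch('{"name": "Example"}')
print(url)
print(data_url_to_json(url))
```

The round trip shows that the URL carries the JSON text unchanged apart from the base64 encoding.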
393+
394+
## Constants
395+
396+
Module-level constants used by the ISCC algorithms. These are available as direct imports and also
397+
through the `core_opts` namespace for iscc-core API parity:
398+
399+
```python
400+
from iscc_lib import (
401+
META_TRIM_NAME,
402+
META_TRIM_DESCRIPTION,
403+
IO_READ_SIZE,
404+
TEXT_NGRAM_SIZE,
405+
core_opts,
406+
)
407+
408+
META_TRIM_NAME # 128 — max byte length for name normalization
409+
META_TRIM_DESCRIPTION # 4096 — max byte length for description normalization
410+
IO_READ_SIZE # 4_194_304 — default read buffer size (4 MB)
411+
TEXT_NGRAM_SIZE # 13 — n-gram size for text similarity hashing
412+
413+
# core_opts namespace (iscc-core compatibility)
414+
core_opts.meta_trim_name # 128
415+
core_opts.meta_trim_description # 4096
416+
core_opts.io_read_size # 4_194_304
417+
core_opts.text_ngram_size # 13
418+
```
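
As a rough illustration of how a read-buffer constant and a lower-case options namespace could be used together, the sketch below reads a stream in fixed-size chunks. The values are copied from the comments above into local stand-ins rather than imported from `iscc_lib`; `opts`, `digest_in_chunks`, and the use of SHA-256 are assumptions made only for this example.

```python
import hashlib
import io
from types import SimpleNamespace

# Local stand-ins with the values quoted in the guide, not imports from iscc_lib.
IO_READ_SIZE = 4_194_304
META_TRIM_NAME = 128

# Hypothetical mirror of a namespace with lower-case attribute names.
opts = SimpleNamespace(io_read_size=IO_READ_SIZE, meta_trim_name=META_TRIM_NAME)


def digest_in_chunks(stream, chunk_size=opts.io_read_size):
    """Hash a binary stream chunk by chunk (SHA-256 is only a stand-in)."""
    hasher = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
    return hasher.hexdigest()


payload = b"Hello World" * 1000
chunked = digest_in_chunks(io.BytesIO(payload), chunk_size=1024)
print(chunked)
```

Reading in bounded chunks keeps memory use flat regardless of input size, which is the usual reason a default read size is exposed as a constant.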
419+
332420
## Conformance testing
333421

334422
Verify that the library produces correct results for all official test vectors:
