|
1 | | -## 2026-02-25 — Optimize DataHasher::update buffer allocation |
| 1 | +## 2026-02-25 — Review of: Optimize DataHasher::update buffer allocation |
2 | 2 |
|
3 | | -**Done:** Replaced per-call heap allocations in `DataHasher::update` with a persistent internal |
4 | | -buffer (`buf: Vec<u8>`) that is reused across calls. The `data.to_vec()`, `[tail, data].concat()`, |
5 | | -and `prev_chunk.to_vec()` patterns are eliminated — new data is appended via `extend_from_slice`, |
6 | | -and the tail is shifted to the front with `copy_within` + `truncate`, reusing existing buffer |
7 | | -capacity. Added a `DataHasher` streaming benchmark to `benchmarks.rs`. |
| 3 | +**Verdict:** PASS |
8 | 4 |
|
9 | | -**Files changed:** |
| 5 | +**Summary:** Replaced per-call heap allocations in `DataHasher::update` with a persistent |
| 6 | +`buf: Vec<u8>` that is reused across calls. The `data.to_vec()`, `[tail, data].concat()`, and |
| 7 | +`prev_chunk.to_vec()` patterns are eliminated — new data is appended via `extend_from_slice`, and |
| 8 | +the tail is shifted to the front with `copy_within` + `truncate`. A Criterion streaming benchmark |
| 9 | +was added. All 261 tests pass, clippy clean, 14 pre-commit hooks pass, benchmark runs at ~1.0 GiB/s. |
10 | 10 |
|
11 | | -- `crates/iscc-lib/src/streaming.rs`: Replaced `tail: Vec<u8>` field with `buf: Vec<u8>` in |
12 | | - `DataHasher`. Rewrote `update()` to use `extend_from_slice` + `copy_within` pattern. Updated |
13 | | - `finalize()` to reference `self.buf` instead of `self.tail`. |
14 | | -- `crates/iscc-lib/benches/benchmarks.rs`: Added `bench_data_hasher_streaming` function (1 MB data, |
15 | | - 64 KiB chunks, measures throughput). Registered in `criterion_group!` macro. |
| 11 | +**Verification:** |
16 | 12 |
|
17 | | -**Verification:** All 265 tests pass (including all DataHasher conformance, multi-chunk, and |
18 | | -byte-at-a-time tests). Clippy clean. Zero `to_vec()`/`.concat()` allocations in `streaming.rs`. |
19 | | -Benchmark runs at ~1.1 GiB/s throughput. All 14 pre-commit hooks pass. |
| 13 | +- [x] `cargo test -p iscc-lib` passes — 261 tests (208 + 31 + 22), 0 failures |
| 14 | +- [x] `cargo clippy -p iscc-lib -- -D warnings` clean |
| 15 | +- [x] `grep -c 'to_vec\|\.concat()' crates/iscc-lib/src/streaming.rs` returns 0 — no per-call |
| 16 | + allocations remain |
| 17 | +- [x] `cargo bench -p iscc-lib -- DataHasher` runs successfully — ~1.0 GiB/s throughput |
| 18 | + |
| 19 | +**Issues found:** |
| 20 | + |
| 21 | +- (none) |
20 | 22 |
|
21 | 23 | **Next:** The `[normal]` DataHasher issue is resolved. Consider the `[normal]` iscc-ffi video frame |
22 | | -allocation issue, or one of the `[low]` issues (dct power-of-two validation, wtahash bounds check, |
23 | | -iscc-py `__version__`, etc.). |
| 24 | +allocation issue (changing `iscc_lib` video API to accept `&[&[i32]]`), or one of the `[low]` issues |
| 25 | +(dct power-of-two validation, wtahash bounds check, iscc-py `__version__`, etc.). |
24 | 26 |
|
25 | | -**Notes:** The borrow checker constraint was handled exactly as described in next.md — extracting |
26 | | -`tail_len` as a `usize` from `prev_chunk` before dropping the `chunks` Vec, then using `copy_within` |
27 | | -to relocate the tail. The `drop(chunks)` is explicit to make the borrow release clear. Test count is |
28 | | -265 (vs 261 mentioned in next.md) — 4 additional tests were added in prior iterations. |
| 27 | +**Notes:** The borrow checker pattern (extract `tail_len` as `usize` before `drop(chunks)`) is clean |
| 28 | +and well-documented with inline comments. The `drop(chunks)` is explicit to make the borrow release |
| 29 | +obvious — good practice for code that manipulates borrowed data and the owning buffer in sequence. |
0 commit comments