|
| 1 | +# HashRust Code Review |
| 2 | + |
| 3 | +**Date:** 2026-02-28 |
| 4 | +**Reviewer:** Claude Code (claude-sonnet-4-6) |
| 5 | +**Commit:** f8e71ad (main) |
| 6 | +**Scope:** Full codebase review — current state vs. prior review (2026-02-09) |
| 7 | +**Lines reviewed:** ~1,100 Rust |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Prior Review Status |
| 12 | + |
| 13 | +The February 9 Opus review identified 22 findings. The current codebase has addressed the majority: |
| 14 | + |
| 15 | +| Prior Finding | Status | |
| 16 | +|---|---| |
| 17 | +| C1: Silent partial hash on I/O error | **Fixed** — loop now uses `?` | |
| 18 | +| C2: TOCTOU race in file_size | **Fixed** — `file_size()` calls `metadata()` directly | |
| 19 | +| H1: Interleaved stdout in parallel mode | **Fixed** — results collected then printed serially | |
| 20 | +| H2: Loose dependency versions | **Fixed** — standard semver constraints | |
| 21 | +| M1: Duplicate CRC32/U32 validation | **Fixed** — validation lives only in CLI layer | |
| 22 | +| M2: `OutputEncoding::Unspecified` | **Fixed** — removed; `Option<OutputEncoding>` used during parse | |
| 23 | +| M3: Boolean parameter explosion | **Fixed** — struct literal construction, no constructor | |
| 24 | +| M4: Integration test temp file races | **Fixed** — `tempfile` crate used throughout | |
| 25 | +| M5: No unit tests for core hashing | **Fixed** — `hash_tests` module covers all algorithms | |
| 26 | +| M6: No multi-threaded path coverage | **Fixed** — `test_multi_file_parallel_hashing` added | |
| 27 | +| L1: Heap buffer per file | **Fixed** — `[u8; BUFFER_SIZE]` stack allocation | |
| 28 | +| L2: Unnecessary `.to_string()` | **Fixed** | |
| 29 | +| L3: `hash_file_whole` verbosity | **Fixed** — uses `D::digest(&data)` | |
| 30 | +| L4: `Crc32::from_state` dead code | **Fixed** — removed | |
| 31 | +| L5: Unused `_debug_mode` params | **Fixed** — removed | |
| 32 | +| L6: `readonly` setter bypass | **Fixed** — `readonly` removed entirely | |
| 33 | +| L7: Redundant `file_exists` | **Fixed** — removed | |
| 34 | +| I1: Clippy pedantic violations | **Fixed** | |
| 35 | +| I3: Heavyweight progress system | **Fixed** — uses `indicatif::MultiProgress` natively | |
| 36 | +| I4: `BasicHash` encapsulation | **Fixed** — inner field is private, public API surface unchanged | |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +## Current Findings |
| 41 | + |
| 42 | +### High |
| 43 | + |
| 44 | +#### H1: Per-file hashing errors do not fail the process |
| 45 | +**Files:** `src/core/worker.rs:81`, `src/core/worker.rs:140` |
| 46 | + |
| 47 | +```rust |
| 48 | +Err(e) => eprintln!("File error for '{pathstr}': {e}"), |
| 49 | +``` |
| 50 | + |
| 51 | +Both `file_hashes_st` and `file_hashes_mt` swallow per-file errors: the error is printed to stderr but the functions return `Ok(())`. `worker_func` propagates that `Ok`, so the process exits with code 0. |
| 52 | + |
| 53 | +This is a correctness defect for any scripted consumer. If a caller pipes a list of files and one becomes unreadable mid-run, the process silently succeeds. Standard hashing tools (`sha256sum`, `b2sum`) exit non-zero when any file fails. |
| 54 | + |
| 55 | +**Fix:** Track whether any error occurred (e.g., `let mut had_error = false;`) and return `Err` if set, or use an explicit non-zero exit code via `std::process::exit`. |
| 56 | + |
| 57 | +--- |
| 58 | + |
| 59 | +#### H2: Help text printed to stdout on error path |
| 60 | +**File:** `src/main.rs:20` |
| 61 | + |
| 62 | +```rust |
| 63 | +if let Err(e) = result { |
| 64 | + show_help(true); // uses println! → stdout |
| 65 | + println!(); |
| 66 | + return Err(e); // anyhow prints this to stderr |
| 67 | +} |
| 68 | +``` |
| 69 | + |
| 70 | +When the CLI returns an error (invalid algorithm, file not found, etc.), help text goes to stdout while the error itself goes to stderr. This mixes streams, breaking piped usage (`./hash_rust bad-arg 2>/dev/null | wc -l` would receive spurious help text) and violating POSIX convention. |
| 71 | + |
| 72 | +**Fix:** Replace `println!` in `show_help` with `eprintln!` when called from the error path, or add a `to_stderr: bool` parameter. |
| 73 | + |
| 74 | +--- |
| 75 | + |
| 76 | +### Medium |
| 77 | + |
| 78 | +#### M1: Non-UTF-8 paths from stdin cause a hard failure |
| 79 | +**File:** `src/io/files.rs:24` |
| 80 | + |
| 81 | +```rust |
| 82 | +let lines = stdin.lock().lines().collect::<Result<Vec<String>, _>>()?; |
| 83 | +``` |
| 84 | + |
| 85 | +`BufRead::lines()` returns `Err` for any line containing non-UTF-8 bytes. Because the `?` here is applied to the outer `collect`, a single non-UTF-8 filename causes the *entire* stdin read to fail, discarding all subsequent paths. |
| 86 | + |
| 87 | +On Linux, filenames are arbitrary byte sequences. A directory with even one file whose name is not valid UTF-8 will abort stdin mode entirely, even for all the valid paths that preceded it. |
| 88 | + |
| 89 | +**Fix:** Use `stdin.lock().split(b'\n')` which yields raw `Vec<u8>`, then convert with `String::from_utf8_lossy`, or `OsString::from_vec` (Unix-only). At minimum, log-and-skip bad lines rather than aborting. |
| 90 | + |
| 91 | +--- |
| 92 | + |
| 93 | +#### M2: Dead `Write` impl on `Crc32` |
| 94 | +**File:** `src/hash/crc32.rs:40-51` |
| 95 | + |
| 96 | +```rust |
| 97 | +impl Write for Crc32 { |
| 98 | + fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> { ... } |
| 99 | + fn flush(&mut self) -> std::io::Result<()> { Ok(()) } |
| 100 | +} |
| 101 | +``` |
| 102 | + |
| 103 | +`Crc32` is used exclusively through the `Digest` trait (`Update` + `FixedOutput`). The `Write` impl is never called anywhere in the codebase. It adds noise, imports `std::io::Write` into scope with a `pub use`, and could mislead a future maintainer into thinking `Crc32` has `Write`-based consumers. |
| 104 | + |
| 105 | +**Fix:** Remove the `impl Write for Crc32` block and the `use std::io::Write` import. |
| 106 | + |
| 107 | +--- |
| 108 | + |
| 109 | +#### M3: Integration test path-with-spaces fragility |
| 110 | +**File:** `tests/integration_tests.rs:301` |
| 111 | + |
| 112 | +```rust |
| 113 | +assert!( |
| 114 | + line.split_whitespace().count() == 2, |
| 115 | + "Line {i} should have exactly 2 space-separated parts (hash and path)" |
| 116 | +); |
| 117 | +``` |
| 118 | + |
| 119 | +The output format is `<hash> <path>`, but this assertion breaks the moment any file path contains whitespace — which is common on macOS where the system temp dir can be `/private/var/folders/...` (though in practice this specific path doesn't contain spaces, user-created temp dirs can). The test will also fail for any future encoding that produces output containing spaces (e.g., Base64 with padding considerations, though that is not currently the case). |
| 120 | + |
| 121 | +**Fix:** Split on the first whitespace only — `line.splitn(2, char::is_whitespace).count() == 2` — or verify the hash prefix length (64 hex chars for SHA3-256) directly. |
| 122 | + |
| 123 | +--- |
| 124 | + |
| 125 | +### Low |
| 126 | + |
| 127 | +#### L1: `#[allow(dead_code)]` on public methods is misleading |
| 128 | +**File:** `src/core/types.rs:71-80` |
| 129 | + |
| 130 | +```rust |
| 131 | +#[allow(dead_code)] |
| 132 | +pub fn as_str(&self) -> &str { ... } |
| 133 | + |
| 134 | +#[allow(dead_code)] |
| 135 | +pub fn into_inner(self) -> String { ... } |
| 136 | +``` |
| 137 | + |
| 138 | +`dead_code` lint does not fire on `pub` items in a binary crate — only on private/`pub(crate)` items that are not reachable from the public API. These suppressions have no effect and imply the methods are dead when they are actually part of the intended public interface. They're probably leftovers from when these methods were `pub(crate)`. |
| 139 | + |
| 140 | +**Fix:** Remove the `#[allow(dead_code)]` attributes from both methods. |
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +#### L2: `test_empty_supplied_paths` tests nothing |
| 145 | +**File:** `src/unit_tests.rs:603-619` |
| 146 | + |
| 147 | +```rust |
| 148 | +#[test] |
| 149 | +fn test_empty_supplied_paths() { |
| 150 | + let config = ConfigSettings { ..., supplied_paths: Vec::new() }; |
| 151 | + assert!(config.supplied_paths.is_empty()); |
| 152 | +} |
| 153 | +``` |
| 154 | + |
| 155 | +This constructs a struct and asserts a property of the literal value used to construct it. No code is exercised. The test comment says "This would normally read from stdin, but we're testing the config behavior" — but it doesn't test config behavior either, only struct construction. |
| 156 | + |
| 157 | +**Fix:** Either delete the test, or replace it with an actual behavioral test (e.g., that `get_required_filenames` with empty `supplied_paths` attempts to read stdin and returns an empty vec when stdin is empty). |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +#### L3: `threshold_millis()` wraps a module-level constant |
| 162 | +**File:** `src/progress.rs:72-74` |
| 163 | + |
| 164 | +```rust |
| 165 | +pub fn threshold_millis() -> u64 { |
| 166 | + PROGRESS_THRESHOLD_MILLIS |
| 167 | +} |
| 168 | +``` |
| 169 | + |
| 170 | +This is a public function that returns a private constant. The only call site is `worker.rs:164`. The indirection adds no encapsulation (the constant's value is fully determined at compile time) and serves no purpose a `pub const` would not serve equally. |
| 171 | + |
| 172 | +**Fix:** Make `PROGRESS_THRESHOLD_MILLIS` `pub(crate)` and reference it directly, or keep the function but note it's an intentional abstraction boundary. |
| 173 | + |
| 174 | +--- |
| 175 | + |
| 176 | +#### L4: CLAUDE.md documents `panic = 'abort'` but it is commented out |
| 177 | +**File:** `Cargo.toml:6` |
| 178 | + |
| 179 | +```toml |
| 180 | +[profile.release] |
| 181 | +# panic = 'abort' |
| 182 | +``` |
| 183 | + |
| 184 | +`CLAUDE.md` states: *"Release builds use LTO and panic=abort for size/speed"*. The setting is commented out. The discrepancy means either the documentation or the config is wrong. Without `panic=abort`, the release binary includes unwinding infrastructure, increasing binary size and disabling the size/speed claim. |
| 185 | + |
| 186 | +**Fix:** Either restore `panic = 'abort'` to the release profile, or remove the claim from CLAUDE.md. |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +#### L5: Commented-out `readonly` dependency is dead config noise |
| 191 | +**File:** `Cargo.toml:last dependency block` |
| 192 | + |
| 193 | +```toml |
| 194 | +# readonly = "0.2" |
| 195 | +``` |
| 196 | + |
| 197 | +This was removed as part of fixing L6 from the prior review, but the commented-out line was left behind. It provides no value and could confuse someone scanning the dependency list. |
| 198 | + |
| 199 | +**Fix:** Delete the commented-out line. |
| 200 | + |
| 201 | +--- |
| 202 | + |
| 203 | +## Summary |
| 204 | + |
| 205 | +| Severity | Count | Items | |
| 206 | +|---|---|---| |
| 207 | +| High | 2 | H1 (exit code on errors), H2 (help to stdout) | |
| 208 | +| Medium | 3 | M1 (non-UTF-8 stdin), M2 (dead Write impl), M3 (test fragility) | |
| 209 | +| Low | 5 | L1–L5 above | |
| 210 | +| **Total** | **10** | | |
| 211 | + |
| 212 | +--- |
| 213 | + |
| 214 | +## Remediation Sequence |
| 215 | + |
| 216 | +### Immediately (correctness) |
| 217 | +1. **H1** — Return non-zero exit code when any file fails to hash |
| 218 | +2. **H2** — Route help text to stderr on error path |
| 219 | + |
| 220 | +### Near-term (correctness + test reliability) |
| 221 | +3. **M1** — Handle non-UTF-8 stdin paths gracefully |
| 222 | +4. **M3** — Fix fragile `split_whitespace` assertion in integration test |
| 223 | + |
| 224 | +### Cleanup (code hygiene) |
| 225 | +5. **M2** — Remove `impl Write for Crc32` |
| 226 | +6. **L1** — Remove `#[allow(dead_code)]` from public methods |
| 227 | +7. **L2** — Delete or replace the no-op `test_empty_supplied_paths` |
| 228 | +8. **L4** — Reconcile `panic = 'abort'` between CLAUDE.md and Cargo.toml |
| 229 | +9. **L5** — Delete commented-out `readonly` line |
| 230 | +10. **L3** — Make `PROGRESS_THRESHOLD_MILLIS` `pub(crate)` and remove the wrapper function |
0 commit comments