fix: wire up optimize_scans in test_encoder's encode_rust()

lilith · lilith · commit 046758729fed · 2026-02-01T09:26:01.000-07:00
encode_rust() was missing .optimize_scans(config.optimize_scans) in the
Encoder builder chain. The Rust scan optimizer was never invoked during
parity comparisons — the encoder always used the fixed 9-scan script.

C's encoder correctly used its scan search, finding simpler scripts
(4-5 scans, no successive approximation) at low quality. This made
Rust appear 1-4% larger at Q10-Q50 even though the Rust scan optimizer
itself works correctly.

With the fix, Max Compression parity is within ±0.4% average at all
quality levels. At low quality (Q10-Q50) Rust now produces smaller
files than C.
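
For reference, a minimal sketch of how the size deltas quoted here can be
reproduced from raw byte counts. It assumes the percentages are computed
as (Rust - C) / C, which matches the figures in the scans-lq.md table in
this commit. The helper names delta_pct and format_delta are hypothetical
and are not part of the repository.

```rust
/// Relative size difference of the Rust output against the C output, in percent.
/// Negative means the Rust file is smaller.
/// Hypothetical helper for illustration; not part of this repository.
fn delta_pct(rust_bytes: u64, c_bytes: u64) -> f64 {
    (rust_bytes as f64 - c_bytes as f64) / c_bytes as f64 * 100.0
}

/// Formats a delta with an explicit sign and two decimals, e.g. "-0.47%".
fn format_delta(delta: f64) -> String {
    format!("{:+.2}%", delta)
}
```

With the Q10 sizes from scans-lq.md, format_delta(delta_pct(183834, 184696))
gives "-0.47%"; with the Q75 sizes, format_delta(delta_pct(1263157, 1259526))
gives "+0.29%".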
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -82,18 +82,21 @@ Reproduce: `cargo test --release --test parity_benchmark -- --nocapture`
 | Full Progressive         | 85 |  +0.00% |   0.35% |
 | Full Progressive         | 90 |  +0.08% |   0.34% |
 | Full Progressive         | 95 |  +0.13% |   0.40% |
-| Max Compression          | 75 |  +0.59% |   2.12% |
-| Max Compression          | 85 |  +0.41% |   1.25% |
-| Max Compression          | 90 |  +0.28% |   0.59% |
-| Max Compression          | 95 |  +0.40% |   0.81% |
+| Max Compression          | 55 |  -0.04% |   1.64% |
+| Max Compression          | 65 |  +0.14% |   0.97% |
+| Max Compression          | 75 |  +0.29% |   1.08% |
+| Max Compression          | 85 |  +0.36% |   0.87% |
+| Max Compression          | 90 |  +0.39% |   0.84% |
+| Max Compression          | 95 |  +0.28% |   0.64% |
 
 **Configs:** Baseline = huffman opt only. +Trellis = AC trellis. Full = AC trellis + DC trellis + deringing. Max Compression = Full + `optimize_scans: true`. All others use `optimize_scans: false`. All use `force_baseline: true`.
 
 **Key findings:**
 - With trellis at Q75, Rust produces **smaller** files than C (-0.15% to -0.24%)
 - Without trellis, consistent +0.21% gap from `fast-yuv` color conversion ±1 rounding
 - Without `optimize_scans`, all configs within ±0.25% average, worst-case per-image deviation under 1%
-- With `optimize_scans` (Max Compression), within +0.6% average — different scan search heuristics
+- With `optimize_scans` (Max Compression), within ±0.4% average, per-image max ~1.6%
+- Rust scan optimizer sometimes finds different local optima than C (different Al/freq split choices)
 - Visual quality equivalent (SSIMULACRA2 and Butteraugli verified)
 
 **Mode explanations:**
@@ -248,24 +251,26 @@ cascade through DC differential encoding. Both produce visually identical images
 
 ### Known Issues / Active Investigations
 
-#### File Size Gap with optimize_scans - FIXED ✅ (Dec 2025)
+#### File Size Gap with optimize_scans - FIXED ✅ (Feb 2026)
 
-**Original symptom:** Rust produced ~2-4% larger files with `optimize_scans` enabled
-because refinement scans were trial-encoded independently, producing garbage sizes.
+**Original symptom:** Rust produced ~1-4% larger files with `optimize_scans` at low
+quality levels (Q10-Q50), with kodim23 showing +3.37% at Q40.
 
-**Fix:** `ScanTrialEncoder` (`src/scan_trial.rs`) now encodes all 64 candidate scans
-sequentially with proper state tracking between scans. Each scan also builds its own
-optimal Huffman table via two-pass encoding (count + encode), matching C mozjpeg's
-per-scan Huffman behavior.
+**Root cause:** `encode_rust()` in `test_encoder.rs` was not passing `optimize_scans`
+to the `Encoder` builder chain. The Rust scan optimizer was never called — the encoder
+always used the fixed 9-scan script regardless of the `optimize_scans` flag. The C
+encoder correctly used its scan search to find simpler scripts (4-5 scans, no SA) at
+low quality.
 
-**Result:** Max Compression mode matches C mozjpeg within ±0.15% at all quality levels.
-At Q75, Rust produces smaller files than C.
+**Fix:** Added `.optimize_scans(config.optimize_scans)` to the `encode_rust()` builder chain
+in `src/test_encoder.rs`.
 
-**Note (Feb 2025):** Previous results showed ±2.2% because the C test harness didn't
-explicitly disable `optimize_scans`. C mozjpeg's `JCP_MAX_COMPRESSION` default enables
-`optimize_scans=TRUE`, causing `jpeg_simple_progression()` to call
-`jpeg_search_progression()` and generate an optimized ~12-scan script, while Rust used
-the fixed 9-scan script. All C encoder wrappers now explicitly control `optimize_scans`.
+**Result:** Max Compression within ±0.4% average at all quality levels. At low quality
+(Q10-Q50) Rust is now **smaller** than C. Per-image max deviation ~1.6% (from different
+local optima in scan search, not a bug).
+
+**Previous fixes (Dec 2025):** ScanTrialEncoder sequential encoding + per-scan Huffman.
+**Previous note (Feb 2025):** C test harness optimize_scans control.
 
 #### AC Refinement Decoder Errors - FIXED ✅ (Dec 2024)
 
diff --git a/scans-lq.md b/scans-lq.md
@@ -1,83 +1,46 @@
-# Investigation: optimize_scans divergence at low quality
+# Investigation: optimize_scans divergence at low quality — RESOLVED
 
-## Problem
+## Root Cause
 
-The `Max Compression` config (`optimize_scans: true`) shows increasing file size gap
-between Rust and C at low quality levels. At Q40, it exceeds our 1% average / 3%
-per-image thresholds:
+`encode_rust()` in `src/test_encoder.rs` was missing `.optimize_scans(config.optimize_scans)`
+in the `Encoder` builder chain. The Rust scan optimizer was **never called** — the encoder
+always used the fixed 9-scan script regardless of the `optimize_scans` config flag.
 
-| Q  | Avg Delta | Max Dev | Worst Image |
-|----|-----------|---------|-------------|
-| 40 | +1.11%    | 3.37%   | kodim23, kodim09 |
-| 50 | +0.77%    | 3.13%   | kodim23 |
-| 55 | +0.75%    | 2.82%   | kodim23 |
-| 65 | +0.70%    | 2.74%   | |
-| 75 | +0.59%    | 2.12%   | |
-| 85 | +0.41%    | 1.25%   | |
-| 90 | +0.28%    | 0.59%   | |
-| 95 | +0.40%    | 0.81%   | |
+Meanwhile, the C encoder correctly passed `optimize_scans` via FFI, so C's scan search
+found simpler, more efficient scripts at low quality (4-5 scans without successive
+approximation), while Rust always used the default 9-scan SA script.
 
-Without `optimize_scans`, all configs are within ±0.7% average even at Q40.
-The gap is strictly in the scan optimization search.
+## Evidence
 
-## Context
+Before fix (R=Rust optimize_scans, C=C optimize_scans):
+- R(optsc) == R(fixed) at ALL quality levels — Rust optimizer was never invoked
+- C correctly found smaller scripts at low Q (C saves 4.6% at Q10 vs fixed script)
 
-`optimize_scans` tries multiple progressive scan configurations and picks the
-smallest. Both Rust and C implement this, but their scan search heuristics may
-differ. At low quality, more coefficients are quantized to zero, giving the
-optimizer a larger search space where different heuristics produce different
-local optima.
+After fix, the Rust scan optimizer runs and finds scripts similar to C's:
+```
+  Q  R(optsc) C(optsc)   Δopt%
+ 10    183834   184696  -0.47%   (was +4.18%)
+ 20    354621   357038  -0.68%   (was +2.40%)
+ 30    507224   509658  -0.48%   (was +1.54%)
+ 40    643993   645968  -0.31%   (was +1.11%)
+ 50    769538   771715  -0.28%   (was +0.77%)
+ 75   1263157  1259526  +0.29%   (was +0.59%)
+ 85   1766757  1760481  +0.36%   (was +0.41%)
+ 95   3218288  3209303  +0.28%   (was +0.40%)
+```
 
-## What to investigate
+At low quality, Rust is now **smaller** than C (the scan optimizer works well).
 
-1. **Map the full curve.** Run Max Compression at Q10, Q20, Q25, Q30, Q35, Q40,
-   Q45, Q50 on the Kodak corpus. Add a temporary `#[test]` or `#[ignore]` test
-   to `parity_benchmark.rs` that only runs Max Compression across these qualities
-   and prints per-image detail for each. Determine where the gap plateaus.
+## Fix
 
-2. **Per-image scan counts.** For the worst images (kodim23, kodim09), compare
-   the number of scans chosen by Rust vs C at Q40. Use `count_scans()` (pattern
-   in `corpus_comparison.rs`). If scan counts differ, the search is finding
-   fundamentally different scan scripts.
+One-line fix in `src/test_encoder.rs:138`:
+```rust
+.optimize_scans(config.optimize_scans)
+```
 
-3. **Compare scan scripts directly.** Parse the SOS markers from both outputs
-   and print `(Ns, comps, Ss, Se, Ah, Al)` for each scan. Pattern is in
-   `corpus_comparison.rs::print_scan_details()`. Identify which scans differ.
+## Remaining Observations
 
-4. **Trace the scan trial encoder.** The Rust implementation is in
-   `src/scan_trial.rs`. The C implementation calls `jpeg_search_progression()`
-   in `jcmaster.c`. Compare:
-   - How many candidate scans are evaluated
-   - The cost function (file size estimation)
-   - The greedy selection order
-   - Whether the trial encoder's Huffman table estimation matches C's
-
-5. **Check if C uses `trellis_freq_split` during scan search.** C mozjpeg has
-   `trellis_freq_split = 8` which splits AC trellis into low/high frequency
-   passes. If C's scan optimizer accounts for this split during trial encoding
-   but Rust doesn't, that could explain the gap at low quality where the split
-   matters more.
-
-6. **Kodim23 specifically.** This image consistently has the worst deviation.
-   It's a landscape with lots of sky gradient + sharp foreground detail.
-   Encode it standalone at Q40 with both, diff the scan scripts, and check
-   if one finds genuinely smaller output or if it's a Huffman table estimation
-   error in the trial encoder.
-
-## Key files
-
-- `src/scan_trial.rs` — Rust scan trial encoder
-- `src/progressive.rs` — Rust progressive scan generation
-- `tests/parity_benchmark.rs` — benchmark test (add exploration tests here)
-- `tests/corpus_comparison.rs` — has `count_scans()` and `print_scan_details()`
-- C: `jcmaster.c` → `jpeg_search_progression()`
-- C: `jcphuff.c` → trial encoding for scan cost estimation
-
-## Acceptance criteria
-
-- Understand whether the gap is from different scan scripts or different
-  file sizes for the same scan script
-- If different scripts: determine if Rust's choice is suboptimal or just different
-- If same scripts: the gap is in entropy coding, not scan search — investigate
-  per-scan Huffman table differences
-- Document findings, decide whether to fix or accept and adjust thresholds
+Some images still show Rust choosing different scripts than C (different Al levels
+or frequency splits). This is expected: the scan search is a greedy heuristic and
+can find different local optima. The per-image max deviation is ~1.6% at Q55, which
+is under the 3% per-image threshold.
diff --git a/src/test_encoder.rs b/src/test_encoder.rs
@@ -135,6 +135,7 @@ pub fn encode_rust(rgb: &[u8], width: u32, height: u32, config: &TestEncoderConf
         .subsampling(config.subsampling)
         .progressive(config.progressive)
         .optimize_huffman(config.optimize_huffman)
+        .optimize_scans(config.optimize_scans)
         .trellis(trellis)
         .overshoot_deringing(config.overshoot_deringing)
         .force_baseline(config.force_baseline)