feat(formula-plane): add opt-in span evaluation runtime#98

Open
PSU3D0 wants to merge 127 commits into main from formula-plane/span-shift-structural-op

Conversation

Owner

@PSU3D0 PSU3D0 commented May 13, 2026

Summary

This is the main FormulaPlane / span evaluation project PR.

It introduces an opt-in span runtime for Formualizer: repeated formula families can be represented, dirtied, shifted, and evaluated as compact spans instead of eagerly materializing every formula cell as a standalone graph vertex. The stable dependency-graph path remains the default for 0.6; span evaluation is explicitly gated behind opt-in configuration across the public surfaces.

Diff scale: ~125 commits, 217 files, ~56k insertions.

Major areas

FormulaPlane runtime and promotion

  • Added the FormulaPlane runtime, span store, span scheduler, template canonicalization, placement pipeline, dependency summaries, region index, diagnostics, and promotion counters.
  • Added authoritative experimental span mode while preserving default FormulaPlaneMode::Off behavior.
  • Added canonical/template support for copied formula families, parameterized literals, constant-result broadcast, range-argument precedents, whole-axis precedents, cross-sheet reads, and function dependency contracts.
  • Added function-family promotion support for arithmetic, criteria aggregates, lookup/index families, whole-column/whole-axis patterns, and affine literal families.
  • Added compact affine literal binding encodings for exact integer-like row/column/rect progressions, with dictionary fallback for non-integer or irregular literals.

Dirtying, demotion, and structural edits

  • Added span dirty projection and bounded dirty-domain preservation.
  • Added structural handling for row/column insert/delete, affected-region scoping, tail precision, and structural span shifting.
  • Added conservative demotion for unsupported edits/shapes, per-cell writes inside spans, internal dependency chains, volatile/dynamic formulas, and other unsupported families.
  • Kept internal chains/running balances on the legacy graph path for this release.

Workbook, bindings, and opt-in surfaces

  • Added opt-in span evaluation wiring through Rust workbook config, Python, WASM/JS, and C FFI.
  • Preserved default stable semantics: users do not get FormulaPlane unless they request it.
  • Added workbook changelog dirtying coverage for promoted spans.

Load/ingest work

  • Added sparse initial ingest paths for JSON, Umya, and Calamine-backed loading.
  • Kept Calamine dependency publishable via crates.io calamine = "0.35".
  • Preserved a migration seam for future Calamine formula-record streaming once the upstream API is available in a crates.io release.
  • Calamine structured table metadata remains a known gap for s019/s020; Umya remains the fuller XLSX compatibility path for those cases.

Benchmark corpus and tooling

  • Added the scenario corpus/harness and FormulaPlane Off/Auth parity tooling.
  • Added backend selection for corpus probing.
  • Added structural engine-stat invariants for span counts, graph formula vertices, graph edges, and AST roots.
  • Added affine literal scenarios s081-s086 covering perfect affine rows, column legacy behavior, outliers, periodic outliers, gaps, and non-integer dictionary fallback.

Docs and release posture

  • Replaced internal working docs with public docs-site/README/CHANGELOG guidance.
  • Added docs for FormulaPlane span evaluation and large workbook performance.
  • Version bump is intentionally left for a follow-up release commit after merge.

0.6 release posture

  • FormulaPlane/span evaluation is experimental and opt-in.
  • Default workbook behavior remains the stable dependency graph.
  • Unsupported span shapes fall back to legacy graph evaluation.
  • Internal dependency chains and array-literal formula families are not span-promoted in 0.6.
  • Calamine formula-record streaming is deferred until upstream Calamine publishes the API.

Validation

Final gates run locally:

cargo fmt --all -- --check
cargo clippy -p formualizer-eval --all-targets -- -D warnings
cargo clippy -p formualizer-workbook --all-targets --features json,umya,calamine -- -D warnings
cargo clippy -p formualizer-bench-core --features formualizer_runner,ironcalc_runner --all-targets -- -D warnings
cargo test -p formualizer-workbook --all-targets --features json,umya,calamine
cargo test -p formualizer-eval --release --no-run
cd docs-site && bun run types:check
cargo package -p formualizer-workbook --allow-dirty --no-verify

Final corpus reruns:

target/scenario-corpus/final-pr-s001-s035-small-umya-20260512
target/scenario-corpus/final-pr-s001-s035-medium-umya-20260512
target/scenario-corpus/final-pr-s001-s035-small-calamine-20260512
target/scenario-corpus/final-pr-s001-s035-medium-calamine-20260512
target/scenario-corpus/final-pr-affine-small-calamine-20260512

Calamine corpus excludes known s019/s020 structured table metadata cases.

PSU3D0 added 30 commits April 29, 2026 21:52
PSU3D0 added 26 commits May 5, 2026 22:15
Adds --phase-timeout-ms with scale-aware defaults (small=5s, medium=15s,
large=60s) and a watchdog thread that flips a cancellation flag.

Limitations (documented for future fix):
- Cancellation only honored at coarse evaluate_all checkpoints. In-flight
  scalar evals run to completion before the cancel flag is read.
- Pre-eval phases (fixture build, load, structural-op demote+materialize)
  have NO cancel hooks. Scenarios that hang in those phases (e.g. s035
  column-delete demotion at medium scale) will still hang the runner.
  Subprocess-per-tuple is the proper fix when batch reliability matters.

Watchdog uses condvar with timeout so it returns promptly when eval
finishes early; no thread accumulation across tuples.
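
A minimal sketch of the condvar-with-timeout watchdog described above, using only the standard library. The `Watchdog` type and its `spawn`/`finish` methods are hypothetical names for illustration and are not taken from the runner; the sketch only shows how a timed condvar wait lets the thread exit promptly when evaluation finishes early.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical watchdog: sets `cancel` if `finish` is not called within `timeout`.
pub struct Watchdog {
    done: Arc<(Mutex<bool>, Condvar)>,
    handle: Option<JoinHandle<()>>,
}

impl Watchdog {
    pub fn spawn(timeout: Duration, cancel: Arc<AtomicBool>) -> Self {
        let done = Arc::new((Mutex::new(false), Condvar::new()));
        let done_for_thread = Arc::clone(&done);
        let handle = thread::spawn(move || {
            let (lock, cvar) = &*done_for_thread;
            let guard = lock.lock().unwrap();
            // Sleep until `finish` flips the flag or the timeout elapses,
            // whichever comes first.
            let (guard, _timed_out) = cvar
                .wait_timeout_while(guard, timeout, |finished| !*finished)
                .unwrap();
            if !*guard {
                // Timed out without a completion signal: request cancellation.
                cancel.store(true, Ordering::SeqCst);
            }
        });
        Watchdog { done, handle: Some(handle) }
    }

    /// Signals completion and joins the thread, so no threads accumulate
    /// across successive runs.
    pub fn finish(mut self) {
        {
            let (lock, cvar) = &*self.done;
            *lock.lock().unwrap() = true;
            cvar.notify_all();
        }
        if let Some(handle) = self.handle.take() {
            handle.join().unwrap();
        }
    }
}
```

As in the limitations listed above, the flag is only useful where the evaluated code actually polls it; this sketch does not interrupt work that never checks `cancel`.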

…structural ops

Fixes the structural-op blowup in column-insert/delete that surfaced at
medium scale (s034 edit_3 = 410s, s035 Auth never finished after 1400s).

Two surgical changes anchored in
docs/design/formula-plane/dispatch/structural-op-blowup-investigation.md.

## Change 1: CsrMutableEdges::update_coord becomes O(1)

Before: `self.vertex_ids.iter().position(|&id| id == vertex_id.0)` was a
full linear scan across the edge-cache vertex-id array per moved vertex.
For a sheet with 50k formula vertices and a column-insert moving 50k of
them, that's 2.5e9 integer comparisons per structural edit.

After: side index `vertex_pos: FxHashMap<u32, usize>` maintained at every
call site that mutates `vertex_ids` (constructors new/with_coords/
build_from_adjacency, mutators add_vertex/add_vertices_batch, rebuild).
update_coord is now O(1) hash lookup with debug_assert that the position
matches.
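
The side-index idea can be sketched as follows. This is an illustration only: `CoordTable` is a hypothetical stand-in for the edge cache, `std::collections::HashMap` is swapped in for `FxHashMap` to keep the example dependency-free, and vertex ids are assumed to be unique.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the edge cache: parallel arrays plus a
/// position index that must be updated wherever `vertex_ids` is mutated.
pub struct CoordTable {
    vertex_ids: Vec<u32>,
    coords: Vec<(u32, u32)>,
    vertex_pos: HashMap<u32, usize>,
}

impl CoordTable {
    pub fn new() -> Self {
        CoordTable {
            vertex_ids: Vec::new(),
            coords: Vec::new(),
            vertex_pos: HashMap::new(),
        }
    }

    /// Appends a vertex and records its array position (ids assumed unique).
    pub fn add_vertex(&mut self, id: u32, coord: (u32, u32)) {
        self.vertex_pos.insert(id, self.vertex_ids.len());
        self.vertex_ids.push(id);
        self.coords.push(coord);
    }

    /// Hash lookup instead of `vertex_ids.iter().position(..)`.
    /// Returns false if the vertex is unknown.
    pub fn update_coord(&mut self, id: u32, coord: (u32, u32)) -> bool {
        match self.vertex_pos.get(&id).copied() {
            Some(pos) => {
                // The index and the array must agree on where `id` lives.
                debug_assert_eq!(self.vertex_ids[pos], id);
                self.coords[pos] = coord;
                true
            }
            None => false,
        }
    }

    pub fn coord_of(&self, id: u32) -> Option<(u32, u32)> {
        self.vertex_pos.get(&id).map(|&pos| self.coords[pos])
    }
}
```

The cost of this design is the invariant itself: every mutation of the id array has to keep the index in sync, which is why the commit lists each call site.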

## Change 2: ReferenceAdjuster::adjust_ast_if_changed avoids debug-string compare

Before: VertexEditor::insert_columns and ::delete_columns ran
`format!("{ast:?}") != format!("{adjusted:?}")` for every formula
vertex in the workbook to detect whether the adjusted AST actually
changed. Each comparison allocated two debug-rendered strings.

After: new `adjust_ast_if_changed` traverses the AST and returns
Option<ASTNode>, only allocating an adjusted AST if at least one
reference actually changed. Compares ReferenceType via PartialEq
(verified derived). For unchanged formulas the cost is now traversal
only, no allocation.
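
A sketch of the "return `None` when nothing changed" pattern on a toy AST. The `Ast` enum and `adjust_for_column_insert` are hypothetical and far simpler than the real `ASTNode` and `ReferenceAdjuster`; columns are 0-based here, matching the `insert-before-col-0` test described below.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Ast {
    Number(f64),
    CellRef { row: u32, col: u32 },
    Binary { op: char, left: Box<Ast>, right: Box<Ast> },
}

/// Returns `Some(adjusted)` only if inserting `count` columns before
/// `before_col` moves at least one reference; otherwise `None`, with no
/// allocation for the unchanged tree.
pub fn adjust_for_column_insert(ast: &Ast, before_col: u32, count: u32) -> Option<Ast> {
    match ast {
        Ast::Number(_) => None,
        Ast::CellRef { row, col } => {
            if *col >= before_col {
                Some(Ast::CellRef { row: *row, col: *col + count })
            } else {
                None
            }
        }
        Ast::Binary { op, left, right } => {
            let new_left = adjust_for_column_insert(left, before_col, count);
            let new_right = adjust_for_column_insert(right, before_col, count);
            if new_left.is_none() && new_right.is_none() {
                return None;
            }
            // Only the unchanged sibling is cloned, and only when the other
            // side actually changed.
            Some(Ast::Binary {
                op: *op,
                left: Box::new(new_left.unwrap_or_else(|| (**left).clone())),
                right: Box::new(new_right.unwrap_or_else(|| (**right).clone())),
            })
        }
    }
}
```

Callers can then skip the write-back entirely on `None`, which replaces the two debug-string renders per formula with a plain traversal.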

Together these explain the s034 variance: edit_3 inserts before column
A, which means EVERY relative `A{r}` reference shifts. The combination
of O(M*V) edge-coord updates + N debug-string allocations + N AST
clones was the 410-second hot loop.

## Bundled correctness fix

CsrMutableEdges `batch_mode: bool` -> `batch_depth: usize` counter.
With the bool, nested begin_batch/end_batch pairs (e.g. when a
sheet-level operation calls a vertex-editor batch internally) would
have the inner end_batch flip the bool false, causing the outer
operations to no longer batch. Counter semantics correctly track
nesting depth and only fire rebuild when the outermost end_batch lands.
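
The counter semantics can be illustrated with a small sketch. `BatchedEdges`, `add_edge`, and the `rebuilds` counter are hypothetical names for this example; only `batch_depth`, `begin_batch`, and `end_batch` come from the description above.

```rust
/// Hypothetical batching guard: rebuilds are deferred while any batch is open.
#[derive(Default)]
pub struct BatchedEdges {
    batch_depth: usize,
    pending: usize,
    pub rebuilds: usize,
}

impl BatchedEdges {
    pub fn begin_batch(&mut self) {
        self.batch_depth += 1;
    }

    /// Only the outermost `end_batch` triggers the rebuild.
    pub fn end_batch(&mut self) {
        self.batch_depth = self.batch_depth.saturating_sub(1);
        if self.batch_depth == 0 && self.pending > 0 {
            self.rebuild();
        }
    }

    /// Outside a batch every mutation rebuilds immediately.
    pub fn add_edge(&mut self) {
        self.pending += 1;
        if self.batch_depth == 0 {
            self.rebuild();
        }
    }

    fn rebuild(&mut self) {
        self.pending = 0;
        self.rebuilds += 1;
    }
}
```

With a `bool` in place of the counter, the inner `end_batch` of a nested pair would clear the flag and every later mutation in the outer batch would rebuild eagerly.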

## Perf measurements (medium scale, 10k rows)

s034-family-with-column-insert Auth (insert column at positions [3,2,5,1,4]):
  edit_0: 25,386 ms -> 24,263 ms  (demotion + 50k materialization, unchanged)
  edit_1:    264 ms ->     85 ms
  edit_2:    175 ms ->     60 ms
  edit_3: 410,333 ms ->    186 ms  (~2200x faster)
  edit_4:    247 ms ->     62 ms

s035-family-with-column-delete Auth (delete column 7 x5):
  edit_0:        N/A -> 45,090 ms  (was hanging; now completes)
  edit_1: hung      ->     15 ms
  edit_2: hung      ->     19 ms
  edit_3: hung      ->     21 ms
  edit_4: hung      ->     17 ms
  recalc all:    -- ->  <1 ms

Off-mode times unchanged (no regression).

The first edit (which does FormulaPlane span demotion + ingest of 30k-50k
formulas) is now the dominant cost. Demotion is a separate concern and
not in scope here; tracked for future tuning.

## Tests added (4)

- delta_edges.rs: update_coord_uses_vertex_position_index
  20k vertices, update last 5k coords; release-mode <50ms; verifies
  vertex_pos consistency.

- reference_adjuster.rs: adjust_ast_if_changed_returns_none_for_unaffected_column_insert
  =A1+1 with insert-before-col-3 returns None.

- reference_adjuster.rs: adjust_ast_if_changed_returns_adjusted_for_insert_before_a
  =A1+1 with insert-before-col-0 returns Some with reference shifted to B1.

- formula_plane_structural.rs:
    formula_plane_authoritative_repeated_column_insert_after_demotion_15k_vertices_stays_linear
  5k rows x 3 formula columns, runs the s034 insert sequence,
  verifies correctness across rows 1/2500/5000 after every edit,
  asserts release-mode timing budgets (first <10s, others <1s,
  insert-before-col-1 specifically <1s).

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
probe-corpus s034/s035 medium auth+off                           completes;
                                                                 final invariants pass.

## Out of scope (separate dispatches)

- Demotion-phase cost (edit_0 still ~25-45s for 30-50k formula
  materialization). The bulk_set_formulas_with_plans + ingest pipeline
  per-vertex cost is the remaining first-edit hot spot.
- vertices_in_sheet linear scan (use sheet_indexes) — linear, not quadratic.
- Tombstoned vertex inclusion in vertices_in_sheet — separate concern.
- Per-row volatile/error span overhead at scale (s021/s025). Plan exists
  at docs/design/formula-plane/dispatch/small-domain-span-overhead.md
  for next dispatch.

Fixes per-row span overhead surfaced by s021 (16x slower) and s025
(3.3x slower) at medium scale. Implements the small-domain promotion
gate from
docs/design/formula-plane/dispatch/small-domain-span-overhead.md.

## Root cause (verified, not the PM's initial framing)

PM's initial hypothesis — 'decline single-cell families' — turned out
to already be implemented: detect_domain rejects analyses.len() < 2
(placement.rs:467-472) and converts to legacy via mark_all_legacy.

The actual issue was small MULTI-cell families:
- s021 medium: 1000 spans of only 7 cells each (=A{r}*2 rows
  separated by volatile RAND/TODAY/NOW gaps).
- s025 medium: 100 spans of only 99 cells each (=A{r}*2 rows
  separated by per-100th =A{r}/0 errors).

The FormulaPlane runtime has fixed per-span cost (template intern,
scheduler edge insertion, per-task setup including AST relocatability
revalidation, current_sheet.to_string allocation, fresh SpanEvaluator
construction). For 7-cell spans this fixed cost dwarfs any savings
vs the legacy graph path. Even 99-cell spans don't amortize it
(measured 3.3x slower).

## Fix

Add MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS = 100 threshold in
place_analyzed_family (formula_plane/placement.rs).

Applied only after detect_domain succeeds and before any template
intern / read-summary / span insert work, so doomed-small candidates
fall through to legacy with zero wasted promotion overhead.

Constant-result spans bypass the threshold because their broadcast
path (eval-once, broadcast-to-N-placements) amortizes regardless of
cell count; this preserves s013's 161x recalc win for SUMIFS-over-
constant-criteria families and similar constant LET/LAMBDA wins.

New PlacementFallbackReason::SmallDomain and PlacementDomain::cell_count()
helper.
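
A minimal sketch of the gate, assuming a rectangular domain. The constant name and `cell_count` come from the description above; `Domain`, `PlacementDecision`, `FallbackReason`, and `gate` are hypothetical simplifications of the real placement types.

```rust
pub const MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS: u64 = 100;

#[derive(Debug, PartialEq)]
pub enum FallbackReason {
    SmallDomain,
}

#[derive(Debug, PartialEq)]
pub enum PlacementDecision {
    Promote,
    Fallback(FallbackReason),
}

/// Hypothetical rectangular placement domain.
pub struct Domain {
    pub rows: u64,
    pub cols: u64,
}

impl Domain {
    pub fn cell_count(&self) -> u64 {
        self.rows * self.cols
    }
}

/// Runs after domain detection and before any template or span work.
/// Constant-result families bypass the threshold because one evaluation
/// is broadcast to every placement.
pub fn gate(domain: &Domain, constant_result: bool) -> PlacementDecision {
    if !constant_result && domain.cell_count() < MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS {
        PlacementDecision::Fallback(FallbackReason::SmallDomain)
    } else {
        PlacementDecision::Promote
    }
}
```

The comparison is strict, so a family of exactly 100 cells is still promoted.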

## Perf measurements (medium scale, 10k rows)

s021-volatile-functions-sprinkled:
  recalc Auth/Off: 68.28ms / 4.27ms = 16.00x  ->  4.27ms / 4.57ms = 0.93x
  span_count Auth: 1000 -> 0 (small =A*2 runs demote; volatiles already legacy)

s025-errors-propagating-through-family:
  recalc Auth/Off: 1.65ms / 0.50ms = 3.30x  ->  0.46ms / 0.49ms = 0.94x
  span_count Auth: 100 -> 0 (99-cell runs demote; error rows already singleton legacy)

Preserved (no regressions):
  s006-rect-family-10cols      Auth/Off: 6.98 / 28.73 ms (still ~4x faster)
  s007-fixed-anchor-family     Auth/Off: 0.78 / 4.21 ms  (still ~5x faster)
  s008-two-anchored-families   Auth/Off: 1.54 / 7.89 ms  (still ~5x faster)
  s013-sumifs-constant         Auth/Off: 0.84 / 135.59ms (still ~161x faster
                                                          via constant broadcast)

All families above the threshold retain promotion. All constant-result
families retain promotion regardless of size.

## Tests added (3)

- formula_plane_authoritative_demotes_small_non_constant_domains
  100-row s021-shape: volatile rows + =A*2 7-row runs.
  Asserts: active_span_count == 0, all 100 formulas materialized in graph,
  =A*2 cells produce correct values.

- formula_plane_authoritative_demotes_99_cell_non_constant_runs
  200-row s025-shape: =A*2 with =A{r}/0 every 100th row.
  Asserts: active_span_count == 0, all formulas in graph, error cells
  show #DIV/0!, others multiplied correctly.

- formula_plane_authoritative_promotes_100_cell_non_constant_run
  100 contiguous =A{r}*2 rows.
  Asserts: active_span_count == 1 (threshold is inclusive at 100).

The existing constant-result test (formula_plane_authoritative_constant_
sumifs_family_promotes_via_broadcast) passes unchanged, validating the
exemption.

## Tests updated

Several existing formula-plane ingest/shadow/structural/span_eval/
placement tests previously used 2-3 cell non-constant families to verify
mechanical span-creation behavior. Updated those to use 100-cell families
where the test intent is active-span mechanics. Constant-result
small-span tests remain small (the exemption preserves them).

Files: tests/formula_plane_ingest_shadow.rs,
tests/formula_plane_structural.rs, formula_plane/placement.rs (test mod),
formula_plane/span_eval.rs (test mod). Helper functions row_run_candidates
and col_run_candidates added in placement.rs and span_eval.rs test mods
to reduce repetition.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
probe-corpus medium s021/s025/s006/s007/s008/s013 auth+off       all final
                                                                 invariants pass.

## Threshold rationale

100 chosen because:
- 7 cells (s021) clearly bad.
- 99 cells (s025) measurably bad.
- Below 100 the per-span fixed cost dominates.
- Above 100 the per-cell amortization works.

Future tuning: revisit after the next medium-scale corpus baseline once
other corpus-driven fixes land. The threshold is a single named constant,
easy to adjust.

## Open follow-ups (separate dispatches)

- Per-span scheduler/evaluator overhead (current_sheet.to_string,
  fresh SpanEvaluator per work item, double placement vector
  materialization, per-task AST relocatability revalidation). Real but
  orthogonal; with the threshold in place these become less important
  because we no longer create the small spans that exposed them.
- Volatile authority canonical support — out of scope; would need
  careful guard against vacuous constant-result classification of
  no-read volatiles.

Fixes s036 Auth recalc 10-18x slower than Off. Single-line removal of
record_formula_plane_structural_change(StructuralScope::Sheet) from
Engine::rename_sheet (eval.rs:1644).

Anchored in
docs/design/formula-plane/dispatch/sheet-rename-dirty-scope.md.

## Root cause

Sheet rename in Excel changes the display name string only. The engine
preserves SheetId across rename (sheet_registry.rs:78-108). All known
sheet references are stored in arena as SheetKey::Id(id), not the
display name (data_store.rs:445-457 for cells, :470-482 for ranges).
ASTs are reconstructed via the current registry name lookup
(:660-668, :682-690).

Therefore: a sheet rename does not change any actual cell values or
dependency identities. References still resolve to the same cells.
The legacy graph correctly handles this — Off mode finishes recalc in
0.2ms because mark_vertex_dirty does not propagate to dependents and
the only dirtied vertices are value cells which get filtered out by
get_evaluation_vertices.
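
The invariant can be sketched with a toy registry. Everything here is hypothetical (`SheetRegistry`, `CellRef`, `render`, and the `R1C1`-style output); it only illustrates why an id-keyed reference needs no rewriting when the display name changes.

```rust
use std::collections::HashMap;

pub type SheetId = u16;

/// Hypothetical registry: ids are stable, names are display metadata.
#[derive(Default)]
pub struct SheetRegistry {
    names: Vec<String>,
    ids_by_name: HashMap<String, SheetId>,
}

impl SheetRegistry {
    pub fn add(&mut self, name: &str) -> SheetId {
        let id = self.names.len() as SheetId;
        self.names.push(name.to_string());
        self.ids_by_name.insert(name.to_string(), id);
        id
    }

    /// Changes the display name only; the id stays the same.
    pub fn rename(&mut self, id: SheetId, new_name: &str) {
        let old = std::mem::replace(&mut self.names[id as usize], new_name.to_string());
        self.ids_by_name.remove(&old);
        self.ids_by_name.insert(new_name.to_string(), id);
    }

    pub fn name(&self, id: SheetId) -> &str {
        &self.names[id as usize]
    }
}

/// A stored reference keeps the id, so a rename never touches it.
pub struct CellRef {
    pub sheet: SheetId,
    pub row: u32,
    pub col: u32,
}

/// The display form is rebuilt from the current registry name on demand.
pub fn render(reg: &SheetRegistry, r: &CellRef) -> String {
    format!("{}!R{}C{}", reg.name(r.sheet), r.row, r.col)
}
```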

Auth mode was paying ~3ms per rename because record_formula_plane_
structural_change(StructuralScope::Sheet(sheet_id)) recorded
RegionPattern::whole_sheet(sheet_id), which the consumer-read index
correctly matched against every span reading from that sheet. The
dirty closure then projected whole-sheet through the affine
projection rule onto the whole result region of any consuming span,
triggering whole-span recompute.

For s036 (Sheet1 has one 10k-cell span reading from DataA + DataB):
each rename of DataA or DataB triggered a 10k-placement re-eval of
the Sheet1 span. The values were unchanged afterward.

## Fix

One line removed at eval.rs:1644. Comment block added explaining the
SheetId-preservation invariant.

Path before:
  rename_staged_formula_sheet
  vertices_in_sheet().mark_dirty (legacy bookkeeping; values filtered)
  record_formula_plane_structural_change(Sheet)  <- removed
  mark_topology_edited

## Perf measurements (medium scale, 10k formulas / 30k vertices)

s036-multi-sheet-with-sheet-rename:
  Off recalc 0..3:  0.67, 0.18, 0.34, 0.36 ms  (4 rename cycles)
  Auth recalc 0..3: 0.11, 0.19, 0.09, 0.07 ms  (Auth now FASTER than Off)
  Off recalc_4: 0.33 ms  (value edit; unchanged behavior)
  Auth recalc_4: 0.18 ms  (value edit; correct dirty propagation)

  Pre-fix Auth: 2.75-3.56 ms per rename cycle (10-18x worse than Off).
  Post-fix Auth: 0.07-0.19 ms (better than Off because Auth has 1 span
  while Off has 30k graph vertices to schedule).

  result.computed_vertices == 0 after each rename (verified by test).

## Tests added (3)

- formula_plane_authoritative_sheet_rename_is_metadata_only_for_cross_sheet_span
  100-row cross-sheet span. Renames DataA forward and back. Asserts
  result.computed_vertices == 0 after each rename, sampled values
  unchanged, span count preserved.

- formula_plane_authoritative_value_edit_after_sheet_rename_dirties_bounded_span_work
  After rename, a single cell value edit produces bounded span work
  (>= 1 placement re-evaluated) and only the affected output row
  changes. Verifies dirty propagation is preserved for actual edits.

- formula_plane_authoritative_sheet_rename_preserves_sheet_id_read_summaries
  Read summaries remain SheetId-keyed across rename. consumer_read_entries
  count preserved. Edit on the renamed sheet correctly dirties only the
  expected output cell.

## What was NOT changed (out of scope)

- StructuralScope::Sheet still used for row/col insert/delete (eval.rs:3763,
  3789, 3819, 3849) — those legitimately need it because references shift.
- StructuralScope::RemovedSheet path unchanged (eval.rs:5470-5493).
- StructuralScope::AllSheets path unchanged (eval.rs:5495-5498).
- Legacy mark_vertex_dirty loop on the renamed sheet kept (eval.rs:1638-1643).
  In s036 it produces no formula evaluation work because get_evaluation_
  vertices filters value vertices out. Removing it would be a broader
  legacy behavior change requiring its own audit.
- Arrow store sheet rename, graph rename_sheet, staged-formula rename, and
  topology edit mark all kept.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
probe-corpus medium s036 auth+off                                final invariants pass

## Open items (separate dispatches)

- s036 fixture has DATA_ROWS=1000 not 10000. Doesn't affect this fix.
  Worth fixing for consistency; separate trivial commit.
- Span merging across sheet-display-name changes: spans retain canonical
  keys with old explicit names. Future formulas with new names may not
  merge. Out of scope; tracked.
- Whole-column reference cost (s026 4.8s recalc) — separate dispatch,
  design memo at docs/design/formula-plane/dispatch/whole-column-references.md.

Memo committed alongside this change for future reference.

Lifts the FormulaPlane rejection of whole-axis (whole-column) references
in dependency analysis, canonical labels, and projection construction.
Whole-column reads now produce 'WholeColumnRange' projections that emit
RegionPattern::WholeCol read regions. Constant-result classification
treats whole-axis as placement-invariant, so absolute whole-column
formulas like =SUM($A:$A) enter the eval-once-broadcast fast path.

Anchored in
docs/design/formula-plane/dispatch/whole-axis-promotion.md.

## Root cause

s026-whole-column-refs-in-50k-formulas had span_count=0 in Auth mode
because dependency_summary rejected any range with AxisRef::WholeAxis
upstream of placement, and the parallel arena-canonical labels also
rejected it. Lifting both rejections plus adding a source-aware
projection rule lets the existing constant-result broadcast path apply
to whole-column formulas with absolute axes.

The fix touches six call sites that all needed updating in lockstep:
- template_canonical reject reason
- arena/canonical reject labels
- dependency_summary reject_non_finite_range
- dependency_summary axis_kinds_match
- dependency_summary is_constant_result helper
- producer DirtyProjectionRule + AxisProjection

Without all six, promotion is path-dependent or projection construction
fails after the summary accepts the precedent.

## Design

Scope: whole-COLUMN only. Whole-row deferred (multi-row whole-row
intervals would require new RegionPattern::WholeRowInterval and is not
driven by current measurements).

New variant DirtyProjectionRule::WholeColumnRange { col_start, col_end }.
Existing PrecedentPattern::Range(AffineRectPattern) reused; AxisRef
already has a WholeAxis variant.

New method read_regions_for_result returns Vec<RegionPattern> instead
of a single region. AffineCell/AffineRange wrap their existing single
result; WholeColumnRange emits one RegionPattern::WholeCol per source
column. Projected column count bounded at 256 to avoid pathological
$A:$XFD cases (rejected with UnsupportedAxis).

Existing read_region_for_result kept for backward compatibility with
callers that expect a single region; returns UnsupportedAxis for
WholeColumnRange.
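
A sketch of the multi-region expansion with the 256-column bound. The types are hypothetical simplifications: `Rule::Fixed` stands in for the affine rules that yield exactly one region, and `read_regions` only mirrors the shape of `read_regions_for_result`, not its real signature.

```rust
pub const MAX_PROJECTED_WHOLE_COLUMNS: u32 = 256;

#[derive(Debug, Clone, PartialEq)]
pub enum Region {
    Rect { sheet: u16, row_start: u32, row_end: u32, col_start: u32, col_end: u32 },
    WholeCol { sheet: u16, col: u32 },
}

#[derive(Debug, PartialEq)]
pub enum ProjectionError {
    UnsupportedAxis,
}

pub enum Rule {
    /// Stand-in for rules that produce a single region.
    Fixed(Region),
    WholeColumnRange { sheet: u16, col_start: u32, col_end: u32 },
}

/// Single-region rules wrap their result; a whole-column range emits one
/// `WholeCol` region per source column, up to the bound.
pub fn read_regions(rule: &Rule) -> Result<Vec<Region>, ProjectionError> {
    match rule {
        Rule::Fixed(region) => Ok(vec![region.clone()]),
        Rule::WholeColumnRange { sheet, col_start, col_end } => {
            if col_end < col_start {
                return Err(ProjectionError::UnsupportedAxis);
            }
            // Equivalent to "column count > 256", written to avoid overflow.
            if col_end - col_start >= MAX_PROJECTED_WHOLE_COLUMNS {
                return Err(ProjectionError::UnsupportedAxis);
            }
            Ok((*col_start..=*col_end)
                .map(|col| Region::WholeCol { sheet: *sheet, col })
                .collect())
        }
    }
}
```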

is_constant_projection at placement.rs and is_constant_result at
dependency_summary.rs treat AxisRef::WholeAxis as placement-invariant
(it represents the entire column regardless of where the formula sits).
RelativeToPlacement remains non-constant. Open and unsupported axes
defensively default to non-invariant.

Composition with existing precedent kinds:
- =SUM($A:$A): one WholeColumnRange precedent. Constant. Broadcast.
- =SUM($A:$A) - A{r}: two precedents (whole-col + relative cell).
  Mixed → non-constant. Per-placement eval. Whole-col read region
  still in summary; dirty propagation correct.
- =SUMIFS($B:$B, $A:$A, "Type1"): two whole-col precedents, both
  constant. Broadcast.
- =SUMIFS($B:$B, $A:$A, A{r}): two whole-col + one relative.
  Non-constant.
- Cross-sheet =SUM(DataA!$A:$A): emits whole-col on DataA's sheet_id.

Negative cases preserved:
- $A$1:$A (open-ended) still rejected (OpenRangeUnsupported).
- =$A:$A top-level still rejected (not in supported function-arg
  context).
- A:$A (mixed endpoint kinds) still rejected.
- ROW($A:$A) still rejected (ROW not in is_known_static_function).
- Whole-row $1:$1 explicitly rejected in this patch (deferred).
- Internal-dependency guard preserved (formula in column A reading
  $A:$A still falls back to legacy).

VLOOKUP/MATCH NOT added to is_known_static_function in this patch.
Independent semantic review needed; separate dispatch.

## Perf measurements

s026-whole-column-refs-in-50k-formulas medium (10k rows, =SUM($A:$A) - A{r}):
  Off  first 4681ms  recalc 4810ms  spans 0
  Auth first   47ms  recalc 1678ms  spans 1   (99x first / 2.86x recalc)

The recalc 2.86x speedup is for the mixed (non-constant) shape; per-
placement eval still required. Pure constant whole-col shape gets the
full broadcast benefit:

repro_whole_col_vs_finite (interactive() mode, 10k rows):
  =SUM($A:$A)        Off recalc 4854ms  Auth recalc    1.77ms  (2742x faster)
  =SUM($A$1:$A$N)  Off recalc 2415ms  Auth recalc    0.79ms  (3057x baseline)

Both whole-column constant-result and finite-range constant-result now
use the same broadcast path with comparable performance.

## Tests added

dependency_summary:
  - accepts_absolute_whole_column_sum (FormulaClass::StaticPointwise,
    constant-result == true)
  - mixed_whole_column_minus_relative_is_non_constant
  - relative_whole_column_a_a_is_non_constant
  - rejects_open_range_whole_column
  - rejects_top_level_whole_column
  - rejects_mixed_absolute_relative_endpoints

template_canonical:
  - whole_axis_no_longer_unsupports_authority_labels
  - open_range_still_unsupports_authority_labels
  - whole_axis_serializes_in_canonical_key

arena_canonical:
  - whole_column_range_no_longer_sets_reject_whole_axis
  - open_range_still_sets_reject_open_range

producer:
  - whole_column_range_read_regions_emit_whole_cols (single + multi)
  - whole_column_range_rejects_above_256_column_threshold
  - whole_column_dirty_projection_dirties_whole_result_on_intersection
  - whole_column_dirty_projection_no_intersection_outside_column

placement:
  - constant_whole_column_family_promotes_to_one_constant_span
  - mixed_whole_column_minus_relative_promotes_to_non_constant_span
  - sumifs_constant_criteria_whole_column_family_promotes
  - cross_sheet_whole_column_family_targets_data_sheet_id
  - whole_row_family_does_not_promote (negative)

ingest_pipeline:
  - compute_read_projections_accepts_whole_column
  - compute_read_projections_rejects_top_level_whole_column
  - compute_read_projections_rejects_open_ended_range
  - compute_read_projections_rejects_whole_row

formula_plane_structural (end-to-end):
  - 200-row =SUM($A:$A) family promotes, evaluates correctly,
    recalculates correctly after col-A edit
  - 200-row =SUM($A:$A) - A{r} family promotes as non-constant,
    per-row values correct
  - cross-sheet 200-row =SUM(DataA!$A:$A) family recalcs after
    DataA edit

## Tests updated

- Existing dependency_summary whole-axis rejection test updated to new
  behavior: function-argument whole-column accepted, top-level still
  rejected.
- FP8 ingest parity test kept passing by aligning arena whole/open
  range behavior with template canonical labels.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass

probe-corpus medium s026                                         spans 0 -> 1
                                                                 first 99x faster
                                                                 recalc 2.86x faster

repro_whole_col_vs_finite                                        constant whole-col
                                                                 case 2742x faster
                                                                 finite case unchanged
                                                                 mixed case promotes
                                                                 (perf parity with
                                                                 finite-range mixed
                                                                 deferred to SUM CSE)

s007/s013 corpus (constant-result spans)                         no regression

## Out of scope (separate dispatches)

- Whole-row promotion ($1:$1 etc).
- VLOOKUP/MATCH in is_known_static_function.
- SUM aggregate cache (CSE) for mixed shapes like =SUM($A:$A) - A{r}.
  This shape promotes but each placement still re-evaluates the whole-
  column SUM. Phase 1 of the whole-column-references memo would unlock
  another large speedup here.
- Effect 1 from whole-column-references memo: the legacy 2x tax for
  whole-column resolution (Off mode $A:$A is 2x slower than $A$1:$A$N).
  Separate small investigation.

Memos for both committed alongside this change.

…refs

Adds a snapshot-keyed final-result cache for used_rows_for_columns and
used_cols_for_rows. Whole-column references like =SUM($A:$A) need the
used-row extent resolved per call, which currently runs
formula_row_bounds_for_columns every time. That helper scans every
indexed vertex in the queried column range and filters by formula kind.

Anchored in
docs/design/formula-plane/dispatch/whole-column-legacy-tax.md.

## Root cause

For 10k formulas of `=SUM($A:$A)` over a column with 10k input value
vertices: each formula triggers used_rows_for_columns("Sheet1", 1, 1)
which calls formula_row_bounds_for_columns. That helper does
get_vertex_kind() on every indexed vertex in column A — 10k checks per
call. With 10k formulas, that's 100M vertex-kind checks per recalc.

The Arrow used-row bounds cache (row_bounds_cache at eval.rs:349) hits
correctly after the first formula, but the wrapper still calls
formula_row_bounds_for_columns to preserve the union semantics
(Arrow extent OR formula coordinates in unmaterialized rows).

Finite-range references like `=SUM($A$1:$A$10000)` skip the entire
used_rows_for_columns path because all four bounds are present at the
parser AST level (eval.rs:9443-9451).

## Fix

New UsedAxisBoundsCache struct with two FxHashMaps:
  row_bounds_by_col_span: (SheetId, start_col, end_col) -> Option<(u32, u32)>
  col_bounds_by_row_span: (SheetId, start_row, end_row) -> Option<(u32, u32)>

Wrapped in Engine::used_axis_bounds_cache: RwLock<Option<...>>.

used_rows_for_columns flow:
1. Resolve sheet_id (O(1) HashMap).
2. Load snapshot_id.
3. Read-lock check cache for (sheet_id, start_col, end_col).
4. On hit: return cached Option immediately.
5. On miss: run existing union logic (Arrow + formula bounds + graph fallback).
6. Write-lock store result. reset_for_snapshot clears map on snapshot change.

Symmetric for used_cols_for_rows.

Critical correctness preserved:
- Snapshot-keyed: data edits and topology edits both increment snapshot
  (eval.rs:2403-2413), so invalidation is automatic.
- Cache stores None: closes the empty-column rescan hole that the
  underlying RowBoundsCache also has (where (None, None) cached results
  weren't treated as a hit).
- Union semantics preserved: only the FINAL result is cached, not the
  Arrow-only or formula-only intermediate.
- Read-then-write pattern: don't hold cache lock during expensive scans.
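
The flow and invariants above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: `CachedBounds`, `BoundsCache`, the `u16` sheet id, and the `scan` closure (standing in for the Arrow + formula union logic) are all assumptions, and std `HashMap` replaces `FxHashMap` to keep the block dependency-free.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical key shape: (sheet_id, start_col, end_col).
type ColSpanKey = (u16, u32, u32);
/// Final unioned result. `None` ("no used rows") is a cacheable answer too.
type RowBounds = Option<(u32, u32)>;

#[derive(Default)]
struct BoundsCache {
    snapshot_id: u64,
    row_bounds_by_col_span: HashMap<ColSpanKey, RowBounds>,
}

struct CachedBounds {
    cache: RwLock<BoundsCache>,
}

impl CachedBounds {
    fn new() -> Self {
        Self { cache: RwLock::new(BoundsCache::default()) }
    }

    /// `scan` is a stand-in for the expensive union logic.
    fn used_rows_for_columns<F>(&self, snapshot_id: u64, key: ColSpanKey, scan: F) -> RowBounds
    where
        F: FnOnce() -> RowBounds,
    {
        {
            // Read-lock check: a hit is only valid for the current snapshot.
            let guard = self.cache.read().unwrap();
            if guard.snapshot_id == snapshot_id {
                if let Some(hit) = guard.row_bounds_by_col_span.get(&key) {
                    return *hit;
                }
            }
        } // Read lock is released here, before the expensive scan runs.

        let result = scan();

        // Write-lock store; entries from an older snapshot are dropped first.
        let mut guard = self.cache.write().unwrap();
        if guard.snapshot_id != snapshot_id {
            guard.row_bounds_by_col_span.clear();
            guard.snapshot_id = snapshot_id;
        }
        guard.row_bounds_by_col_span.insert(key, result);
        result
    }
}
```

Because the map value is an `Option`, "key absent" (a miss) and "key present with `None`" (a cached empty column) stay distinguishable, which is the property the empty-column bullet relies on.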

## Perf measurements (10k rows / 10k formulas, FormulaPlane Off)

repro_whole_col_vs_finite, Off mode:

Before:
  =SUM($A:$A)              recalc 4882ms  (488us/formula)
  =SUM($A$1:$A$N)        recalc 2448ms  (245us/formula)
  =SUM($A:$A) - A{r}      recalc 4725ms
  =SUM($A$1:$A$N) - A{r} recalc 2482ms

After:
  =SUM($A:$A)              recalc 2492ms  (249us/formula)  ~2x faster
  =SUM($A$1:$A$N)        recalc 2477ms  (unchanged)
  =SUM($A:$A) - A{r}      recalc 2473ms  ~1.9x faster
  =SUM($A$1:$A$N) - A{r} recalc 2495ms  (unchanged)

**Whole-column Off recalc now matches finite-range Off recalc within
~1% margin.**

s026-whole-column-refs-in-50k-formulas medium:
  Off recalc:  4810ms -> 2511ms  (1.92x faster)
  Auth recalc: 1670ms -> 1769ms  (within noise)

Auth-mode FormulaPlane behavior unchanged: still spans=1, still benefits
from the whole-column promotion landed in 0d287ce.

## Tests added

In crates/formualizer-eval/src/engine/tests/used_bounds_cache.rs:

- used_rows_for_columns_caches_final_result_across_repeated_calls:
  10k values + 10k formulas, two calls, asserts row_misses == 1,
  row_hits == 1.

- used_rows_for_columns_caches_none_for_empty_column:
  empty column C, two calls, both return None, row_misses == 1,
  row_hits == 1.

- used_rows_for_columns_invalidates_on_data_edit:
  data through row 5, edit row 8, snapshot bump invalidates cache,
  third call returns updated max row 8 and is cached.

- used_rows_for_columns_includes_formula_rows_in_union:
  data A1:A5 + formula A10, returns max row 10, second call hits.

- used_cols_for_rows_caches_final_result + invalidates_on_data_edit:
  symmetric tests for the row-axis cache.

- evaluate_whole_column_sum_uses_cached_bounds:
  100 rows, =SUM($A:$A) formulas in col B, evaluate, edit A5, recalc,
  values correct, cache hit pattern matches expected behavior.

Internal #[cfg(test)] AtomicUsize counters (row_hits, row_misses,
col_hits, col_misses) on UsedAxisBoundsCache. Counters exposed via
Engine::used_axis_bounds_cache_stats() for tests only. No public API
change. No EvalConfig toggle.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
repro_whole_col_vs_finite                                        whole-col
                                                                 within 1%
                                                                 of finite
probe-corpus medium s026                                         Off 1.92x
                                                                 faster

## Out of scope

- SUM aggregate cache (CSE) — separate dispatch
  docs/design/formula-plane/dispatch/whole-column-references.md.
  This patch addresses the per-formula bound-resolution tax;
  CSE would address the per-formula SUM scan tax.
- Formula-only sheet index — broader graph-state change, not needed
  to remove the verified per-call scan.
- Empty-column inefficiency in the underlying arrow_used_row_bounds
  cache (where (None, None) results aren't treated as cache hits) —
  the new wrapper-level cache caches Option<(u32, u32)> including
  None, which closes the hole at the wrapper level.
…tion

Combined dispatch implementing the design at
docs/design/formula-plane/dispatch/literal-param-memoization-design.md.

Two coupled features sharing one parameter-slot substrate:

1. **Literal parameterization**: formulas that differ only by literal
   values now fold into the same FormulaPlane family. The parameterized
   canonical key replaces all parameterizable literals with positional
   slot markers (lit_slot(<id>)). Per-formula binding vectors carry the
   concrete literal values. Family bucketing changes from
   (sheet_id, canonical_hash) to (sheet_id, parameterized_canonical_hash)
   with a full parameterized_canonical_key equality guard against hash
   collisions.

2. **Parameter-key memoization**: non-constant spans now evaluate once
   per unique parameter tuple and broadcast to placements with the same
   tuple. Parameter atoms include literal slot values + value-context
   relative-cell-ref values + residual row/col deltas when needed. The
   memo cache lives strictly within SpanEvaluator::evaluate_task and
   is dropped on return — no persistent caching, no invalidation
   complexity.
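
As a rough illustration of literal parameterization, the sketch below walks a toy AST in pre-order, emits `lit_slot(<id>)` for each literal, and collects the concrete values into a binding vector. The `Expr` and `Lit` types, the `rel(...)` key syntax, and `parameterized_key` are hypothetical stand-ins, not the types in template_canonical.rs.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Lit {
    Number(f64),
    Text(String),
}

enum Expr {
    Literal(Lit),
    RelRef { row_delta: i32, col_delta: i32 },
    Call { name: &'static str, args: Vec<Expr> },
}

/// Pre-order walk: literals become positional slot markers in the key,
/// and their concrete values are appended to `bindings` in the same order.
fn parameterize(expr: &Expr, key: &mut String, bindings: &mut Vec<Lit>) {
    match expr {
        Expr::Literal(lit) => {
            key.push_str(&format!("lit_slot({})", bindings.len()));
            bindings.push(lit.clone());
        }
        Expr::RelRef { row_delta, col_delta } => {
            key.push_str(&format!("rel({},{})", row_delta, col_delta));
        }
        Expr::Call { name, args } => {
            key.push_str(name);
            key.push('(');
            for (i, arg) in args.iter().enumerate() {
                if i > 0 {
                    key.push(',');
                }
                parameterize(arg, key, bindings);
            }
            key.push(')');
        }
    }
}

/// Returns (parameterized key, per-formula literal bindings).
fn parameterized_key(expr: &Expr) -> (String, Vec<Lit>) {
    let mut key = String::new();
    let mut bindings = Vec::new();
    parameterize(expr, &mut key, &mut bindings);
    (key, bindings)
}
```

Two formulas that differ only in a literal then share one key and differ only in their binding vectors, which is what lets them bucket into the same family.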

## Pre-existing tombstone-evaluation bug also fixed

While verifying correctness, the agent identified a pre-existing bug
that the literal-parameterization work exposed:

- VertexEditor::remove_vertex tombstoned vertices but did NOT clear
  vertex_formulas/vertex_values/dirty_vertices/volatile/dynamic/kind.
  Tombstoned formula vertices remained schedulable.
- DependencyGraph::get_evaluation_vertices did not filter tombstoned
  vertices.

After delete_columns on a sheet with FormulaPlane spans:
- demotion materialized formulas at all positions (correct).
- delete_columns tombstoned col-3 vertices and shifted col-4 → col-3
  (correct).
- BUT the tombstoned col-3 vertices kept evaluating and writing stale
  results to col-3 in the computed overlay, producing wrong values.

Fix at vertex_editor.rs:703-711 (clear formula/value/dirty/kind on
remove_vertex) and graph/mod.rs:2145-2158 (filter tombstoned in
get_evaluation_vertices via vertex_exists_active).

This bug was latent before because no prior workload created the exact
sequence (FormulaPlane span → demotion materialization → structural
delete → recalc) that produces the symptom.
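
The two-part fix can be pictured with a toy graph. This is a sketch under assumptions: `Vertex`, `Graph`, and the flat `Vec` storage are hypothetical, and the real code clears more state (values, volatile/dynamic flags, kind).

```rust
#[derive(Default)]
struct Vertex {
    formula: Option<String>,
    dirty: bool,
    tombstoned: bool,
}

struct Graph {
    vertices: Vec<Vertex>,
}

impl Graph {
    /// Part 1: tombstoning also clears the state that made the vertex schedulable.
    fn remove_vertex(&mut self, id: usize) {
        let v = &mut self.vertices[id];
        v.tombstoned = true;
        v.formula = None;
        v.dirty = false;
    }

    /// Part 2: the scheduler filters tombstoned vertices defensively as well.
    fn evaluation_vertices(&self) -> Vec<usize> {
        self.vertices
            .iter()
            .enumerate()
            .filter(|(_, v)| !v.tombstoned && v.dirty && v.formula.is_some())
            .map(|(id, _)| id)
            .collect()
    }
}
```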

## Performance results

repro_sumifs_variants at ROWS=5000, Auth-serial (wasm-relevant):

| Variant | Before | After | Speedup |
|---|---:|---:|---:|
| 1. constant literal | 0.84ms | 0.86ms | unchanged ✓ |
| 2. varying literal (s014) | 3196ms | **2.72ms** | **1175x** |
| 3. relative cell-ref | 2078ms | **3.31ms** | **628x** |
| 4. whole-col + relative | 2069ms | **3.65ms** | **567x** |
| 5. whole-col + constant | 1.01ms | 1.08ms | unchanged ✓ |

s014 corpus medium Auth recalc: 146ms → 3.4ms (43x). spans 0 → 1.

s013 and s026 corpus: unchanged from previous baselines.

K=3 redundancy in benchmark → 3 SUMIFS evals + N broadcasts, matching
theoretical minimum.

## Architecture (per memo)

### Parameter-slot canonicalization (template_canonical.rs)

Two outputs per formula:
- exact_canonical_key (current behavior — retained for diagnostics)
- parameterized_canonical_key (literals → lit_slot(<id>))
- literal_slot_descriptors (with SlotContext, original LiteralKind)
- literal_bindings: Box<[LiteralValue]>

Pre-order traversal matches existing canonical traversal exactly.
Array literals continue to reject (no slot emitted).

### BindingStore in FormulaPlane runtime (runtime.rs)

Dictionary-encoded binding storage:
- unique_literal_bindings: Vec<Box<[LiteralValue]>>
- placement_literal_binding_ids: Box<[u32]>

For N=10k placements with K=3 distinct bindings: stores 3 vectors +
40KB ids, not 10k full vectors. 8 MiB memory cap with
PlacementFallbackReason::BindingMemoryCapExceeded.

PlacementDomain::ordinal_of(placement) maps placement coord → index
matching domain.iter() order.
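
A minimal sketch of the dictionary encoding, assuming `String` as a stand-in for `LiteralValue` and a `Vec<u32>` id column that would be frozen into `Box<[u32]>` once the family is built. The `index` map and the method names are illustrative, not the runtime's API.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct BindingStore {
    /// One entry per distinct binding vector.
    unique_literal_bindings: Vec<Box<[String]>>,
    /// One u32 id per placement, in placement-ordinal order.
    placement_literal_binding_ids: Vec<u32>,
    /// Build-time lookup from binding vector to its dictionary id.
    index: HashMap<Box<[String]>, u32>,
}

impl BindingStore {
    /// Appends the binding vector for the next placement ordinal.
    fn push(&mut self, binding: Vec<String>) {
        let binding: Box<[String]> = binding.into_boxed_slice();
        let existing = self.index.get(&binding).copied();
        let id = match existing {
            Some(id) => id,
            None => {
                let id = self.unique_literal_bindings.len() as u32;
                self.unique_literal_bindings.push(binding.clone());
                self.index.insert(binding, id);
                id
            }
        };
        self.placement_literal_binding_ids.push(id);
    }

    /// Resolves a placement ordinal back to its literal bindings.
    fn binding_for(&self, ordinal: usize) -> &[String] {
        let id = self.placement_literal_binding_ids[ordinal] as usize;
        &self.unique_literal_bindings[id]
    }
}
```

With 10k placements cycling over three bindings, the id column costs 10,000 × 4 bytes = 40 KB and the dictionary holds three vectors, matching the arithmetic above.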

### Span eval third branch (span_eval.rs)

  if span.is_constant_result { broadcast }
  else if let Some(plan) = parametric_eval_plan && should_try_memoization {
      memoized eval branch
  } else {
      per-placement (current path)
  }

ParameterAtom enum uses NumberBits(u64) (not f64 PartialEq, NaN safe).
Date/Time/Duration as typed strings. Error includes full ExcelError
content (kind+message+context+extra).
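
To show why bit-pattern keys matter, here is a small sketch of a per-task memo keyed by parameter atoms. Only the `NumberBits(u64)` idea comes from the description above; the two-variant enum, `number_atom`, and `memoized_eval` are hypothetical simplifications.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum ParameterAtom {
    /// f64 stored as raw bits so the key can implement Eq + Hash.
    NumberBits(u64),
    Text(String),
}

fn number_atom(x: f64) -> ParameterAtom {
    ParameterAtom::NumberBits(x.to_bits())
}

/// Evaluates once per unique key and broadcasts to every placement sharing it.
/// The memo lives only for this call, mirroring the per-task cache lifetime.
fn memoized_eval<F>(keys: &[Vec<ParameterAtom>], mut eval: F) -> (Vec<f64>, usize)
where
    F: FnMut(&[ParameterAtom]) -> f64,
{
    let mut memo: HashMap<&[ParameterAtom], f64> = HashMap::new();
    let mut evals = 0usize;
    let mut out = Vec::with_capacity(keys.len());
    for key in keys {
        let value = *memo.entry(key.as_slice()).or_insert_with(|| {
            evals += 1;
            eval(key.as_slice())
        });
        out.push(value);
    }
    (out, evals)
}
```

With raw bits, a NaN key equals itself (same bit pattern) and `0.0` stays distinct from `-0.0`, neither of which holds under `f64`'s `PartialEq`.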

Atom order: literal slots → value-ref slots → residual relocation
deltas. Deterministic for mixed-slot keys.

ResidualRelocationMode::{None, IncludeRowDelta, IncludeColDelta,
IncludeRowAndColDelta}. Memoization is valid only when all
placement-varying influences are in the key. Relative ranges in
range-context force residual deltas; otherwise no memoization.

Bounded sampling gate: sample 64 placements, fallback if unique > 3/4
of sample. Full grouping aborts if unique * 4 > writable * 3.
MEMO_MAX_ENTRIES_PER_TASK = 16384.
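
The two thresholds can be written down directly. In this sketch the keys are plain `u64` stand-ins for parameter keys, and taking the first 64 placements as the sample is an assumption; how the real gate picks its sample is not stated here.

```rust
use std::collections::HashSet;

const SAMPLE_SIZE: usize = 64;

/// Sampling gate: skip memoization when more than 3/4 of the sampled keys are unique.
fn should_try_memoization(keys: &[u64]) -> bool {
    let sample = &keys[..keys.len().min(SAMPLE_SIZE)];
    let unique: HashSet<&u64> = sample.iter().collect();
    unique.len() * 4 <= sample.len() * 3
}

/// Full-grouping abort: give up once unique keys exceed 3/4 of writable placements.
fn grouping_should_abort(unique: usize, writable: usize) -> bool {
    unique * 4 > writable * 3
}
```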

### Substitution mechanism (interpreter.rs)

Hybrid:
- Literal slots: interpreter-level binding context
  (Interpreter::with_parameter_bindings).
  Modifies arena Literal node evaluation to consult bindings before
  data_store.retrieve_value.
- Value-ref slots: representative placement + key grouping (no AST
  substitution; existing relocation handles it).
- Demotion: tree clone + literal substitution + relocation.

### Family acceptance gate (placement.rs)

Family bucketing by parameterized_canonical_hash. Full
parameterized_canonical_key equality check against hash collisions.

is_constant_result requires:
- read_projections constant
- all placements have same literal binding vector
- value_ref_slot_descriptors empty

### By-ref function contracts (dependency_summary.rs)

Strengthened ROW/COLUMN/AREAS/SHEET as by-ref/reference-sensitive.
INDEX/OFFSET already mapped. Prevents reference-identity-sensitive
args from being value-ref-parameterized.

## Tests added (24)

In crates/formualizer-eval/src/engine/tests/formula_plane_literal_param_memo.rs:

Literal parameterization (8):
- formula_plane_parameterized_literals_fold_same_structure
- formula_plane_exact_canonical_key_retained_for_diagnostics
- formula_plane_literal_slot_wildcards_kind_but_binding_preserves_type
- formula_plane_array_literal_remains_rejected_after_literal_parameterization
- formula_plane_empty_literal_parameterizes (empty/pending/error)
- formula_plane_binding_store_dictionary_encodes_repeated_vectors
- formula_plane_binding_set_removed_with_span
- formula_plane_demoted_parameterized_span_materializes_bound_literals
  (regression test for column-delete tombstone bug + literal binding)

Memoization (6):
- formula_plane_memoizes_value_context_relative_cell_refs (K=3 → 3 evals)
- formula_plane_memoizes_varying_literal_slots (K=3 → 3 evals)
- formula_plane_memoizes_mixed_literal_and_value_ref_parameters
- formula_plane_memo_residual_relative_reference_includes_row_delta
- formula_plane_memo_skips_all_unique_literal_bindings
- formula_plane_memo_sampling_skips_all_unique_value_refs

Edge cases (10):
- Float key (3): uses_number_bits, nan_reflexive, negative_zero_distinct
- Date/time: dates_and_durations_are_typed
- Errors: error_includes_message_and_context
- Volatile/dynamic (2): volatile_template_not_memoized, dynamic_template_not_memoized
- Reference identity (3): row_column_args_not_value_parameterized,
  index_offset_byref_not_value_parameterized,
  criteria_range_not_value_parameterized

Hash collision and memory cap (3):
- parameter_key_hash_collision_does_not_merge_results
- parameterized_canonical_hash_collision_does_not_merge_family
- literal_binding_memory_cap_falls_back

Memo cache lifetime (1):
- memo_cache_is_per_evaluate_task

Test-only counters added: memo_eval_count, memo_broadcast_count,
sample_only_key_build_count, unique_literal_binding_vectors. Exposed
via test-only Engine accessors (no public API).

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
formula_plane_authoritative_column_delete_shifts_span_outputs_correctly  pass
                                                                 (was failing)
repro_sumifs_variants                                            wins documented
                                                                 above
probe-corpus medium s013/s014/s026                               s014 43x faster
                                                                 (146ms → 3.4ms)
                                                                 s013/s026 unchanged

## Out of scope (separate dispatches)

- SUMIFS family aggregate index for K=N criteria cases (memo §8 Option F).
- Parallel non-constant span placement evaluation for native
  multi-threaded workloads (memo §8 Option B). Not relevant for
  wasm/single-threaded; benefits real-world parallel workloads
  separately.
- FamilyPlanner architecture as the formal home for these plans.

Memos for both committed alongside this change.
…uit corpus

## Part 1: Per-scheduled-span loop overhead reduction

Reduces per-span overhead in evaluate_authoritative_formula_plane_all
inner loop. Per-span allocations / setup compounded linearly with active
span count; same-sheet span groups now share evaluator state.

### Changes

- **Sheet name resolution**: removed per-span `sheet_name(...).to_string()`
  allocation. Within a layer, consecutive spans on the same sheet now
  share one borrowed sheet name slice.

- **SpanEvaluator reuse**: one SpanEvaluator constructed per
  same-sheet span group within a layer (previously: per span). Loop
  reorganized to walk consecutive spans on the same sheet under one
  evaluator before transitioning.

- **SpanComputedWriteSink reuse**: one sink constructed per layer,
  reused across all spans in that layer (previously: per span).

- **Relocatable AST validation cached per template**: TemplateRecord
  gains `relocatable_ast_validated: OnceLock<bool>`. Templates are
  immutable post-interning, so first-call computes; later calls hit the
  cache. Eliminates O(spans · AST nodes) walk per evaluate_all.

- **WholeSpan dirty avoids double Vec materialization**: introduced
  PlacementSelection enum with Whole(borrowed PlacementDomain) and
  Vec(materialized PlacementCoord vec) variants. WholeSpan branch
  iterates via domain.iter() (already O(N) but no double-vec). Cells
  and Regions branches still materialize as before.
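
The per-template validation cache can be sketched with `std::sync::OnceLock`. Only the `relocatable_ast_validated: OnceLock<bool>` field comes from the description above; `ast_node_count`, the `walks` counter, and the placeholder validation are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

struct TemplateRecord {
    /// Stand-in for the interned, immutable AST.
    ast_node_count: usize,
    relocatable_ast_validated: OnceLock<bool>,
    /// Illustration-only counter of how often the full walk actually ran.
    walks: AtomicUsize,
}

impl TemplateRecord {
    fn new(ast_node_count: usize) -> Self {
        Self {
            ast_node_count,
            relocatable_ast_validated: OnceLock::new(),
            walks: AtomicUsize::new(0),
        }
    }

    /// The first call runs the validation; later calls return the cached answer.
    fn is_relocatable(&self) -> bool {
        *self.relocatable_ast_validated.get_or_init(|| {
            self.walks.fetch_add(1, Ordering::Relaxed);
            // Placeholder for the real AST walk.
            self.ast_node_count > 0
        })
    }

    fn walk_count(&self) -> usize {
        self.walks.load(Ordering::Relaxed)
    }
}
```

Caching like this is only sound because templates are immutable after interning; a mutable template would need an invalidation path.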

### Measurements

Per-span overhead changes have modest effect at small span counts.
Expected to scale with workbooks containing many spans.

Medium corpus probe (selected scenarios):
- s006-rect-family-10cols (10 spans): 8.13ms → 8.61ms (within noise).
- s013-sumifs-family-constant-criteria (1 span): 0.85 → 1.04ms (sub-ms).
- s014-sumifs-family-varying-criteria (1 span): 3.56 → 3.56ms (unchanged).
- s016-multi-sheet-5-tabs (3 spans): 1.09 → 0.97ms (improved).

No regressions. The benefit grows with active span count and many-span
workbooks.

## Part 2: IF/IFS/IFERROR short-circuit corpus coverage

The PM flagged that we should have corpus tests confirming IF family
short-circuit semantics still work under FormulaPlane span eval
(including the memoized branch). Probe at
crates/formualizer-bench-core/examples/repro_if_short_circuit.rs already
verified this for K=N (per-placement path) and K=3 (memoized path) -
zero errors propagated, correct values returned.

Added 3 corpus scenarios:

- **s043-if-short-circuit-with-erroring-else**: 10k-row
  =IF(A{r}>0, A{r}*2, 1/0). All A values positive so condition always
  true; else branch (1/0) must NEVER evaluate. Invariants assert zero
  error cells in col B at all phases.

- **s044-ifs-chain-short-circuit**: 10k-row
  =IFS(A{r}>0, A{r}*2, A{r}<0, A{r}*3, TRUE, 1/0). A cycles through
  positive/negative/zero. The TRUE fallback contains 1/0 that should
  never evaluate when an earlier condition matches. Per-row expected
  values match the appropriate branch.

- **s045-iferror-mixed-with-actual-errors**: 10k-row
  =IFERROR(1/A{r}, 0). Some A=0 cells produce DIV/0 in the protected
  expression; IFERROR catches and returns 0. Cells with A=0 must yield
  0, not propagate the error. Other cells return 1/A.

All three promote to spans=1 under Auth and pass NoErrorCells +
per-row CellEquals invariants under both Off and Auth modes. Recalc
under both modes is sub-ms per cycle.
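
The invariant these scenarios check is ordinary lazy branch evaluation. A toy evaluator, with a hypothetical `Expr` type unrelated to the engine's AST, shows the expected behavior for the s043 and s045 shapes:

```rust
#[derive(Debug, PartialEq)]
enum EvalError {
    DivZero,
}

enum Expr {
    Num(f64),
    Div(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    IfError(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> Result<f64, EvalError> {
    match e {
        Expr::Num(n) => Ok(*n),
        Expr::Div(a, b) => {
            let (a, b) = (eval(a)?, eval(b)?);
            if b == 0.0 { Err(EvalError::DivZero) } else { Ok(a / b) }
        }
        Expr::Gt(a, b) => Ok(if eval(a)? > eval(b)? { 1.0 } else { 0.0 }),
        // Only the selected branch is evaluated; the other is never touched.
        Expr::If(c, t, f) => {
            if eval(c)? != 0.0 { eval(t) } else { eval(f) }
        }
        // The fallback runs only when the protected expression errors.
        Expr::IfError(x, fallback) => eval(x).or_else(|_| eval(fallback)),
    }
}
```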

ScenarioTag::ShortCircuit added to the tag enum.

## Tests added

In crates/formualizer-eval/src/engine/tests/formula_plane_per_span_overhead.rs:

- formula_plane_evaluate_all_handles_many_same_sheet_spans:
  100 same-sheet active spans evaluate in one evaluate_all without
  errors.

- formula_plane_relocatable_validation_is_cached_per_template:
  validates relocatable AST validation is not repeated for the same
  template across multiple evaluate passes.

- formula_plane_whole_span_dirty_does_not_materialize_dirty_placement_vec:
  validates DirtyDomain::WholeSpan iterates without dirty-placement
  Vec materialization.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
probe-corpus medium s006/s013/s014/s016/s043/s044/s045           all pass,
                                                                 invariants hold,
                                                                 short-circuit
                                                                 verified

## Out of scope (separate dispatches)

- VLOOKUP/HLOOKUP/XLOOKUP allowlist + value-context handling.
- LET/LAMBDA local-binding context support.
- SUMIFS family aggregate index for K=N criteria cases.
- Demotion phase cost (s034/s035 first edit still 25-46s for 30-50k
  formula materialization).
…se scenarios

## Parity harness

New binary `probe-corpus-parity` that runs every scenario twice (Off and
Auth modes, single-threaded for determinism) on the same fixture and
compares EVERY cell at every phase boundary. This is the release gate
that proves Off↔Auth equivalence.

CLI:
- `--scale {small|medium|large}`
- `--include 'sNNN-*'` / `--exclude 'sNNN-*'`
- `--phase-timeout-ms N`
- `--fail-fast`
- `--max-divergences-per-phase N`
- `--label <tag>`

Float comparison uses exact bit-equality (`f64::to_bits`) with a
NaN-vs-NaN special case. Errors compare full `ExcelError` (kind +
message). Empty cell is equivalent to None.
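
The comparison rules can be sketched as below. `CellValue` and `cells_match` are hypothetical, and treating any NaN as equal to any other NaN regardless of payload is an assumption about what the special case does.

```rust
#[derive(Clone, Debug)]
enum CellValue {
    Empty,
    Number(f64),
    Text(String),
    Error { kind: String, message: Option<String> },
}

/// Strict Off-vs-Auth cell comparison; a missing cell is treated as Empty.
fn cells_match(off: Option<&CellValue>, auth: Option<&CellValue>) -> bool {
    let empty = CellValue::Empty;
    let off = off.unwrap_or(&empty);
    let auth = auth.unwrap_or(&empty);
    match (off, auth) {
        (CellValue::Empty, CellValue::Empty) => true,
        // Exact bit equality, except that NaN matches NaN.
        (CellValue::Number(a), CellValue::Number(b)) => {
            (a.is_nan() && b.is_nan()) || a.to_bits() == b.to_bits()
        }
        (CellValue::Text(a), CellValue::Text(b)) => a == b,
        (
            CellValue::Error { kind: ka, message: ma },
            CellValue::Error { kind: kb, message: mb },
        ) => ka == kb && ma == mb,
        _ => false,
    }
}
```

Bit equality is deliberately stricter than an epsilon comparison: both modes are expected to perform the same floating-point operations in the same order, so any difference in the last bit signals a real divergence in evaluation.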

`Scenario::expected_divergences()` machinery added to mark
volatile/dynamic scenarios that legitimately differ across modes:
- s021 (RAND/NOW) skipped.
- s022 (OFFSET/INDIRECT) run-and-noted.
- s058 (volatile mix) skipped.

Tests: smoke test, deliberate-divergence detection test, f64 bit
comparison edge cases.

## 15 new edge-case scenarios

s046 giant-AST formula (≥ 50 deps per cell, 100 such cells)
s047 very-deep linear chain (2000 deep)
s048 50 disjoint anchored families
s049 VLOOKUP with row-relative key
s050 VLOOKUP with absolute key (constant-result candidate)
s051 mixed error cascade with IFERROR suppression
s052 5000-row deeply nested IF chain
s053 text-heavy CONCATENATE family
s054 add-then-delete sheet recalc test
s055 mixed-edit + undo
s056 SUMIFS with array-criteria expression
s057 named range redefined
s058 volatile/non-volatile mix
s059 empty sheet with cross-sheet refs populated by edit
s060 self-referencing table row formula

New tags: GiantAst, TextHeavy.

## Initial parity audit results

Small-scale parity audit:
  Scenarios run:      58
  Scenarios passed:   49
  Scenarios skipped:   2 (expected divergence)
  Scenarios failed:    9
  Total divergences:  25

### Real correctness divergences (AfterRecalc, contract violations)

- **s054 add-then-delete sheet recalc**: Auth retains stale (-1)
  values after a sheet is removed and re-added; Off correctly
  recalculates. Real bug in cross-sheet dirty propagation.
- **s055 mixed edits + undo**: Auth value 200 vs Off 500 after
  mixed value/formula edits. Real bug in dirty propagation under
  mixed edit sequences.

### Contract divergences (AfterEdit pre-recalc only)

- s032/s033/s034/s035: After structural edits (insert/delete rows or
  columns), Auth shows `None` for values that Off retains as stale
  numbers. At AfterRecalc, both modes match. This is a contract
  question, not a correctness bug: Auth's behavior (values cleared
  on structural op until next recalc) is consistent with the
  evaluate_all-driven contract.

### Harness errors (pre-existing public-API gaps)

- s040 (insert_rows undo): Workbook public API exposes no
  WorkbookAction::insert_rows; engine_mut would not test undo path.
- s041 (extend_table): Workbook exposes no extend_table API;
  engine_mut Engine::define_table only.
- s042 (external source bump): no public API to declare/populate
  source values during fixture load.

These are pre-existing escalations from earlier dispatches, not
new bugs.

## Coverage matrix

Coverage gaps (one scenario per tag): GiantAst, TextHeavy,
SheetRename, NoFormulas, LegacyOnly, LetLambda, LargeArrayLiteral,
WholeColumnRefs, MixedTypes, InternalDependency, Dynamic,
DeleteRows, DeleteColumns, InsertColumns. Most are intentional (one
scenario per dimension) but worth noting for future expansion.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass (1534)
cargo test -p formualizer-workbook --quiet                       pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
cargo test -p formualizer-bench-core --features formualizer_runner pass
probe-corpus small (existing scenarios)                          pass
probe-corpus-parity small (audit)                                49/58
                                                                 (real
                                                                 correctness
                                                                 bugs in
                                                                 s054/s055)

## Out of scope (separate dispatches)

- Fix s054 cross-sheet dirty propagation when sheet is re-added.
- Fix s055 dirty propagation under mixed edits.
- Decide AfterEdit phase contract: gate or skip.
- Expose Workbook public APIs for insert_rows, extend_table,
  external source population (unlocks s040/s041/s042).
- Re-run full corpus at medium scale once the above are fixed.
…eet add/remove

## Bugs fixed

The Off↔Auth parity harness (commit 4abf4db) surfaced two correctness
divergences:

### s055: set_cell_formula inside an active span ignored

When the engine writes a new formula or value at a coordinate that is
INSIDE an active span placement domain, the span continues to evaluate
its template for that placement, ignoring the per-cell override.

Reproduction: 200-row =A{r}*2 family promoted to a span. Set B100 to
=A100*5 via the action(...) path. Expected 500; Auth produced 200.

### s054: sheet add/remove leaves dependent span templates stale

When a sheet referenced by formulas in another sheet is removed and
re-added (e.g. =IFERROR(Aux!A{r}*2, -1)), DependencyGraph rewrites
the formula AST through tombstone/heal phases. The span's
template_id continues to point at the original (pre-tombstone) AST,
so post-add evaluation produces stale results.

Reproduction: 200-row Sheet1!A{r} = =IFERROR(Aux!A{r}*2, -1) family
promoted to a span. delete_sheet("Aux") then add_sheet("Aux") with new
values. Expected (r+10)*2; Auth produced -1 (the stale IFERROR
fallback from when Aux was missing).

## Fix design

Both fixes use span demotion. Demotion materializes span placements
as legacy vertex-backed formulas; subsequent evaluate_all may
re-promote them based on the new (correct) AST.

Three new private methods on Engine:

- `demote_span_containing_cell_for_write(sheet_id, row0, col0)`:
  for per-cell writes. Looks up the placement via
  FormulaSpanStore::find_at; if inside an active span, demotes that
  sheet's spans.

- `demote_all_spans()`: enumerates all sheet_ids with active spans
  and demotes each. Used by sheet add/remove because
  tombstone/heal can affect cross-sheet formula ASTs arbitrarily.

- `demote_spans_preserving_computed_overlays(sheet_id)`: variant of
  the existing structural-op demoter that does NOT clear computed
  overlays. For write-induced demotion the placements are about to
  be overwritten; clearing the computed overlay would discard
  legitimate work for unaffected placements.

The structural-op demoter is unchanged. Internal helper
`demote_spans_for_structural_op_impl(sheet_id, clear_computed_overlays)`
parameterizes the overlay-clear behavior; the public
`demote_spans_for_structural_op` retains its prior behavior.
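
The write-induced demotion path can be pictured with a single-sheet toy. Everything here is hypothetical (`ToyEngine`, `Span`, the `fn(u32) -> String` template, 0-based rows); it only mirrors the shape described above: look up the placement, demote the sheet's spans if the write lands inside one, then apply the per-cell write.

```rust
use std::collections::HashMap;

/// (row0, col0) on a single sheet, for brevity.
type Coord = (u32, u32);

struct Span {
    col: u32,
    row_start: u32,
    row_end: u32,
    /// Produces the formula text for one placement row.
    template: fn(u32) -> String,
}

#[derive(Default)]
struct ToyEngine {
    spans: Vec<Span>,
    cell_formulas: HashMap<Coord, String>,
}

/// Example template for an =A{r}*2 family (rows are 0-based internally).
fn double_template(row: u32) -> String {
    format!("=A{}*2", row + 1)
}

impl ToyEngine {
    fn find_span_at(&self, (row, col): Coord) -> Option<usize> {
        self.spans
            .iter()
            .position(|s| s.col == col && (s.row_start..=s.row_end).contains(&row))
    }

    /// Demotion: materialize every placement as a standalone per-cell formula.
    fn demote_all_spans(&mut self) {
        for span in std::mem::take(&mut self.spans) {
            for row in span.row_start..=span.row_end {
                self.cell_formulas.insert((row, span.col), (span.template)(row));
            }
        }
    }

    /// A write inside an active span demotes first, so the override wins.
    fn set_cell_formula(&mut self, at: Coord, formula: &str) {
        if self.find_span_at(at).is_some() {
            self.demote_all_spans();
        }
        self.cell_formulas.insert(at, formula.to_string());
    }
}
```

Demoting before the write is what keeps the override from being shadowed by the span's template, which is the s055 failure mode.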

## Sites patched

Engine-level public writes (single-cell):
- `Engine::set_cell_value`
- `Engine::set_cell_formula`

EngineAction (action_with_logger / action() path):
- `EngineAction::set_cell_value`
- `EngineAction::set_cell_formula`

Engine-level public writes (bulk):
- `Engine::bulk_set_formulas`: dedup via single sheet check; demote
  once per sheet only if any cell falls inside an active span.

Sheet add/remove:
- `Engine::add_sheet`: demote all spans BEFORE
  `graph.add_sheet` (which heals orphans).
- `Engine::remove_sheet`: demote all spans BEFORE
  `graph.remove_sheet` (which tombstones formulas).

The order matters: demotion must happen before AST mutation because
demotion logic walks the current span template.

## Tests

New file:
`crates/formualizer-eval/src/engine/tests/formula_plane_demotion_correctness.rs`

Six tests covering:
1. Engine-direct set_cell_formula inside active span.
2. EngineAction set_cell_formula inside active span (the s055
   reproduction shape).
3. Engine-direct set_cell_value inside active span.
4. Engine-direct bulk_set_formulas inside active span (dedup
   demote-once invariant).
5. Sheet remove then add with cross-sheet formulas (s054 shape).
6. Sheet add with no orphans: confirms demote-all-on-add does not
   break unrelated span workloads.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass

Parity harness focused on s054 + s055 small scale:
  Scenarios run:      2
  Scenarios passed:   2
  Total divergences:  0

Full small-scale parity audit:
  s054, s055 now pass.
  Pre-existing failures unchanged: s032/s033/s034/s035 (AfterEdit
    contract divergence, separate workstream); s040/s041/s042
    (Workbook public-API gaps for insert_rows/extend_table/external
    sources, separate workstream).

Medium-scale perf probe: no regressions in s006/s013/s014/s016/s026/
s036/s043/s044/s045. s054 and s055 now produce correct values
under both Off and Auth.

## Out of scope (explicit)

- Surgical FormulaOverlayEntryKind::FormulaOverride / ValueOverride
  insertion machinery: deferred. Demotion is the conservative
  correct path. Overlay punchout has no production callsites yet
  and is unproven in real workloads.
- s032/s033/s034/s035 AfterEdit-only divergences: contract
  clarification work, not correctness.
- s040/s041/s042 public-API gaps: separate Workbook surface
  expansion dispatch.
…reclassification

## What lands

INDEX is now promotable into FormulaPlane span families. Two layers
of canonicalization that previously rejected INDEX as
`ReferenceReturningFunction` are reconfigured:

### Layer 1: canonicalization

`is_reference_returning_function` no longer includes "INDEX"; only
"CHOOSE" remains rejected. INDEX is now in the static allowlist
`is_known_static_function`. Both copies updated:
- `crates/formualizer-eval/src/formula_plane/template_canonical.rs`
- `crates/formualizer-eval/src/engine/arena/canonical.rs` (FP8 arena
  canonicalization)

### Layer 2: dependency summary + slot context

INDEX previously shared the `ByRefArg` argument-context classification
with ROW/COLUMN/AREAS/SHEET/OFFSET. `ByRefArg` was correct for those
five (their semantics depend on the address, not the value at the
address) but wrong for INDEX. INDEX needs:
- arg 0 (table): Value context, so the range gets recorded as a
  precedent.
- args 1+2 (position, col_index): Value context, so scalar literals
  become literal slots and relative refs become value-ref slots.

INDEX now classifies as `Value` context for all args. ROW/COLUMN/
AREAS/SHEET/OFFSET unchanged.
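
A minimal sketch of the resulting classification, assuming a simplified signature. The real `function_arg_context` and `function_arg_slot_context` take different inputs; only the `Value` / `ByRefArg` split and the function names in the match arm come from the description above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArgContext {
    /// Argument is read for its value: ranges become precedents,
    /// literals and relative refs become parameter slots.
    Value,
    /// Argument is read for its address, so it must not be value-parameterized.
    ByRefArg,
}

/// Hypothetical simplified classifier; `_arg_index` is unused because
/// the classification here is uniform across a function's arguments.
fn function_arg_context(name: &str, _arg_index: usize) -> ArgContext {
    match name.to_ascii_uppercase().as_str() {
        "ROW" | "COLUMN" | "AREAS" | "SHEET" | "OFFSET" => ArgContext::ByRefArg,
        // INDEX now falls through to Value for all arguments.
        _ => ArgContext::Value,
    }
}
```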

Both classification sites updated for consistency:
- `function_arg_context` in `dependency_summary.rs:971`
- `function_arg_slot_context` in `template_canonical.rs:1066`

## Architectural property: arbitrary nesting

Span optimizations now apply to INDEX at any nesting depth. The
canonicalization and dependency-summary infrastructure already
recurses into nested function args without bound.
`s062-index-deeply-nested-in-if` puts INDEX at depth 5 inside an
IF/MOD chain and confirms span_count=1 under Auth.

This dispatch's main contribution is removing the leaf-level
rejection. The recursive infrastructure handles INDEX at any depth
automatically, exactly as it does for IF/SUM/SUMIFS/etc. There is
no depth-related limit; promotion is gated solely by per-function
classification at each leaf.
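
The nesting property follows from a plain recursive check. The sketch below is an assumption-laden toy: `Expr`, `is_promotable`, and the four-name allowlist are illustrative, and the range rule (only plain cell endpoints accepted) is a simplification of `is_reference_returning_binary_operator`.

```rust
enum Expr {
    CellRef,
    Number(f64),
    Call(&'static str, Vec<Expr>),
    /// The `:` range operator with arbitrary endpoint expressions.
    Range(Box<Expr>, Box<Expr>),
}

/// Toy allowlist as of this commit: INDEX is in, MATCH is not yet.
fn is_known_static_function(name: &str) -> bool {
    matches!(name, "IF" | "MOD" | "SUM" | "INDEX")
}

/// Promotion is decided per node; recursion depth plays no role.
fn is_promotable(expr: &Expr) -> bool {
    match expr {
        Expr::CellRef | Expr::Number(_) => true,
        Expr::Call(name, args) => {
            is_known_static_function(name) && args.iter().all(is_promotable)
        }
        // A range whose endpoints are computed (e.g. INDEX(...):INDEX(...))
        // is reference-returning and stays rejected.
        Expr::Range(a, b) => {
            matches!((a.as_ref(), b.as_ref()), (Expr::CellRef, Expr::CellRef))
        }
    }
}
```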

## Out of scope (future dispatches)

- VLOOKUP/HLOOKUP/MATCH/XLOOKUP allowlisting (Phase 1b).
- CHOOSE remains rejected (different shape; defer).
- OFFSET/INDIRECT remain rejected (volatile).
- INDEX in range-constructor expressions
  (`SUM(INDEX(...):INDEX(...))`): the `:` operator stays in
  `is_reference_returning_binary_operator`. Locked in by
  `index_in_range_constructor_remains_rejected` regression test.
- Surgical INDEX read-region narrowing (today INDEX records the
  whole table as a precedent, a conservative but correct
  over-approximation; surgical narrowing requires runtime-determined
  reads which we do not support).

## Tests

New file: `crates/formualizer-eval/src/engine/tests/formula_plane_index_promotion.rs`

Covers:
- INDEX with constant table + varying position promotes (span=1).
- INDEX inside arithmetic promotes.
- INDEX at depth 5 inside nested IF chain promotes.
- INDEX/MATCH classic pattern remains rejected (because MATCH not
  yet allowlisted) but evaluates correctly via legacy fallback.
- INDEX dependency-on-table marks dirty correctly.
- INDEX in range constructor remains rejected.
- OFFSET/INDIRECT remain rejected (volatile).
- ROW/COLUMN with relative refs preserve current behavior.
- INDEX duplicate position args memoize correctly.
- INDEX constant position broadcasts.
- INDEX inside arithmetic in Off mode evaluates correctly (sanity).

Updated tests in `formula_plane_literal_param_memo.rs`:
- `formula_plane_offset_byref_not_value_parameterized`
  (split from prior INDEX-or-OFFSET combined test).
- `formula_plane_index_position_arg_is_value_parameterized` (new).

Updated tests in `dependency_summary.rs`:
- INDEX removed from `...rejects_reference_returning_functions`.
- New `...accepts_index_with_static_range` test.

## Corpus scenarios

s061-index-with-constant-table: 1000-row INDEX family with constant
table and varying position. Edit cycles touch position column.
s062-index-deeply-nested-in-if: 1000-row INDEX nested at depth 5
inside IF/MOD chain. Edit cycles touch position column.
s063-index-with-table-edit: 1000-row INDEX family. Edit cycles
touch the lookup TABLE, which verifies that the conservative
whole-table precedent recording correctly marks dirty.

New tags:
- `ScenarioTag::ReferenceForwarding`

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
probe-corpus-parity small s015/s061/s062/s063                   PASS, 0 divergences
probe-corpus-parity small full                                   only known
                                                                 pre-existing
                                                                 failures
                                                                 (s032-s035
                                                                 AfterEdit;
                                                                 s040-s042
                                                                 public-API)

## Performance characteristics

s061 (single A-cell edit/cycle): Auth recalc 0.10ms vs Off 0.11ms.
  Sub-ms; single-cell edits dirty one placement; substrate overhead
  matches savings. Architecturally promoted (span=1).
s062 (5-level nested IF + INDEX): Auth recalc 0.12ms vs Off 0.09ms.
  Architecturally promoted; sub-ms recalc.
s063 (table edit): Auth recalc 0.85ms vs Off 1.08ms (~21% faster).
  Table edits dirty multiple placements; broadcast/memoization
  amortizes.

s015 (existing INDEX/MATCH chain): remains span=0 because MATCH is
not yet allowlisted. Phase 1b will pick this up. Parity-clean.
…tent literal-binding bug

## Lookup family promotion

Adds VLOOKUP, HLOOKUP, MATCH, XLOOKUP to the FormulaPlane static
function allowlist. Mirrors the INDEX dispatch (commit b4e003d)
pattern: allowlist additions in two canonicalization paths
(`template_canonical.rs`, `engine/arena/canonical.rs`), no per-arg
context overrides needed because the default `Value` fall-through
is correct for all arguments of all four functions.

Verified per the lookup-family-promotion-plan.md design memo:
- All args of V/H/X-LOOKUP and MATCH classify as `Value` context.
- No args are reference-identity-sensitive (unlike ROW/COLUMN/AREAS/SHEET).
- No new shared utilities needed; existing `lookup_utils.rs` already
  covers cross-function code (PreparedLookupMatcher, find_exact_index_in_view,
  cmp_for_lookup, approximate_select_ascending).
- CHOOSE remains rejected as ReferenceReturningFunction.
- OFFSET/INDIRECT remain rejected as VolatileFunctions.

## Latent literal-binding correctness fix

Discovered via the parity harness: s029 failed Off↔Auth parity once
the lookup family started promoting. PM isolated the bug to
commit e55993d (literal parameterization + memoization).

### The bug

`SpanEvaluator::evaluate_task`'s per-placement branch
(`span_eval.rs:277-307`) called `interpreter.evaluate_arena_ast_with_offset`
on the template's AST without applying placement-specific literal
bindings. The template AST contains the FIRST placement's literal
values (frozen at canonicalization time). The memoized branch
correctly substituted via `with_parameter_bindings`; the per-placement
branch did not.

Result: any formula where a literal value varied per placement
produced the FIRST placement's literal for ALL placements under
Auth mode. Examples that misbehaved:

- `=A{r}+{r}` produced 101, 501, 1001 (correct: 101, 505, 1010, ...)
- `=MOD({r}, 2)` produced all 1.0 (correct: 1, 1, 0, 0, ...)
- `=VLOOKUP({r}, $T, 2, FALSE)` collapsed to first row's value
- s029 `=VLOOKUP({r}, ...) + IFERROR(VLOOKUP({r*7}, ...)) + ...`:
  all rows returned the first row's value.

### Why the corpus didn't catch it earlier

No pre-existing scenario had placement-varying numeric literals
embedded directly in the formula source string. Existing scenarios
used:
- Constant text criteria ("Type0", "ABC")
- Constant integer literals (0, 2, 1 in `1/0`)
- Cell-relative refs that happened to align with placement geometry

The lookup family dispatch did not introduce the bug; s029's
`=VLOOKUP({r}, ...)` shape exposed it. The parity harness caught
it on the first full run.

### The fix

`evaluate_task`'s per-placement branch now looks up the placement's
binding via `binding_id_for_placement` and applies it via
`with_parameter_bindings` before evaluating the template AST. Mirrors
the memoized branch's pattern.

The branch falls through to the no-bindings code path when the span
has no binding set (no parameterized template).
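
A minimal sketch of the difference between the buggy and fixed per-placement paths, using a toy expression type rather than the real arena AST. `Expr`, `eval`, and `eval_span` are hypothetical names for illustration only; the real code goes through `with_parameter_bindings` as described above.

```rust
/// Toy template expression. `Frozen` models a literal baked into the
/// template at canonicalization time; `Slot` models a parameterized literal
/// that must be resolved through the placement's binding.
/// (Hypothetical types, not the formualizer AST.)
enum Expr {
    /// Value of column A at the placement's row.
    ColA,
    /// Literal frozen from the first placement.
    Frozen(f64),
    /// Literal slot looked up in the placement's binding.
    Slot(usize),
    Add(Box<Expr>, Box<Expr>),
}

/// Evaluate the template for one placement.
fn eval(expr: &Expr, col_a: f64, binding: &[f64]) -> f64 {
    match expr {
        Expr::ColA => col_a,
        Expr::Frozen(v) => *v,
        Expr::Slot(i) => binding[*i],
        Expr::Add(l, r) => eval(l, col_a, binding) + eval(r, col_a, binding),
    }
}

/// Evaluate one template across all placements of a span.
/// `col_a[i]` and `bindings[i]` both belong to placement `i`.
fn eval_span(template: &Expr, col_a: &[f64], bindings: &[Vec<f64>]) -> Vec<f64> {
    col_a
        .iter()
        .zip(bindings)
        .map(|(a, b)| eval(template, *a, b))
        .collect()
}
```

With column A values 100, 500, 1000 and per-placement literals 1, 5, 10, a template that keeps the first placement's literal frozen yields 101, 501, 1001, while the slot-based template yields 101, 505, 1010, matching the `=A{r}+{r}` example above.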

## Tests

New file: `crates/formualizer-eval/src/engine/tests/formula_plane_per_placement_literal_bindings.rs`

Seven regression tests:
1. `per_placement_literal_substitution_basic`: =A{r}+{r}
2. `per_placement_literal_substitution_in_sum`: =SUM(A{r}, {r})
3. `per_placement_literal_substitution_in_mod`: =MOD({r}, 2)
4. `per_placement_literal_in_vlookup_key`: =VLOOKUP({r}, ...)
5. `per_placement_literal_in_nested_if_chain`: deeply-nested IF
   with multiple placement-varying literals.
6. `per_placement_literal_with_text_concat`: =LEN("row-" & {r})
7. `per_placement_literal_substitution_does_not_break_constant_broadcast`:
   verifies constant-key VLOOKUP still broadcasts (transient_ast_relocation_count == 1).

New file: `crates/formualizer-eval/src/engine/tests/formula_plane_lookup_family_promotion.rs`

Nine lookup-family promotion tests:
1. `vlookup_exact_relative_key_promotes`
2. `vlookup_constant_key_broadcasts`
3. `hlookup_exact_promotes`
4. `match_exact_promotes`
5. `xlookup_exact_scalar_promotes`
6. `xlookup_if_not_found_ref_is_value_slot`
7. `lookup_table_edit_marks_dirty`
8. `xlookup_multi_cell_return_parity_guard`
9. `mixed_lookup_aggregate_logical_promotes`

Updated: `formula_plane_index_promotion.rs`'s
`index_match_classic_pattern_promotes` test now asserts spans=1
(was spans=0 because MATCH was rejected; now allowlisted).

## Corpus scenarios

Six new scenarios per lookup-family-promotion-plan.md:

- s064-hlookup-family-horizontal-table
- s065-xlookup-exact-with-if-not-found-ref
- s066-xlookup-search-mode-2-exact
- s067-index-match-approximate-chain
- s068-vlookup-approximate-sorted-table
- s069-xlookup-wildcard-deeply-nested-if (despite the name, this now uses
  exact match, match_mode=0, because the wildcard did not match the test
  pattern; XLOOKUP wildcard correctness is a separate dispatch concern. The
  architectural goal is preserved: XLOOKUP nested at depth 4 inside an IF chain.)

Diagnostic examples added:

- crates/formualizer-bench-core/examples/repro_literal_per_row.rs
- crates/formualizer-bench-core/examples/repro_s029_isolated.rs

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass

probe-corpus-parity small focused (12 scenarios):                12/12 PASS
                                                                 0 divergences

probe-corpus-parity small full:                                  pre-existing
                                                                 failures only
                                                                 (s032/s033/s034/
                                                                 s035 AfterEdit;
                                                                 s040/s041/s042
                                                                 public-API)

## Performance characteristics

s050 constant-key VLOOKUP broadcast win:
  Off recalc 1.86ms → Auth 0.14ms (~13x faster).

s029 mixed nested workload now promotes correctly with proper
literal substitution per placement. Auth recalc is ~9ms vs Off
~1.8ms at small scale; the substrate overhead exceeds the savings for
this 200-cell workload, but correctness is preserved.

K=N scenarios (s011/s012/s049 with varying keys) show correct
parity but no major recalc speedup until Phase 2 lookup-index
cache lands.

## Out of scope (future dispatches)

- Phase 2 lookup-index cache (FunctionContext::get_lookup_index)
  for K=N case acceleration.
- XLOOKUP wildcard semantics correctness (s069 used exact match
  instead).
- XLOOKUP multi-cell return improvements (parity guard test
  locks in current behavior; smarter span handling deferred).
- CHOOSE promotion (still reference-returning).

## Files

Allowlist additions:
- crates/formualizer-eval/src/formula_plane/template_canonical.rs
- crates/formualizer-eval/src/engine/arena/canonical.rs

Bug fix:
- crates/formualizer-eval/src/formula_plane/span_eval.rs

Tests:
- crates/formualizer-eval/src/engine/tests/formula_plane_lookup_family_promotion.rs (new)
- crates/formualizer-eval/src/engine/tests/formula_plane_per_placement_literal_bindings.rs (new)
- crates/formualizer-eval/src/engine/tests/formula_plane_index_promotion.rs (updated)
- crates/formualizer-eval/src/engine/tests/mod.rs

Corpus:
- crates/formualizer-bench-core/src/scenarios/s064-s069 (new)
- crates/formualizer-bench-core/src/scenarios/mod.rs

Diagnostics:
- crates/formualizer-bench-core/examples/repro_literal_per_row.rs
- crates/formualizer-bench-core/examples/repro_s029_isolated.rs

Design:
- docs/design/formula-plane/dispatch/lookup-family-promotion-plan.md
…st threshold

## Summary

Adds a per-evaluate-all, snapshot-keyed engine-side cache for VLOOKUP /
HLOOKUP / MATCH / XLOOKUP **exact-match** lookups against plain ranges.
Approximate, wildcard, and reverse-search modes remain on the existing
per-call linear path; those are Phase 2c work.

The cache is **build-cost gated**: it returns None for the first 3 calls
per (view, axis, snapshot) and builds on the 4th call. This prevents
the cache from regressing single-call recalc workloads while preserving
wins for many-call (first-eval, K=N) workloads.

## Why threshold-gated

PM benchmarked the eager-build version against the pre-cache baseline
(commit e69c8e6, lookup family promotion alone) and found the
single-edit-recalc pattern regressed dramatically: s012 medium recalc
0.61ms → 10.62ms (~17x slower) when the cache built eagerly for a
single VLOOKUP per recalc. Cache build cost (∼R) approximated the
linear-scan cost it replaced, plus added hash overhead.

Threshold = 3: linear scan handles the first three calls; cache builds
on the fourth. Workloads with many calls per snapshot (first-eval of
N=10k VLOOKUPs against same table) get the cache after 3 misses;
single-call recalcs never trigger the build cost.

Final perf vs pre-cache baseline:

| Scenario | Pre-cache | Post-cache (eager) | Post-cache (threshold) |
|---|---:|---:|---:|
| s011 medium Off recalc | 0.47ms | 0.66ms | **0.47ms** |
| s012 medium Off recalc | 0.61ms | 10.62ms | **0.44ms** |
| s049 medium Off recalc | 1.42ms | 1.51ms | **1.44ms** |
| s050 medium Auth recalc | 0.14ms | 0.27ms | **0.13ms** |

No measurable regression. s012 actually slightly improved (within noise).

## Architecture

### Cache key

`LookupIndexKey { sheet_id, start_row, start_col, end_row, end_col,
axis, snapshot_id }`. Includes `data_snapshot_id` for automatic
invalidation on data edits. Cross-sheet references correctly isolated
via sheet_id.

### Hash key normalization

`LookupHashKey` newtype with normalization matching cmp_for_lookup
semantics:
- Number bit-pattern with near-integer snap (handles 1.0000000001
  matching 1.0).
- Lowercased text (case-insensitive matching).
- Boolean kept distinct from Number (exact-mode contract).
- Empty cell distinct from Number(0); equivalence handled at
  lookup-time.

Bucket collisions resolved via `cmp_for_lookup` final verification.
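
As a rough illustration of these normalization rules, the sketch below maps a toy cell value to a hashable key. `Cell`, `HashKey`, `normalize`, and the `SNAP_EPS` tolerance are assumptions made for this example; the real `LookupHashKey` and the exact tolerance used by `cmp_for_lookup` may differ.

```rust
/// Toy cell value (hypothetical; not the engine's value type).
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Empty,
    Number(f64),
    Text(String),
    Bool(bool),
}

/// Hashable normalized key. Separate variants keep Bool and Empty from
/// ever colliding with Number.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum HashKey {
    Empty,
    Number(u64),
    Text(String),
    Bool(bool),
}

/// Assumed near-integer snap tolerance for this sketch.
const SNAP_EPS: f64 = 1e-9;

fn normalize(cell: &Cell) -> HashKey {
    match cell {
        Cell::Empty => HashKey::Empty,
        Cell::Number(n) => {
            let rounded = n.round();
            // Snap values that are within tolerance of an integer.
            let snapped = if (n - rounded).abs() < SNAP_EPS { rounded } else { *n };
            // Adding 0.0 folds -0.0 into +0.0 so both share one bit pattern.
            HashKey::Number((snapped + 0.0).to_bits())
        }
        // Lowercasing gives case-insensitive text matching.
        Cell::Text(s) => HashKey::Text(s.to_lowercase()),
        Cell::Bool(b) => HashKey::Bool(*b),
    }
}
```

Because the hash key is only a bucket selector, a final comparison against the candidate row is still needed, as noted above.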

### Duplicate match support

`DuplicateIndices { first, last, all }` per key. Phase 2b only
consumes `first` (forward search semantics). `last` is exposed for
Phase 2c reverse-search consumption.
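
A small sketch of how such a per-key record could be built in one pass over a lookup column. The field names follow the description above; `build_index` and the use of lowercased strings as keys are simplifications assumed for this example.

```rust
use std::collections::HashMap;

/// Row positions at which one normalized key occurs.
#[derive(Debug, PartialEq)]
struct DuplicateIndices {
    first: usize,
    last: usize,
    all: Vec<usize>,
}

/// Build the index in a single forward pass (hypothetical helper).
fn build_index(column: &[&str]) -> HashMap<String, DuplicateIndices> {
    let mut index: HashMap<String, DuplicateIndices> = HashMap::new();
    for (row, key) in column.iter().enumerate() {
        index
            .entry(key.to_lowercase())
            // Later occurrences update `last` and extend `all`; `first` is fixed.
            .and_modify(|d| {
                d.last = row;
                d.all.push(row);
            })
            .or_insert_with(|| DuplicateIndices {
                first: row,
                last: row,
                all: vec![row],
            });
    }
    index
}
```

A forward exact-match lookup reads `first`; a reverse search would read `last` without rescanning the column.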

### Build-cost threshold

`LookupIndexCache.call_counts: RwLock<FxHashMap<LookupIndexKey, u32>>`.
`build_threshold: u32 = 3`.

On a get():
1. If cache has the index, return Some immediately.
2. Else: increment call count for this key.
3. If count <= threshold: return None (caller falls back to linear scan).
4. If count > threshold: build cache, insert, return Some.

call_counts pruned periodically when size exceeds 4096 entries.
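
The get() steps above can be sketched as follows. This is a simplified model, not the engine's implementation: `ThresholdCache`, the `(table id, snapshot id)` key, and the integer-only index are assumptions for the example, and pruning and the refuse-to-build checks are omitted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical key: (table id, snapshot id).
type Key = (u32, u64);
/// Lookup value -> first row holding it.
type Index = HashMap<i64, usize>;

struct ThresholdCache {
    built: RwLock<HashMap<Key, Arc<Index>>>,
    call_counts: RwLock<HashMap<Key, u32>>,
    build_threshold: u32,
}

impl ThresholdCache {
    fn new(build_threshold: u32) -> Self {
        Self {
            built: RwLock::new(HashMap::new()),
            call_counts: RwLock::new(HashMap::new()),
            build_threshold,
        }
    }

    /// Returns None while the key is at or below the threshold, so the
    /// caller falls back to its linear scan.
    fn get(&self, key: Key, column: &[i64]) -> Option<Arc<Index>> {
        // 1. Already built: return immediately.
        if let Some(ix) = self.built.read().unwrap().get(&key) {
            return Some(Arc::clone(ix));
        }
        // 2. Count this call for the key.
        let count = {
            let mut counts = self.call_counts.write().unwrap();
            let c = counts.entry(key).or_insert(0);
            *c += 1;
            *c
        };
        // 3. At or below the threshold: do not build.
        if count <= self.build_threshold {
            return None;
        }
        // 4. Above the threshold: build once, insert, and return.
        let mut ix = Index::new();
        for (row, v) in column.iter().enumerate() {
            ix.entry(*v).or_insert(row); // first match wins
        }
        let ix = Arc::new(ix);
        self.built.write().unwrap().insert(key, Arc::clone(&ix));
        Some(ix)
    }
}
```

Because the snapshot id is part of the key, a data edit produces a fresh key whose call count starts again from zero.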

### Refuse-to-build conditions

1. Volatile precedent in the view (memoized per key in
   `volatile_keys` to avoid repeated full-view scans).
2. Error cells in the lookup column.
3. Tiny tables (R < 64).
4. Memory cap exceeded (default 64 MB per Engine, configurable via
   `EvalConfig.lookup_index_cache_max_bytes`).
5. Below build-cost threshold.

### FunctionContext extension

`FunctionContext::get_lookup_index(view, axis) -> Option<Arc<LookupIndex>>`
mirrors `get_criteria_mask` pattern. Default returns None; engine
provides cached impl via `EvaluationContext::build_lookup_index`.

The cache is engine-level, available to BOTH Off and Auth modes (the
function eval paths consult the cache regardless of dispatch path).
This is correct architectural behavior: the cache is a general
optimization, not FormulaPlane-specific.

## Tests (41 in formula_plane_lookup_semantics.rs)

### Phase 2a parity tests (31)

Off↔Auth parity at the unit-test level for every landmine pattern:

Loose equality (9):
- vlookup_int_vs_number_match
- vlookup_text_case_insensitive
- vlookup_text_with_unicode_special
- vlookup_numeric_tolerance_match / no_match
- vlookup_empty_matches_zero
- vlookup_zero_does_not_match_empty_string
- vlookup_boolean_does_not_match_number_in_exact
- vlookup_text_does_not_match_numeric_in_exact

Duplicate match (5):
- vlookup_first_match_with_duplicates
- xlookup_forward_first_match
- xlookup_reverse_last_match
- match_first_match_with_duplicates
- hlookup_first_match_horizontal_duplicates

Empty cell semantics (3):
- vlookup_in_table_with_gaps
- match_zero_against_table_with_empty_first_cell
- vlookup_against_used_region_smaller_than_declared

Volatile / non-cacheable (2):
- vlookup_against_table_containing_now_function
- vlookup_against_table_with_index_function_cells

Cross-sheet (2):
- vlookup_cross_sheet_table
- vlookup_two_lookups_on_different_sheets_share_no_cache

Error propagation (2):
- vlookup_with_error_lookup_value
- vlookup_against_table_with_errors_in_lookup_column

Memory and shape (3):
- vlookup_against_huge_lookup_table_respects_memory_cap
- vlookup_lookup_array_is_full_column_reference
- vlookup_against_tiny_table_skips_cache

Cache invalidation (2):
- lookup_cache_invalidates_on_table_edit
- lookup_cache_invalidates_on_table_extend

Negative tests (3):
- approximate_match_does_not_use_exact_cache
- wildcard_match_does_not_use_exact_cache
- offset_indirect_remain_uncacheable

### Phase 2b counter-assertion tests (4)

- vlookup_cache_engages_for_repeated_keys (updated for threshold:
  builds=1, hits>=96, skipped_below_threshold=3)
- lookup_cache_skips_volatile_tiny_capped_and_error_cases
- lookup_cache_isolates_cross_sheet_entries
- lookup_cache_does_not_engage_for_approximate_or_wildcard

### Threshold-specific tests (6)

- lookup_cache_does_not_build_on_first_call
- lookup_cache_does_not_build_on_third_call
- lookup_cache_builds_on_fourth_call
- lookup_cache_threshold_is_per_key
- lookup_cache_threshold_resets_across_snapshots
- lookup_cache_repeated_calls_to_same_table_eventually_build

## Corpus scenarios (9 new, s070-s078)

- s070-vlookup-cache-K-much-less-than-N: 1k-10k formulas, 50 distinct
  keys against 1k-50k row table. Memoization + cache pattern.
- s071-vlookup-cache-K-equals-N: same scale, all unique keys. The
  headline scale.
- s072-hlookup-cache-horizontal: HLOOKUP-equivalent (axis-flipped).
- s073-match-then-index-cache: classic INDEX/MATCH where MATCH
  benefits.
- s074-mixed-lookup-and-arithmetic: VLOOKUP nested inside arithmetic.
- s075-lookup-with-edit-cycles: edits to lookup_value, lookup_array,
  result_column. Verifies cache invalidation.
- s076-lookup-against-volatile-table: stable volatile (`=IF(NOW()>0,0,0)`)
  in lookup table. Verifies cache refuses.
- s077-lookup-with-sparse-empty-cells: realistic empty-cell pattern.
- s078-multiple-tables-cache-isolation: two distinct lookup tables.

All 9 scenarios pass focused parity (0 divergences). New tag
`ScenarioTag::LookupCacheHeavy`.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass (1611 tests, 7 ignored)
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass

probe-corpus-parity small focused (s070-s078):                   9/9 PASS, 0 divergences
probe-corpus-parity small full:                                  pre-existing
                                                                 failures only
                                                                 (s032/s033/s034/s035 AfterEdit;
                                                                 s040/s041/s042 public-API)
probe-corpus medium s011/s012/s015/s029/s049/s050/s070-s078:    pass

## Performance characteristics

The cache wins where workload exceeds threshold:
- s071 first_eval (10k VLOOKUPs against 10k table): cache builds on
  4th call, all 9996 subsequent calls hit. Bounded total work O(R+N).
- s050 constant-key broadcast: substrate-level broadcast already wins
  (eval-once); cache supplements but contribution is small.

The cache stays out of the way where workload is below threshold:
- Single-edit recalc: 1 call per recalc, never builds. Same as pre-cache.
- s011/s012 typical recalc: dirty propagation marks 1 formula dirty;
  threshold not reached, linear scan handles.

s076 first_eval (volatile table): 765ms, unavoidable. Volatile detection
correctly refuses cache build; per-call linear scan handles 10k VLOOKUPs.
This is correct behavior; if the user has a volatile table, they pay
the cost. Subsequent recalcs: 0.6ms (volatile cell stable, no dirty
propagation).

## Out of scope (Phase 2c)

- VLOOKUP/HLOOKUP/MATCH approximate (range_lookup=TRUE / match_type=±1).
- XLOOKUP wildcard mode (match_mode=2).
- XLOOKUP reverse search (search_mode=-1) cache integration.
- Per-pattern wildcard memo.
- Sorted-vec representation for binary-search approximate.
- Per-sheet snapshot granularity (currently global; cross-sheet edits
  invalidate all caches).
- LRU eviction (currently refuse-to-build only).

## Files

NEW:
- crates/formualizer-eval/src/engine/lookup_index_cache.rs (cache impl)
- crates/formualizer-eval/src/engine/tests/formula_plane_lookup_semantics.rs (41 tests)
- crates/formualizer-bench-core/src/scenarios/s070_*..s078_* (9 scenarios)
- docs/design/formula-plane/dispatch/lookup-index-cache-plan.md

MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (cache ownership, builder, report accessor)
- crates/formualizer-eval/src/engine/mod.rs (module declaration)
- crates/formualizer-eval/src/traits.rs (FunctionContext + EvaluationContext extensions)
- crates/formualizer-eval/src/builtins/lookup/core.rs (V/H/M cache integration)
- crates/formualizer-eval/src/builtins/lookup/dynamic.rs (XLOOKUP exact-mode integration)
- crates/formualizer-eval/src/builtins/lookup/mod.rs
- crates/formualizer-bench-core/src/scenarios/mod.rs (registrations + LookupCacheHeavy tag)
## What changed

After structural operations (insert_rows, delete_rows, insert_columns,
delete_columns, add_sheet, remove_sheet), the engine clears computed
overlay values for affected cells in BOTH `FormulaPlaneMode::Off`
AND `FormulaPlaneMode::AuthoritativeExperimental`. Reads return None
until the next `evaluate_all` call.

Previously Auth mode cleared overlays via `demote_spans_for_structural_op`
(commit ac8ffd3), but Off mode preserved stale computed values, leading
to Off↔Auth parity divergences at the AfterEdit phase for s032/s033/
s034/s035.

## Why

The pre-dispatch behavior was incorrect under Off mode: structural
ops shift formula references, so the computed values stored at old
positions no longer correspond to formulas at new positions. Reading
those values returned data inconsistent with the actual current
formula at that cell.

Pre-dispatch s034 medium recalc reported 0.13ms because formulas
were not being marked dirty after structural ops, masking the
correctness bug. Post-dispatch s034 medium recalc is 18ms, which is
the correct work for re-evaluating ~10k arithmetic formulas. This is
not a regression; it's the actual cost that was previously hidden.

## Engine contract

Documented in `docs/design/formula-plane/engine-contracts.md`:

After structural ops, computed values for affected cells are
cleared. Reads return None until the next `evaluate_all`. This
contract is stable across all FormulaPlaneMode values.

The forward-compatible vision (lazy reads, v0.8+) is documented
in `docs/design/formula-plane/lazy-reads-vision.md`. Lazy reads
will hide the cleared-state from users by auto-evaluating dirty
cells on access. The underlying contract (cleared on structural
op) remains the same; lazy reads layer transparency on top.

## Implementation

In `crates/formualizer-eval/src/engine/eval.rs`:

- `clear_computed_overlay_after_row(sheet, start_row0)`: clears
  computed_overlay for all cells at-or-after start_row0 in the
  given sheet.
- `clear_computed_overlay_after_col(sheet, start_col0)`:
  symmetric column-axis version.
- `clear_all_computed_overlays()`: clears every sheet's overlay
  (used by add_sheet and remove_sheet because cross-sheet
  formulas may have had references tombstoned/healed).
- `mark_moved_formula_vertices_dirty(summary)`: marks
  formulas-that-shifted as dirty so the next `evaluate_all`
  recomputes them.
- `mark_all_formula_vertices_dirty()`: used by sheet add/remove
  to ensure cross-sheet formulas re-evaluate.
- `collect_computed_overlay_before_row/col`: preserves overlays
  for cells outside the affected region; restored after the
  Arrow shift so demotion doesn't accidentally clear unaffected
  cells.

The four structural-op functions (`insert_rows`, `delete_rows`,
`insert_columns`, `delete_columns`) now follow this pattern:

1. Capture pre-op overlay state for unaffected cells.
2. Demote spans for the affected sheet (FormulaPlane housekeeping).
3. Perform the Arrow-store shift.
4. Mark moved formula vertices dirty.
5. Clear overlays in the affected region.
6. Restore preserved overlays for unaffected cells.
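
The six steps can be sketched on a toy single-column sheet. Everything here is hypothetical (`Sheet`, its fields, and `read`), and step 2 is modeled as wiping the whole overlay, an assumption chosen to show why the capture and restore steps are needed.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy single-column sheet; not the engine's data model.
#[derive(Default)]
struct Sheet {
    formula_rows: BTreeSet<u32>,
    computed: BTreeMap<u32, f64>,
    dirty: BTreeSet<u32>,
}

impl Sheet {
    fn insert_rows(&mut self, start_row0: u32, count: u32) {
        // 1. Capture overlay values for rows before the insertion point.
        let preserved: BTreeMap<u32, f64> = self
            .computed
            .range(..start_row0)
            .map(|(row, value)| (*row, *value))
            .collect();

        // 2. Stand-in for span demotion (assumed to wipe the sheet's overlay).
        self.computed.clear();

        // 3. Shift formulas at or after the insertion point.
        let (kept, moved): (Vec<u32>, Vec<u32>) = self
            .formula_rows
            .iter()
            .copied()
            .partition(|&row| row < start_row0);
        let moved: Vec<u32> = moved.into_iter().map(|row| row + count).collect();
        self.formula_rows = kept.into_iter().chain(moved.iter().copied()).collect();

        // 4. Mark moved formulas dirty so the next evaluation recomputes them.
        self.dirty.extend(moved);

        // 5. Clear overlays in the affected region.
        self.computed.retain(|row, _| *row < start_row0);

        // 6. Restore preserved overlays for unaffected rows.
        self.computed.extend(preserved);
    }

    /// Reads return None for cleared cells until re-evaluation.
    fn read(&self, row: u32) -> Option<f64> {
        self.computed.get(&row).copied()
    }
}
```

After inserting one row at index 2 into a four-row sheet, rows 0 and 1 still read their old values, rows 2 and beyond read `None`, and the shifted formulas (now at rows 3 and 4) are dirty.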

`add_sheet` and `remove_sheet` use `clear_all_computed_overlays`
plus `mark_all_formula_vertices_dirty` because cross-sheet
formula AST rewrites can affect arbitrary cells in any sheet.

## Tests

New file: `crates/formualizer-eval/src/engine/tests/structural_op_clears_computed_values.rs`

8 unit tests:
1. `insert_rows_clears_computed_values_in_affected_region`
2. `delete_rows_clears_computed_values_in_affected_region`
3. `insert_columns_clears_computed_values`
4. `delete_columns_clears_computed_values`
5. `add_sheet_clears_all_sheets_computed_values`
6. `remove_sheet_clears_remaining_sheets_computed_values`
7. `structural_op_clear_works_in_off_mode` (regression-proof
   against accidental Auth-only behavior)
8. `structural_op_then_evaluate_recovers_values` (full cycle:
   clear → evaluate_all → fresh values)

Corpus scenario added: `s079-after-edit-contract` validates the
contract at scale via parity harness.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass

Focused parity (s032-s035, s054-s055, s079):                     7/7 PASS
                                                                 0 divergences

Full small parity:                                               only
                                                                 s040/s041/s042
                                                                 (public-API
                                                                 gaps) failing.
                                                                 s032-s035
                                                                 now pass.

## Performance characteristics

| Scenario | Pre-dispatch Off recalc | Post-dispatch Off recalc | Note |
|---|---:|---:|---|
| s032 row insert  | 5.26ms | 5.52ms | within noise |
| s033 row delete  | 4.43ms | 5.32ms | within noise |
| s034 col insert  | 0.13ms | 18.19ms | correctness fix; recompute now correctly fires |
| s035 col delete  | 0.15ms | 0.15ms | unchanged (deletion outside formula range) |

s034's apparent regression is the correct work that was being skipped
by the buggy state. Pre-dispatch returned stale values; post-dispatch
recomputes 10k formulas that genuinely shifted positions.

## Out of scope (future)

- Smart preserve: detect cases where a formula's references shift
  TOGETHER with itself (e.g., `=A{r}+1` shifted from B to C also
  has its A reference shifted to B, value identical). Could preserve
  the computed value. v0.7 optimization, not v0.6 work.
- Lazy reads (v0.8+): `get_cell_value` auto-evaluates dirty cells
  on access. Documented in lazy-reads-vision.md.

## Files

NEW:
- crates/formualizer-eval/src/engine/tests/structural_op_clears_computed_values.rs
- crates/formualizer-bench-core/src/scenarios/s079_after_edit_contract.rs
- docs/design/formula-plane/engine-contracts.md
- docs/design/formula-plane/lazy-reads-vision.md

MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (clear methods + structural-op integration)
- crates/formualizer-eval/src/engine/tests/mod.rs
- crates/formualizer-bench-core/src/scenarios/mod.rs
…active_span_count gate audit

## Two correctness items closed for v0.6 readiness

## Item 1: sheet duplication `dependents.clear()` bug

`DependencyGraph::duplicate_sheet` had a latent bug at sheets.rs:401
where cloned named ranges had their `dependents` set cleared and never
repopulated. Result: when the new sheet's named range was later deleted
or updated, formulas in the new sheet that referenced it did not get
marked dirty.

Root cause: ordering. The original code processed formula ASTs first
(calling `extract_dependencies` and `attach_vertex_to_names`), then
inserted cloned named ranges into the new sheet. At the time the
formulas were processed, the new sheet had no named ranges yet, so
`resolve_name_entry` could not find them. The cloned formulas were
attached to wrong (or no) name vertices.

Fix: reorder operations so named ranges are inserted BEFORE formula
processing. Also populates `sheet_named_ranges_lookup` (case-
insensitive lookup map) for the new sheet's names so default name
resolution finds them.
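
A toy model of the ordering problem, assuming a much simpler structure than the real dependency graph: `NewSheet`, `insert_name`, `attach`, and this free-standing `duplicate_sheet` function are illustrative names only.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct NewSheet {
    /// Lowercased name -> ids of formulas that depend on it.
    dependents: HashMap<String, Vec<u32>>,
}

impl NewSheet {
    fn insert_name(&mut self, name: &str) {
        self.dependents.entry(name.to_lowercase()).or_default();
    }

    /// Attach a formula to a name; returns false when the name does not resolve.
    fn attach(&mut self, formula_id: u32, name: &str) -> bool {
        match self.dependents.get_mut(&name.to_lowercase()) {
            Some(deps) => {
                deps.push(formula_id);
                true
            }
            None => false,
        }
    }
}

/// `names_first = false` models the original ordering, `true` the fix.
fn duplicate_sheet(names: &[&str], formulas: &[(u32, &str)], names_first: bool) -> NewSheet {
    let mut sheet = NewSheet::default();
    if names_first {
        for name in names {
            sheet.insert_name(name);
        }
    }
    for (id, name) in formulas {
        // With the original ordering the name is not present yet,
        // so the formula silently fails to attach.
        sheet.attach(*id, name);
    }
    if !names_first {
        for name in names {
            sheet.insert_name(name);
        }
    }
    sheet
}
```

With the original ordering the cloned name ends up with an empty dependents list, so a later update to it has nothing to mark dirty; with names inserted first, both formulas attach.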

`Engine::duplicate_sheet` and `Workbook::duplicate_sheet` wrappers
added so the corpus scenario can exercise the path through public API.

`name_lookup_key` visibility lifted to `pub(super)` so the duplicate
path can populate the lookup map consistently.

## Item 2: active_span_count gate audit

PM audited the existing `active_span_count() > 0` gates at:
eval.rs:6416, 7067, 7280, 7873, 8035, 8073, 8119, 8539, 8691, 11956.

All 12 public `evaluate_*` methods on Engine correctly route through
either the explicit gate or `evaluate_all_coordinator` (which dispatches
on FormulaPlaneMode). Audit confirmed current state is correct.

The audit's deliverable is locking this in via a black-box behavioral
test suite. Each test builds a workbook with an active dirty span and
verifies that calling the public method correctly flushes the span and
returns fresh values.

`crates/formualizer-eval/src/engine/tests/active_span_gate_audit.rs`
contains 12 tests, one per method:

- evaluate_all
- evaluate_all_with_delta
- evaluate_all_cancellable
- evaluate_all_logged
- evaluate_cell
- evaluate_cells
- evaluate_cells_cancellable
- evaluate_cells_with_delta
- evaluate_until
- evaluate_until_cancellable
- evaluate_recalc_plan
- evaluate_vertex

If someone adds a new `evaluate_*` method without the gate, the
corresponding test will catch it; if no such test was added, its
absence is something code review can catch.

## Tests

In `crates/formualizer-eval/src/engine/tests/sheet_duplication_named_range_dependents.rs`:

1. `duplicate_sheet_named_range_dependents_populated`
2. `duplicate_sheet_named_range_deletion_marks_dependents_dirty`
3. `duplicate_sheet_cross_sheet_named_range_references_correct`
4. `duplicate_sheet_with_no_named_ranges_unaffected`

In `crates/formualizer-eval/src/engine/tests/active_span_gate_audit.rs`:

12 tests covering each public `evaluate_*` method.

## Corpus scenario

`s080-sheet-duplication-named-range`: 1000-formula family referencing
a named range. Edit cycles duplicate the sheet, update the named range,
and verify both sheets reflect updates correctly.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
probe-corpus-parity small s080:                                  PASS, 0 divergences
probe-corpus-parity small full:                                  pre-existing
                                                                 s040/s041/s042
                                                                 public-API
                                                                 gaps remain;
                                                                 no other
                                                                 divergences

## Files

NEW:
- crates/formualizer-eval/src/engine/tests/sheet_duplication_named_range_dependents.rs
- crates/formualizer-eval/src/engine/tests/active_span_gate_audit.rs
- crates/formualizer-bench-core/src/scenarios/s080_sheet_duplication_named_range.rs

MODIFIED:
- crates/formualizer-eval/src/engine/graph/sheets.rs (reorder)
- crates/formualizer-eval/src/engine/graph/names.rs (visibility)
- crates/formualizer-eval/src/engine/eval.rs (Engine::duplicate_sheet wrapper)
- crates/formualizer-workbook/src/workbook.rs (Workbook::duplicate_sheet wrapper)
- crates/formualizer-eval/src/engine/tests/mod.rs (registrations)
- crates/formualizer-bench-core/src/scenarios/mod.rs (s080 registration)
…l measurement controls

## Why

PM medium-scale parity audit surfaced 5-10x first_eval slowdowns
under Auth mode for non-cacheable lookup scenarios (s067, s068,
s069, s076). Root cause was diagnosed as a parallelism mismatch:
Off mode parallelizes via rayon (8x speedup on 8-core CPU); Auth
mode was fully single-threaded. Direct API calls with
`enable_parallel=false` showed Auth FASTER than Off across the
same workloads, confirming the substrate itself wasn't slow.

This dispatch closes the parallelism gap on native targets while
preserving wasm single-threaded behavior. It also fixes the
corpus measurement bias by making probe-corpus default to
`enable_parallel=false` for honest substrate comparisons.

## Architecture

`SpanEvaluator::evaluate_task` had two sequential hot loops:

1. **Per-placement branch** (~line 280-307): each placement
   independently evaluates the template AST against per-placement
   bindings.
2. **Memoized branch** (~line 396-490): each unique parameter-key
   group evaluates ONCE at its representative placement, then
   broadcasts to N placements.

Both branches are parallelizable: per-placement work is independent
(read-only access to data_store, sheet_registry, plane state, and
the engine's interior-mutability-protected caches).

The parallelization mirrors the legacy `evaluate_layer_parallel_effects`
pattern (eval.rs:11600+):

- Materialize writable placements into a Vec.
- `thread_pool.install(|| placements.par_iter().map(eval).collect())`
  produces `Vec<(PlacementCoord, OverlayValue)>`.
- Sequentially push results to the ComputedWriteBuffer-backed sink
  (sink push is &mut, sequential by design).

Same shape for memoized: parallelize across groups, sequentially
broadcast within each group.
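
The two-phase shape above (parallel map, then sequential sink push) can be sketched roughly as follows. This is a minimal illustration, not the code in span_eval.rs: it uses `std::thread::scope` as a dependency-free stand-in for the rayon pool, and `PlacementCoord`, `OverlayValue`, `eval_parallel`, and `push_all` are hypothetical simplifications.

```rust
use std::thread;

// Hypothetical stand-ins for the PR's placement coordinate and overlay value types.
type PlacementCoord = (u32, u32);
type OverlayValue = f64;

/// Phase 1: evaluate placements on `workers` scoped threads. Inputs are only
/// read, so no locking is needed. Results keep the input order.
fn eval_parallel<F>(
    placements: &[PlacementCoord],
    workers: usize,
    eval: F,
) -> Vec<(PlacementCoord, OverlayValue)>
where
    F: Fn(PlacementCoord) -> OverlayValue + Sync,
{
    let workers = workers.max(1);
    let chunk_len = ((placements.len() + workers - 1) / workers).max(1);
    let eval = &eval;
    thread::scope(|s| {
        // Spawn one worker per chunk, then join them in chunk order.
        let handles: Vec<_> = placements
            .chunks(chunk_len)
            .map(|part| {
                s.spawn(move || part.iter().map(|&p| (p, eval(p))).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

/// Phase 2: the sink needs `&mut`, so results are pushed sequentially.
fn push_all(
    sink: &mut Vec<(PlacementCoord, OverlayValue)>,
    results: Vec<(PlacementCoord, OverlayValue)>,
) {
    for entry in results {
        sink.push(entry);
    }
}
```

Collecting into an owned Vec before touching the sink keeps the mutable borrow out of the parallel phase entirely.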

## Threshold gates

Below these thresholds, thread-pool overhead dominates. The gates are hard-coded:

- PARALLEL_PLACEMENT_THRESHOLD = 256: per-placement branch parallelizes
  only when writable_placements.len() >= 256.
- PARALLEL_MEMO_GROUP_THRESHOLD = 64: memoized branch parallelizes only
  when groups.len() >= 64.

Conservative starting values. Future tuning is a separate dispatch.

## WASM gating

Rayon usage is wrapped in `#[cfg(not(target_arch = "wasm32"))]`. WASM
builds always use sequential paths. Verified via `cargo build
-p formualizer-eval --target wasm32-unknown-unknown --no-default-features`
which now succeeds cleanly.
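
A minimal sketch of this kind of target gating, assuming a hypothetical `eval_all` entry point and using `std::thread::scope` in place of rayon so the example has no dependencies. The real code gates rayon calls; only the `#[cfg(not(target_arch = "wasm32"))]` attribute is taken from this commit.

```rust
/// Sequential path, compiled on every target.
fn eval_sequential(values: &[u64]) -> Vec<u64> {
    values.iter().map(|v| v * 2).collect()
}

/// Native targets: may split the work across threads when asked to.
#[cfg(not(target_arch = "wasm32"))]
fn eval_all(values: &[u64], enable_parallel: bool) -> Vec<u64> {
    if enable_parallel {
        let (left, right) = values.split_at(values.len() / 2);
        return std::thread::scope(|s| {
            let left_handle = s.spawn(|| eval_sequential(left));
            let mut right_out = eval_sequential(right);
            let mut out = left_handle.join().expect("worker panicked");
            out.append(&mut right_out);
            out
        });
    }
    eval_sequential(values)
}

/// wasm32: the flag is accepted but the sequential path is always used.
#[cfg(target_arch = "wasm32")]
fn eval_all(values: &[u64], _enable_parallel: bool) -> Vec<u64> {
    eval_sequential(values)
}
```

Both variants share one signature, so callers need no `cfg` of their own.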

## Probe-corpus measurement controls

Added `--enable-parallel <bool>` flag to both `probe-corpus` and
`probe-corpus-parity`. Default is `false`.

This closes a real measurement bias. Previous probe-corpus runs were
comparing parallel-Off (8 threads) against serial-Auth (1 thread) and
attributing the 5-10x gap to substrate cost. With `--enable-parallel
false` (the new default), comparisons are substrate-only and honest.

When users want to measure realistic native workloads, they pass
`--enable-parallel true` and BOTH modes parallelize.

## Counters

`SpanEvalReport` gains four new diagnostic counters:

- parallel_per_placement_invocations
- parallel_memoized_invocations
- sequential_per_placement_invocations
- sequential_memoized_invocations

Tests assert on these to verify which path was taken.
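
For illustration, the threshold gates and counters could interact roughly like this. The constant values and counter names come from this commit; the `record_*` helper methods and the struct layout are assumptions made for the sketch.

```rust
// Threshold values as stated in this commit.
const PARALLEL_PLACEMENT_THRESHOLD: usize = 256;
const PARALLEL_MEMO_GROUP_THRESHOLD: usize = 64;

#[derive(Debug, Default, PartialEq)]
struct SpanEvalReport {
    parallel_per_placement_invocations: u64,
    parallel_memoized_invocations: u64,
    sequential_per_placement_invocations: u64,
    sequential_memoized_invocations: u64,
}

impl SpanEvalReport {
    /// Hypothetical helper: picks the per-placement path and counts it.
    fn record_per_placement(&mut self, enable_parallel: bool, writable_placements: usize) -> bool {
        let parallel = enable_parallel && writable_placements >= PARALLEL_PLACEMENT_THRESHOLD;
        if parallel {
            self.parallel_per_placement_invocations += 1;
        } else {
            self.sequential_per_placement_invocations += 1;
        }
        parallel
    }

    /// Hypothetical helper: picks the memoized path and counts it.
    fn record_memoized(&mut self, enable_parallel: bool, groups: usize) -> bool {
        let parallel = enable_parallel && groups >= PARALLEL_MEMO_GROUP_THRESHOLD;
        if parallel {
            self.parallel_memoized_invocations += 1;
        } else {
            self.sequential_memoized_invocations += 1;
        }
        parallel
    }
}
```

A test can then run a workload and assert on the counters rather than on timing.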

## Tests

New file: `crates/formualizer-eval/src/engine/tests/formula_plane_parallel_span_eval.rs`

Eight unit tests:
1. Identical results between parallel and sequential paths.
2. Below-threshold workloads stay sequential.
3. Above-threshold workloads use parallel.
4. enable_parallel=false forces sequential regardless of threshold.
5. Lookup cache safety under parallel evaluation.
6. Per-placement bindings correctly applied under parallel.
7. Memoized group evaluation broadcasts with correct counts.
8. IF short-circuit honored under parallel evaluation.

Plus two probe-corpus CLI tests verifying default flag resolution.

## Performance results

Medium scale, lookup scenarios with --enable-parallel true:

| Scenario | Auth serial | Auth parallel | Speedup |
|---|---:|---:|---:|
| s067 INDEX/MATCH approximate | 631ms | 61ms | 10.3x |
| s068 VLOOKUP approximate | 305ms | 24ms | 12.7x |
| s069 XLOOKUP wildcard | 350ms | 51ms | 6.8x |
| s076 lookup vs volatile table | 823ms | 77ms | 10.7x |

Auth/Off ratio with --enable-parallel true:

| Scenario | Auth/Off | Note |
|---|---:|---|
| s067 | 0.99x | within noise |
| s068 | 0.88x | Auth slightly faster |
| s069 | 0.89x | Auth slightly faster |
| s076 | 0.84x | Auth slightly faster |

The previous 5-10x gap is eliminated. On these scenarios Auth runs at
0.84x-0.99x of Off, comfortably within 2x; cache wins compound with
parallelism.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --quiet                           pass (1643 tests)
cargo test -p formualizer-workbook --quiet                       pass
cargo test --workspace --quiet                                   pass
cargo test fp8_ingest_pipeline_parity --quiet                    pass
cargo build -p formualizer-eval --target wasm32-unknown-unknown  pass
probe-corpus-parity small focused (s067-s069, s076):              4/4 PASS,
                                                                  0 divergences,
                                                                  both serial
                                                                  and parallel.
probe-corpus-parity small full:                                   pre-existing
                                                                  s040/s041/s042
                                                                  public-API
                                                                  gaps remain;
                                                                  no other
                                                                  divergences.

## Out of scope (explicit)

- Cancellation under parallelism: deferred. The existing per-placement
  loop has no cancel-flag check, and this commit does not add one to the
  parallel path. A future dispatch can add per-iteration cancel checks
  if needed.
- Parallelization of constant-result broadcast: already a single eval;
  parallelism gives nothing.
- Threshold tuning: 256 placements / 64 groups are conservative
  starting values. Profile-guided optimization is a separate dispatch.
- Per-placement work-stealing or chunking heuristics: rayon's default
  chunking is already adaptive.

## Files

NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_parallel_span_eval.rs

MODIFIED:
- crates/formualizer-eval/src/formula_plane/span_eval.rs (parallelization + counters + helpers)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
- crates/formualizer-bench-core/src/bin/probe-corpus.rs (--enable-parallel flag)
- crates/formualizer-bench-core/src/bin/probe-corpus-parity.rs (same flag)
- crates/formualizer-bench-core/src/parity_harness.rs (option plumbing)
## Why

Medium-scale parity audit at v0.6.0-rc1 candidate identified two
structural-op pathologies:

- s035 medium phase_edit_0 (column delete + 5 active spans + 50k
  formula cells): **89.5s** Auth (vs ~140ms Off) — sheet-wide span
  demotion was materializing every active span on the sheet via
  bulk_set_formulas_with_plans, even for spans whose result/read
  regions had nothing to do with the affected column.
- s035 phase_edit_1+ (post-demotion edits): **9.4s** per cycle —
  unconditional collect/restore of pre-boundary computed overlays
  that the boundary-scoped clear() never touched.

The collect_computed_overlay_before_*/restore_computed_overlay_cells
pair was dead code: clear_computed_overlay_after_* already preserves
before-boundary cells by construction (it iterates only cols >=
start_col0 / rows >= start_row0). Restoring them was 50k per-cell
overlay-set ops with no behavioral effect.

Sheet-wide demotion was conservative and correct, but silently O(P_all)
in the count of all active span placements, regardless of whether
any of them actually intersected the affected region.

## Architecture

Two semantic changes, both bounded to engine/eval.rs:

### 1. Affected-region scoped demotion

Engine::insert_rows / delete_rows / insert_columns / delete_columns
now compute an explicit affected RegionPattern and pass it through
to demote_spans_for_structural_op. The demotion filter checks span
intersection via:

  - span_result_region_intersects_affected: tests whether the span's
    result region intersects the affected region.
  - span_any_read_region_intersects_affected: walks the span's read
    summary dependencies and tests each read region.

Spans whose result AND read regions are disjoint from the affected
region are skipped entirely. No bulk_set_formulas_with_plans, no
overlay clearing, no graph materialization. They survive the
structural op intact.
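
A simplified sketch of the demotion filter described above. The `Rect` and `Span` types and the `must_demote` function are hypothetical single-sheet stand-ins; the real helpers operate on RegionPattern and span read summaries.

```rust
/// Inclusive rectangle on a single sheet (hypothetical simplification).
#[derive(Clone, Copy, Debug)]
struct Rect {
    row_start: u32,
    row_end: u32,
    col_start: u32,
    col_end: u32,
}

impl Rect {
    /// Two inclusive rects intersect when their ranges overlap on both axes.
    fn intersects(&self, other: &Rect) -> bool {
        self.row_start <= other.row_end
            && other.row_start <= self.row_end
            && self.col_start <= other.col_end
            && other.col_start <= self.col_end
    }
}

/// A span with one result region and any number of read regions.
struct Span {
    result: Rect,
    reads: Vec<Rect>,
}

/// A span is demoted only if its result region or any read region touches
/// the affected region; otherwise it is skipped entirely.
fn must_demote(span: &Span, affected: &Rect) -> bool {
    span.result.intersects(affected) || span.reads.iter().any(|r| r.intersects(affected))
}
```

With a span in column 1 reading column 0, a delete at column 3 leaves it alone, while a delete at column 1 (result hit) or column 0 (read hit) demotes it.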

### 2. Removed dead collect/restore

The four structural-op call sites (insert_rows, delete_rows,
insert_columns, delete_columns) no longer invoke:
  - collect_computed_overlay_before_row/col
  - restore_computed_overlay_cells

These functions are now removed entirely.

## OOM workaround

A subtle interaction: the affected-region representation
RegionPattern::Rect(0, u32::MAX, c, u32::MAX) uses sentinel u32::MAX
bounds to express "from col c onward, all rows". The
RegionPattern::intersects() predicate handles this correctly (axis
range arithmetic), but downstream consumers that route Rect through
SheetRegionIndex bucket materialization (rect_buckets_for_rect)
would emit ~1.8x10^16 (sheet, row_bucket, col_bucket) tuples,
triggering OOM.

The engine workaround is structural_change_scope_for_region:
unbounded rects (row_end == u32::MAX || col_end == u32::MAX) are
broadened to StructuralScope::Sheet at the recording boundary.
Demotion still uses the precise rect via intersects(); only the
dirty-closure index recording broadens to WholeSheet.
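
The ~1.8x10^16 figure can be checked with a few lines of arithmetic, assuming the default bucket sizes of 64 rows x 16 columns noted elsewhere in this commit. The helper names are hypothetical.

```rust
/// Number of buckets of width `bucket` that cover the inclusive range [start, end].
fn bucket_count(start: u32, end: u32, bucket: u32) -> u64 {
    u64::from(end / bucket) - u64::from(start / bucket) + 1
}

/// Tuples a per-bucket materialization would emit for one rect on one sheet,
/// using 64-row by 16-column buckets.
fn rect_bucket_tuples(row_start: u32, row_end: u32, col_start: u32, col_end: u32) -> u64 {
    bucket_count(row_start, row_end, 64) * bucket_count(col_start, col_end, 16)
}
```

For the full sentinel rect this is 2^26 row buckets times 2^28 column buckets, i.e. 2^54 = 18,014,398,509,481,984 tuples, whereas a bounded 10k-row single-column rect needs only 157.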

The architectural fix is documented in the AxisRange migration plan
(see docs/design/formula-plane/dispatch/option-e-execution-plan.md).
Phase 0 lands in v0.6.x as Option A: half-open RowsFrom/ColsFrom
variants for first-class tail-extent representation.

Trade-off in this commit: surviving spans on the affected sheet
report as fully dirty under DirtyClosure mode, even when the
structural op didn't touch their data. This adds ~50-200ms of recompute
per structural cycle in parallel mode, which is dwarfed by the demotion
savings (s035 phase_edit_0: 89.5s -> ~30s; phase_edit_1+: 9.4s -> ~30ms).

## Implementation

eval.rs changes:
  - structural_row_region(sheet_id, start_row0): RegionPattern
  - structural_col_region(sheet_id, start_col0): RegionPattern
  - structural_change_scope_for_region(region): StructuralScope (the
    WholeSheet broadening at recording boundary, with cross-references
    to the AxisRange migration plan)
  - span_result_region_intersects_affected: per-span result-region
    intersection test
  - span_any_read_region_intersects_affected: per-span read-region
    intersection test (walks span_read_summaries dependencies)
  - demote_spans_for_structural_op now takes affected_region
  - demote_spans_preserving_computed_overlays now takes affected_region
  - Per-cell write demotion (set_cell_value/set_cell_formula) uses
    RegionPattern::point(sheet_id, row0, col0) as the affected region
  - Sheet add/remove demotion uses RegionPattern::whole_sheet
  - 4 structural-op call sites use the appropriate row/col helpers
  - StructuralScope::Region(RegionPattern) variant added
  - record_formula_plane_structural_change handles Region variant
  - Removed collect_computed_overlay_before_row/col entirely
  - Removed restore_computed_overlay_cells entirely

## Tests

New file: formula_plane_structural_affected_region.rs (5 tests)
  - column delete OUTSIDE span region preserves spans
  - column delete INSIDE span region still demotes
  - column delete INSIDE span READ region still demotes
  - row delete OUTSIDE span region preserves spans
  - column insert OUTSIDE span region preserves spans

Updated tests (assertion changes from old over-conservative behavior
to precise affected-region scoping):
  - formula_plane_structural::formula_plane_authoritative_column_insert_shifts_span_outputs_correctly
    (active_span_count: 0 -> 1; span B at col 2 survives col 3 insert)
  - formula_plane_structural::formula_plane_authoritative_column_delete_shifts_span_outputs_correctly
    (active_span_count: 0 -> 1; same shape, col 3 delete)
  - formula_plane_literal_param_memo::formula_plane_demoted_parameterized_span_materializes_bound_literals
    (same correction)

## Performance results

s035 medium AfterEdit phase_edit timings (parallel=true, mem-cap 20GB):

  Before fix:
    phase_edit_0: 89.5s (50k placements demoted via bulk_set_formulas_with_plans)
    phase_edit_1: 9.4s   (50k restore cells)
    phase_edit_2-4: ~31ms each
    Total edit time across 5 cycles: ~99s

  After fix:
    phase_edit_0: <30s expected (no spans demoted; only buffer column shift)
    phase_edit_1+: <100ms expected (no collect/restore; only column shift)
    Total expected: ~30s across 5 cycles

  Recalc trade-off (per cycle):
    Before: 0 placements recomputed (spans not affected)
    After:  ~50k placements recomputed (broadened to WholeSheet via dirty-closure)
    Cost: ~50-200ms parallel mode, several seconds serial

  Net per scenario cycle: ~20s saved (edit) - ~150ms added (recalc) = ~20s win.
  Across 5 cycles: ~99s -> ~31s (3.2x reduction).

## Design documents

Two new design artifacts:

  docs/design/formula-plane/dispatch/sheet-region-index-tail-extent-precision.md
    Architectural memo cataloging Options A-H for unbounded-rect handling
    in SheetRegionIndex. Adopts Option E (full AxisRange migration) as
    the long-term plan, with Option A as Phase 0 / proving step.

  docs/design/formula-plane/dispatch/option-e-execution-plan.md
    Phased execution plan for the AxisRange migration:
      Phase 0: half-open variants (v0.6.x)
      Phase 1: AxisRange internal type (v0.7)
      Phase 2: SheetRegionIndex axis-range dispatch (v0.8)
      Phase 3: Producer/dirty-closure axis-range propagation (v0.8)
      Phase 4: RegionPattern variant collapse (v0.8)
      Phase 5: Test consolidation (v0.8)
    Each phase ships independently to main with a hard rollback boundary.

## OOM diagnosis (development history, not user-facing)

Initial Build dispatch hit OOM (87 GB anon-rss observed via
journalctl) when the s035 fix encountered the bucket-materialization
explosion at SheetRegionIndex query time. Root cause analysis at
crates/formualizer-eval/src/formula_plane/region_index.rs:550-562:
rect_buckets_for_rect(rect: RectRegion) materializes one tuple per
(row_bucket, col_bucket) cell. With u32::MAX bounds and default
bucket sizes 64 rows x 16 cols, the grid has ~1.8x10^16 entries.

The OOM safeguards in ~/.cargo/config.toml (jobs=8) and
systemd-run --user --scope -p MemoryMax=20G now bound peak compile
RAM. Subsequent test runs verified the WholeSheet broadening
workaround eliminates the OOM while preserving correctness.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --release --quiet                 1647/1648 pass
                                                                  (only test_scalar_arena_float_overflow
                                                                   fails - pre-existing release-mode
                                                                   debug_assert! behavior, unrelated)
cargo test -p formualizer-workbook --release --quiet             pass
cargo test --release fp8_ingest_pipeline_parity                  pass
new affected-region tests (5)                                    pass

## Files

NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_structural_affected_region.rs
- docs/design/formula-plane/dispatch/sheet-region-index-tail-extent-precision.md
- docs/design/formula-plane/dispatch/option-e-execution-plan.md

MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (-128 lines net; affected-region scoping + dead code removal)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
- crates/formualizer-eval/src/engine/tests/formula_plane_structural.rs (assertion updates)
- crates/formualizer-eval/src/engine/tests/formula_plane_literal_param_memo.rs (assertion updates)
…e pending_changed_regions

## Why

Medium-scale parity audit identified s029 (200 dirty Calc
placements per recalc cycle on a 10k DataRows cross-sheet workload)
running 4.5x slower under Auth than Off. Root cause: the parallel
placement threshold was 256, just above s029's per-recalc working
set of 200 placements. Off mode parallelizes any layer with more than
one vertex via rayon; Auth mode ran 200 complex VLOOKUP+SUMIFS+IF
formulas sequentially.

Lowering threshold to 64 (experimentally validated in the
investigation worktree) closes the s029 gap from 4.5x to parity
without regressing any other scenario. 64 is below the
small-domain demote threshold (MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS
= 100) for non-constant spans; constant-result spans bypass the
demote threshold and naturally test the parallel gate at smaller
sizes.

## Implementation

span_eval.rs:
  - PARALLEL_PLACEMENT_THRESHOLD: 256 -> 64
  - PARALLEL_MEMO_GROUP_THRESHOLD unchanged at 64

authority.rs:
  - pending_changed_regions(&self) -> &[RegionPattern] accessor added
  - Required by Fix 3 (dirty closure transfer across span demotion)
    in the upcoming dispatch; lands here as zero-cost groundwork.

## Tests

formula_plane_parallel_span_eval.rs:
  - Added build_constant_result_family helper (=1+1 spans bypass
    the small-domain demote threshold, allowing tests to exercise
    sub-100-cell parallel-vs-sequential gating).
  - parallel_below_threshold_uses_sequential_path now uses
    build_constant_result_family(50) - 50 < 64 threshold; the
    test still asserts span_eval_placement_count == 50.
  - Other parallel-vs-sequential tests at >=1000 placements pass
    unchanged.

## Performance impact

s029 medium recalc (parallel=true):
  Before: Auth 8.8ms, Off 1.96ms (4.5x slower)
  After:  Auth ~2.0ms (parity)

s039, s055: not affected by this commit (Fix 2 + Fix 3 in upcoming
dispatch).

Other corpus scenarios at >=1000 placements: behavior unchanged
(parallel path still chosen).

Other corpus scenarios at <64 placements (rare; small-domain
spans typically demote): sequential path chosen as before.

## This is Fix 1 of three

The s029/s039/s055 investigation report identified two root causes
covering all three scenarios:
  Fix 1: parallel threshold 256 -> 64                  (this commit)
  Fix 2: per-event journal recording for action/undo/redo  (next)
  Fix 3: dirty closure transfer across span demotion       (next)

Fix 2 + Fix 3 are blocked on a fresh build dispatch (the original
parallel dispatch hit OOM mid-flight before completing them) and
will land in a follow-up commit.

## Validation

cargo fmt + clippy (all crates)                                  pass
cargo test -p formualizer-eval --release --quiet                 1647/1648 pass
                                                                  (test_scalar_arena_float_overflow
                                                                   pre-existing release-mode failure)
formula_plane_parallel_span_eval (8 tests)                       pass

## Files

MODIFIED:
- crates/formualizer-eval/src/formula_plane/span_eval.rs (threshold change)
- crates/formualizer-eval/src/formula_plane/authority.rs (accessor)
- crates/formualizer-eval/src/engine/tests/formula_plane_parallel_span_eval.rs (test updates)
## Why

Medium-scale parity audit after the s035 fix (e2ba6c0) revealed
s032/s033 (10k-row =A*2 single-column family with row insert/delete
cycles) regressed: 10 cell divergences per scenario at AfterEdit{cycle=0}.
Pre-aa716670 these tests passed; the unified post-structural-op contract
(aa71667) introduced the regression by clearing computed overlays for
ALL placements of any demoted span, regardless of whether the placement
intersects the structural-op affected region.

## Root cause

For s032 cycle 0: insert_rows('Sheet1', 2000, 10) on a 10k-row col B
=A*2 family.

The s035 affected-region scoping correctly identifies that col B's
span intersects the affected region (rows 1999..u32::MAX), so the span
demotes. Demotion materializes ALL 10000 placements via
bulk_set_formulas_with_plans.

Then the demote-path clears computed_overlay for ALL 10000 placement
cells (eval.rs:4195-4200). This is too aggressive: rows 1..1998 are
BEFORE the affected region and per the structural-op contract should
retain their pre-edit values until evaluate_all runs.

The legacy clear_computed_overlay_after_row(sheet, 1999) correctly
preserves rows 1..1998. Off mode passes through this code path with
no spans, so it correctly keeps rows 1..1998 visible. Auth mode's
demote-path clear was redundant with (and broader than) the legacy
boundary-scoped clear, breaking the contract.

## Fix

Filter the demote-path clear loop by intersecting each placement
cell's coord with the affected_region:

  if !placement_region.intersects(&affected_region) {
      continue;
  }

For per-cell write demotion (clear_computed_overlays=false), this
filter has no effect because the affected_region is the single point
of the write. For structural ops with the unbounded-rect affected
region, the filter correctly preserves before-boundary cells.
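
As a rough illustration of the filter on the s032 shape (10k placements, affected tail starting at 0-based row 1999), assuming a hypothetical one-column overlay stored as a slice:

```rust
/// Tail of a sheet affected by a row insert/delete: rows >= row_start (0-based).
/// Hypothetical stand-in for the unbounded-rect affected region.
struct AffectedRows {
    row_start: u32,
}

impl AffectedRows {
    fn contains_row(&self, row0: u32) -> bool {
        row0 >= self.row_start
    }
}

/// Clears computed-overlay values only for placements inside the affected
/// region and returns how many were cleared. `overlay[i]` is the value at row i.
fn clear_affected(overlay: &mut [Option<f64>], affected: &AffectedRows) -> usize {
    let mut cleared = 0;
    for (row0, cell) in overlay.iter_mut().enumerate() {
        if !affected.contains_row(row0 as u32) {
            continue; // before the boundary: keep the pre-edit value
        }
        if cell.take().is_some() {
            cleared += 1;
        }
    }
    cleared
}
```

Rows before the boundary keep their pre-edit values until the next full evaluation, which matches what the boundary-scoped legacy clear does.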

## Tests

Existing structural_op_clears_computed_values, formula_plane_demotion_correctness,
and formula_plane_structural_affected_region tests pass. Full medium
parity at f9cffa0 + this fix:

  Scenarios run:      78
  Scenarios passed:   75
  Scenarios failed:   3 (s040/s041/s042: public-API gaps)
  Scenarios skipped:  2 (expected divergence: volatile)
  Total divergences:  0

s032 and s033 specifically pass at medium scale (0 divergences across
all 12 phases each).

## Validation

cargo check -p formualizer-eval                                  pass
cargo test -p formualizer-eval --release                         1647/1648 pass
                                                                  (test_scalar_arena_float_overflow:
                                                                   pre-existing release-mode debug_assert)
probe-corpus-parity medium s032/s033                             0 divergences
probe-corpus-parity medium full                                  75/78 pass, 0 divergences

## Files

MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (15 lines added: per-placement
  affected-region intersection filter in demote_spans_for_structural_op_impl)
…and span demotion

## Why

Medium-scale parity audit identified s039 (10k =A*2 family with 50-cell
bulk edits + undo/redo) running 3.9x slower under Auth, and s055 (200-row
two-span workbook with mixed value/formula edits) running 5.6x slower.
Both were FormulaPlane dirty-domain widening bugs:

- s039: Engine::action_atomic_impl / undo_action / redo_action all called
  record_formula_plane_structural_change(StructuralScope::AllSheets)
  after journal replay regardless of whether the journal events were
  value-only or structural. AllSheets bumps indexes_epoch -> next recalc
  uses SpanSeedMode::WholeAll -> recomputes every active span placement.
  For a 50-cell value bulk edit, this turned 50-vertex recalc into
  10,000-placement recalc.

- s055: per-cell formula write inside an active span demotes the span
  via demote_spans_preserving_computed_overlays. Demotion calls
  bulk_set_formulas_with_plans which marks ALL materialized formulas
  dirty (200 cells per span). Off mode marks only the true dependency
  closure dirty (6 cells in s055).

## Architecture

### Fix 2: per-event journal recording for action/undo/redo

Replaced the broad AllSheets invalidation in action_atomic_impl,
undo_action, and redo_action with per-event recording:

  for event in &journal.graph.events {
      self.record_formula_plane_change_for_event(event);
  }

The record_formula_plane_change_for_event function already correctly
maps SetValue/SetFormula events to StructuralScope::Cell (precise) and
structural events (insert/delete row/col, sheet add/remove) to broader
scopes. The fix is just to use that precise mapping instead of the
blanket AllSheets.

For undo/redo: the journal contains ChangeEvents that, when replayed in
inverse, are equivalent to the original events from a dirty-region
perspective. Per-event recording is correct in both directions.
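
A sketch of what per-event scope mapping looks like in principle. The `ChangeEvent` variants shown and the exact scopes chosen for structural events are assumptions for illustration; only `StructuralScope::Cell` for value/formula writes and "broader scopes" for structural events are stated by this commit.

```rust
#[derive(Debug, PartialEq)]
enum StructuralScope {
    Cell { sheet: u32, row0: u32, col0: u32 },
    Sheet(u32),
    AllSheets,
}

/// Hypothetical, trimmed-down journal event type.
enum ChangeEvent {
    SetValue { sheet: u32, row0: u32, col0: u32 },
    SetFormula { sheet: u32, row0: u32, col0: u32 },
    InsertRows { sheet: u32 },
    RemoveSheet,
}

/// Value and formula writes map to a precise cell scope; structural events
/// map to broader scopes (the specific choices here are assumptions).
fn scope_for_event(event: &ChangeEvent) -> StructuralScope {
    match *event {
        ChangeEvent::SetValue { sheet, row0, col0 }
        | ChangeEvent::SetFormula { sheet, row0, col0 } => {
            StructuralScope::Cell { sheet, row0, col0 }
        }
        ChangeEvent::InsertRows { sheet } => StructuralScope::Sheet(sheet),
        ChangeEvent::RemoveSheet => StructuralScope::AllSheets,
    }
}

/// One scope per journal event, instead of a single blanket AllSheets.
fn scopes_for_journal(events: &[ChangeEvent]) -> Vec<StructuralScope> {
    events.iter().map(scope_for_event).collect()
}
```

A 50-cell value edit then yields 50 cell scopes and never escalates to AllSheets.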

### Fix 3: transfer FormulaPlane dirty closure across span demotion

When per-cell formula write triggers demote_spans_preserving_computed_overlays
(clear_computed_overlays=false), the demotion materializes all span
placements as legacy formula vertices via bulk_set_formulas_with_plans.
That helper marks every materialized vertex dirty.

For computed-overlay-preserving demotion, that is too aggressive:
preserved placement values remain valid. Only the cells in the true
dirty closure (cells whose precedents actually changed) need recompute.

The fix:

1. BEFORE demoting, compute the pre-demotion FormulaPlane dirty
   closure by reading authority.pending_changed_regions() and walking
   compute_dirty_closure to convert producer work items to result
   PlacementCoords.

2. After demotion (which dirties everything), iterate the demoted
   placement cells. If a cell is NOT in the pre-demotion dirty closure
   AND clear_computed_overlays=false, set the vertex dirty flag to
   false. The cell's preserved overlay value is still correct.

Subsequent edits in the same atomic action continue to dirty their
normal graph dependency closure as expected. This fix only adjusts
dirty marking for cells WITHIN the demoted span family.
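
The two steps can be sketched as below, assuming dirty flags are kept in a map keyed by cell coordinate and the pre-demotion closure has already been computed. `transfer_dirty_closure` and the `Coord` alias are hypothetical names; the engine works on graph vertices rather than a HashMap.

```rust
use std::collections::{HashMap, HashSet};

type Coord = (u32, u32); // (row0, col0), hypothetical simplification

/// After demotion has marked every materialized cell dirty, clear the flag on
/// demoted cells that were NOT in the pre-demotion dirty closure. Only applies
/// when overlays were preserved. Returns the number of flags cleared.
fn transfer_dirty_closure(
    dirty: &mut HashMap<Coord, bool>,
    demoted: &[Coord],
    pre_demotion_closure: &HashSet<Coord>,
    clear_computed_overlays: bool,
) -> usize {
    if clear_computed_overlays {
        return 0; // structural path: overlays were cleared, leave flags alone
    }
    let mut cleared = 0;
    for coord in demoted {
        if pre_demotion_closure.contains(coord) {
            continue; // truly dirty: must recompute
        }
        if let Some(flag) = dirty.get_mut(coord) {
            if *flag {
                *flag = false;
                cleared += 1;
            }
        }
    }
    cleared
}
```

On an s055-like shape (200 demoted cells, 6 in the true closure) this leaves 6 cells dirty and clears the other 194.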

## Implementation notes

The placement-clear filter (b36e8cc) is preserved alongside the new
dirty-closure-transfer logic; both run in demote_spans_for_structural_op_impl
but for different code paths:

- Structural ops (clear_computed_overlays=true): placement-clear
  filter ensures only cells inside the affected_region get cleared.
  The closure-transfer logic does not run.
- Per-cell writes (clear_computed_overlays=false): no placement
  clearing happens. Closure-transfer runs to clear stale dirty flags
  on cells outside the true closure.

## Tests

New file: formula_plane_dirty_domain_preservation.rs (4 tests)
  - action_atomic_value_edits_use_dirty_closure_not_whole_all
  - undo_redo_of_value_bulk_uses_dirty_closure_not_whole_all
  - per_cell_formula_write_demotion_dirties_only_true_closure
  - per_cell_formula_write_demotion_correct_after_undo

## Performance results

Medium scale, parallel=true (Auth/Off recalc p50 ratio):

  Scenario                                  Pre-fix    Post-fix
  s029 (closed by Fix 1 in prior commit)    4.5x slow  0.87x (Auth faster)
  s039 (closed by Fix 2)                    3.9x slow  0.38x (Auth 2.6x faster)
  s055 (closed by Fix 3)                    5.6x slow  0.73x (Auth faster)

All three scenarios meet the <1.5x Auth/Off recalc ratio acceptance
criterion. Auth is now faster than Off on all three.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --release                         1651/1652 pass
                                                                  (test_scalar_arena_float_overflow:
                                                                   pre-existing release-mode debug_assert)
formula_plane_dirty_domain_preservation (4 tests)                pass
formula_plane_demotion_correctness (existing)                    pass
undo / redo (existing)                                           pass
fp8_ingest_pipeline_parity                                       pass

probe-corpus-parity medium s029/s039/s055                        3/3 pass, 0 divergences
probe-corpus-parity medium full                                  75/78 pass, 0 divergences
                                                                  (failures: s040/s041/s042
                                                                   public-API gaps only)

## Files

NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_dirty_domain_preservation.rs

MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (Fix 2 sites + Fix 3 logic + closure helper)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
…structural tail precision

## Why

The v0.6.0-rc1 release shipped with a WholeSheet broadening workaround
(`Engine::structural_change_scope_for_region`) for structural-op
affected regions. Unbounded `Rect(0, u32::MAX, c, u32::MAX)` would
trigger `SheetRegionIndex::rect_buckets_for_rect` to materialize
~1.8e16 (row_bucket, col_bucket) tuples (87 GB OOM observed).

The workaround broadened any unbounded rect to `WholeSheet` at the
recording boundary, preserving correctness but losing precision in
`compute_dirty_closure`: every surviving span on the edited sheet
reported as fully dirty even when the structural op was disjoint
from its read/result regions. ~50-200ms of additional recompute per
structural cycle in parallel mode.

This commit is **Phase 0 of the Option E migration plan** (see
`docs/design/formula-plane/dispatch/option-e-execution-plan.md`).
It introduces `RowsFrom` and `ColsFrom` as first-class half-open
region variants, eliminating the sentinel `u32::MAX` as a tail
carrier and restoring full structural-tail precision.

## Architecture

### New variants

```rust
pub(crate) enum RegionPattern {
    // ... existing variants unchanged ...
    RowsFrom { sheet_id: SheetId, row_start: u32 },
    ColsFrom { sheet_id: SheetId, col_start: u32 },
}
```

Constructors: `RegionPattern::rows_from(sheet_id, row_start)` and
`RegionPattern::cols_from(sheet_id, col_start)`.

### New axis-extent arm

`AxisExtent` and `QueryAxisExtent` each gain a `From(u32)` arm
representing a half-open extent from `N` to infinity. This replaces
`Span(N, u32::MAX)` as the encoding for tail extents.

`axis_extents()`:
- `RowsFrom { row_start, .. }` -> `(AxisExtent::From(row_start), AxisExtent::All)`
- `ColsFrom { col_start, .. }` -> `(AxisExtent::All, AxisExtent::From(col_start))`

`query_extents()` (producer.rs): symmetric.

`bounded_extents()` returns `None` for both new variants (they are
unbounded along the `From` axis, like `WholeRow`/`WholeCol`/`WholeSheet`).

### Index structures

`SheetRegionIndex` gains two new dedicated maps:

```rust
rows_from: FxHashMap<SheetId, BTreeMap<u32, Vec<usize>>>,
cols_from: FxHashMap<SheetId, BTreeMap<u32, Vec<usize>>>,
```

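These mirror the existing `whole_rows`/`whole_cols`/`whole_sheets`
precedent. Insertion is a single map insert (O(log n) in the number of
distinct boundaries on the sheet), with no bucket materialization. Query
iterates entries whose boundary is <= the query's max-axis-bound
(BTreeMap range query).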

`index_entry` routes `RowsFrom`/`ColsFrom` to the new structures.
**NOT to `rect_buckets_for_rect`** — the bucket explosion is gone.

`collect_candidates` adds `collect_tail_axis_candidates` which
walks `rows_from` and `cols_from` against the query's axis
extents. The existing exact-filter step (`region.intersects(&query)`)
remains the correctness safety net.
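
A minimal single-sheet sketch of the tail index idea, using std's `BTreeMap::range`. `TailIndex` and its methods are hypothetical names; the real structure is additionally keyed by SheetId and covers both axes.

```rust
use std::collections::BTreeMap;

/// Maps a RowsFrom boundary to the ids of entries registered at that boundary.
#[derive(Default)]
struct TailIndex {
    rows_from: BTreeMap<u32, Vec<usize>>,
}

impl TailIndex {
    /// One map insert per entry, regardless of how far the tail extends.
    fn insert(&mut self, row_start: u32, entry_id: usize) {
        self.rows_from.entry(row_start).or_default().push(entry_id);
    }

    /// A RowsFrom(start) entry can overlap a query only if start <= the
    /// query's largest row, so a range scan up to that bound finds candidates.
    fn candidates(&self, query_row_max: u32) -> Vec<usize> {
        self.rows_from
            .range(..=query_row_max)
            .flat_map(|(_, ids)| ids.iter().copied())
            .collect()
    }
}
```

Inserting `RowsFrom(0)` or `RowsFrom(u32::MAX)` costs the same as any other boundary, which is the property the no-explode tests rely on.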

### Projection arithmetic

`DirtyProjectionRule::project_changed_region` handles `From(N)`
inputs through affine offsets using `u32::checked_add`/`checked_sub`
to avoid panic on overflow. A `From(u32::MAX - 10)` projection
through a positive offset clamps at the saturated boundary.
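
One way to shift a `From(N)` start by a signed offset without panicking is sketched below. The function name is hypothetical, and clamping to `u32::MAX` on overflow and to `0` on underflow is an assumption based on the "clamps at the saturated boundary" wording, not a copy of the projection code.

```rust
/// Shift the start of a half-open `From(start)` extent by `offset`.
/// Overflow clamps to u32::MAX and underflow clamps to 0 (assumed behavior).
fn shift_from_start(start: u32, offset: i64) -> u32 {
    if offset >= 0 {
        let delta = u32::try_from(offset).unwrap_or(u32::MAX);
        start.checked_add(delta).unwrap_or(u32::MAX)
    } else {
        let delta = u32::try_from(offset.unsigned_abs()).unwrap_or(u32::MAX);
        start.checked_sub(delta).unwrap_or(0)
    }
}
```

Plain `start + delta` would panic in debug builds and wrap in release builds; the checked form makes the boundary case explicit.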

### Workaround removal

`Engine::structural_change_scope_for_region` is **REMOVED**.
The four structural-op call sites (insert_rows, delete_rows,
insert_columns, delete_columns) now construct the new variants
directly via:

```rust
fn structural_row_region(sheet_id: SheetId, start_row0: u32) -> RegionPattern {
    RegionPattern::rows_from(sheet_id, start_row0)
}
fn structural_col_region(sheet_id: SheetId, start_col0: u32) -> RegionPattern {
    RegionPattern::cols_from(sheet_id, start_col0)
}
```

And pass them through unchanged to both the demotion path (which uses
`intersects()`) and the structural-change recording path (which uses
`StructuralScope::Region(affected_region)`). The bucket-explosion
trap is gone because `RowsFrom`/`ColsFrom` route to dedicated
index structures.

## Tests

Test module additions in `crates/formualizer-eval/src/formula_plane/region_index.rs`:
- `rows_from_intersection_arithmetic` — verifies intersection vs Rect, Point, WholeSheet, other RowsFrom.
- `cols_from_intersection_arithmetic` — symmetric.
- `rows_from_index_does_not_explode` — insert/query `RowsFrom(0)` and `RowsFrom(u32::MAX)`. Memory < 50MB, time < 100ms.
- `cols_from_index_does_not_explode` — symmetric.
- `from_axis_projection_no_overflow` — `From(u32::MAX - 10)` projection through positive offsets uses `u32::checked_*`.

New file: `crates/formualizer-eval/src/engine/tests/formula_plane_structural_tail_precision.rs`
- `column_delete_outside_span_region_with_dirty_closure_no_recompute` — verifies precise dirty-closure scoping: evaluate_all after delete computes ZERO placements when surviving spans are disjoint from affected region.
- `column_insert_outside_span_region_with_dirty_closure_no_recompute` — symmetric.

## Performance impact

Medium scale, parallel=true:

s034 recalc p50:  Off 15.808ms,  Auth 18.482ms (ratio 1.17x)
s035 recalc p50:  Off  0.210ms,  Auth  0.127ms (ratio 0.60x; Auth faster)

s035 phase_recalc was ~50-200ms under the WholeSheet broadening
workaround. With precise tail-extent recording, the surviving spans
report only the truly-affected placements as dirty. The dramatic
drop on s035 (0.127ms) demonstrates the precision recovery.

## Validation

cargo check -p formualizer-eval                                  pass
cargo test -p formualizer-eval --release --no-run                pass
formualizer-eval test binary (--test-threads=4 --skip ...float_overflow)
                                                                 1658/1658 pass
                                                                 (test_scalar_arena_float_overflow
                                                                  pre-existing release-mode failure)
fp8_ingest_pipeline_parity                                       pass

probe-corpus-parity medium full                                  75/78 pass, 0 divergences
                                                                 (failures: s040/s041/s042 public-API gaps)

Peak RAM during build/test: < 1 GB. Available memory never dropped
below the 20 GiB threshold in any run.

## Files

NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_structural_tail_precision.rs

MODIFIED:
- crates/formualizer-eval/src/formula_plane/region_index.rs (RowsFrom/ColsFrom variants + indexes + tests)
- crates/formualizer-eval/src/formula_plane/producer.rs (QueryAxisExtent::From + projection arms)
- crates/formualizer-eval/src/engine/eval.rs (-45 lines: workaround removed; structural_row/col_region updated)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
…rithmetic

## Why

The post-Phase-0 codebase had three parallel axis-extent representations:
- region_index.rs: enum AxisExtent { Span, From, All } (3 variants)
- producer.rs: enum QueryAxisExtent { Span, From, All } (parallel duplicate)
- producer.rs: struct BoundedAxisExtent { start, end } (finite-only)

These three types do the same job in three places. Phase 1 of the
Option E migration unifies them into a single canonical type and adds
the To(N) variant ahead of Phase 3's projection arithmetic needs.

## Architecture

### New unified type

```rust
pub(crate) enum AxisRange {
    Point(u32),
    Span(u32, u32),  // inclusive on both ends; invariant: start <= end
    From(u32),       // [start, u32::MAX]
    To(u32),         // [0, end] -- NEW (for Phase 3 projection symmetry)
    All,             // [0, u32::MAX]
}

pub(crate) enum AxisKind { Point, Span, From, To, All }

pub(crate) struct BoundedRange { low: u32, high: u32 }  // Point|Span subset
```

The To(u32) variant is added now even though no current RegionPattern
constructor produces it. Phase 3 will need it when From(N) projects
through a negative affine offset in compute_dirty_closure; introducing
it here means Phase 3 doesn't have to retrofit the type.

### Methods

AxisRange implements:
- intersects(self, other) -- explicit 25-case truth table
- contains(self, coord)
- query_bounds(self) -> (u32, u32)
- is_bounded(self) -- true only for Point/Span
- project_through_offset(self, offset: i64) -> Option<Self>
  -- uses checked arithmetic; clamps at u32 boundaries; never panics
- kind(self) -> AxisKind

BoundedRange implements:
- new(low, high) with debug_assert
- from_axis_range(AxisRange) -> Option<Self>
- to_axis_range(self) -> AxisRange
- is_point, intersect, union (preserved from BoundedAxisExtent)

All hot-path methods marked #[inline].
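
A minimal sketch of how the bounds-based methods listed above could fit together. The variant names follow the type definition in this message; deriving `intersects` from `query_bounds` is an assumption made for brevity here (the commit describes an explicit 25-case truth table), and the visibility modifiers are illustrative.

```rust
/// Sketch of the unified axis type; variant names follow the commit text.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum AxisRange {
    Point(u32),
    Span(u32, u32), // inclusive on both ends; invariant: start <= end
    From(u32),      // [start, u32::MAX]
    To(u32),        // [0, end]
    All,            // [0, u32::MAX]
}

impl AxisRange {
    /// Inclusive (low, high) bounds covering the range.
    #[inline]
    pub fn query_bounds(self) -> (u32, u32) {
        match self {
            AxisRange::Point(p) => (p, p),
            AxisRange::Span(lo, hi) => (lo, hi),
            AxisRange::From(lo) => (lo, u32::MAX),
            AxisRange::To(hi) => (0, hi),
            AxisRange::All => (0, u32::MAX),
        }
    }

    /// True only for the finite variants.
    #[inline]
    pub fn is_bounded(self) -> bool {
        matches!(self, AxisRange::Point(_) | AxisRange::Span(_, _))
    }

    /// Whether a single coordinate falls inside the range.
    #[inline]
    pub fn contains(self, coord: u32) -> bool {
        let (lo, hi) = self.query_bounds();
        lo <= coord && coord <= hi
    }

    /// Assumption: interval overlap computed from bounds, not the
    /// explicit per-variant table the commit describes.
    #[inline]
    pub fn intersects(self, other: Self) -> bool {
        let (a_lo, a_hi) = self.query_bounds();
        let (b_lo, b_hi) = other.query_bounds();
        a_lo <= b_hi && b_lo <= a_hi
    }
}
```

Because every variant maps to an inclusive interval, the overlap test is symmetric by construction, which is the property the `intersects_commutes` proptest checks.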

### Conversion table

RegionPattern::axis_extents() renamed to axis_ranges() and returns
(AxisRange, AxisRange):

```text
Point(key)         -> (Point(row), Point(col))
ColInterval        -> (Span(row_start, row_end), Point(col))
RowInterval        -> (Point(row), Span(col_start, col_end))
Rect               -> (Span(row_start, row_end), Span(col_start, col_end))
RowsFrom { start } -> (From(start), All)
ColsFrom { start } -> (All, From(start))
WholeRow { row }   -> (Point(row), All)
WholeCol { col }   -> (All, Point(col))
WholeSheet         -> (All, All)
```

Notable change: Point/ColInterval/RowInterval now use AxisRange::Point
where Phase 0's AxisExtent represented them as degenerate Span(p, p).
The intersection arithmetic is equivalent but the explicit Point arm
allows the compiler to elide the lo/hi comparison.

### Public API: unchanged

RegionPattern enum stays at 9 variants, same fields, same constructors.
Phase 4 collapses it; Phase 1 leaves it alone.

## Tests

Unit tests in region_index.rs (~7 new):
- axis_range_intersects_truth_table (full 5x5 = 25 cases)
- axis_range_contains_each_kind
- axis_range_query_bounds_each_kind
- axis_range_is_bounded_only_for_point_and_span
- axis_range_project_through_offset_cases (overflow + clamp)
- axis_range_kind_tags
- region_pattern_axis_ranges_match_conversion_table

Property tests via proptest (NEW dev-dep, in axis_range_proptest.rs):
- intersects_commutes
- contains_iff_intersects_with_point
- project_zero_offset_is_identity
- from_projection_no_overflow (random u32 + bounded i64 offset)
- intersect_query_bounds_consistent
- kind_matches_variant

Front-loading proptest into Phase 1 serves as a safety net for
Phases 2 (5x5 dispatch matrix) and 3 (projection arithmetic).

## Performance

Validated at large scale (median p50 of 5 recalc samples per scenario):

  27 large-scale auth scenarios (>= 1ms recalc baseline):
    Improvements (>5% faster):    15
    Neutrals (within +-5%):        8
    Regressions (>5% slower):      4

Top improvements:
  s016-multi-sheet-5-tabs                    -22.5%   (2.0ms -> 1.6ms)
  s021-volatile-functions-sprinkled          -19.2%  (22.5ms ->18.1ms)
  s025-errors-propagating-through-family     -15.9%   (1.7ms -> 1.5ms)
  s018-named-ranges-100                      -15.1%  (11.6ms -> 9.8ms)
  s029-calc-tab-200-complex-cells            -14.1%   (2.5ms -> 2.2ms)
  s030-calc-and-data-tabs-mixed              -12.4%   (4.4ms -> 3.8ms)
  s022-dynamic-functions-offset-indirect     -10.3%  (430ms ->386ms)
  s026-whole-column-refs-in-50k-formulas     -10.1% (2580ms ->2320ms)
  ... 7 more in -5% to -12% range

Regressions:
  s015-index-match-chain                     +50.2%   (1.5ms -> 2.2ms)
  s011-vlookup-family-against-1k-table       +20.3%   (1.5ms -> 1.8ms)
  s003-finance-anchored-arithmetic-family    +13.1%   (2.7ms -> 3.0ms)
  s007-fixed-anchor-family                    +8.5%   (3.7ms -> 4.0ms)

The 4 regressions are all in the 1.5-4ms range (max absolute ~740us);
they likely stem from the wider 5-arm AxisRange dispatch vs the prior
3-arm AxisExtent. Phase 2 (SheetRegionIndex axis-kind dispatch) and
Phase 4 (RegionPattern variant collapse) are expected to close them
by eliminating the secondary RegionPattern variant match.

Pre-existing pathologies surfaced during large-scale validation
(NOT introduced by Phase 1):
  - s034 Auth large hangs (>60s phase timeout)
  - s032 Auth large hits 60s phase timeout
Both warrant follow-up but predate Phase 0.

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --release                         1671/1672 pass
                                                                  (test_scalar_arena_float_overflow
                                                                   pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release                     pass
probe-corpus-parity small full                                   75/78 pass, 0 divergences
probe-corpus-parity medium full                                  75/78 pass, 0 divergences
probe-corpus large s001-s033                                     27/27 (recalc >= 1ms): 15 improved, 8 neutral, 4 regressed

Peak RAM during build/test: < 1 GB. Available memory never dropped
below the 20 GiB threshold in any run.

## Files

NEW:
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs

MODIFIED:
- crates/formualizer-eval/Cargo.toml (proptest dev-dep)
- Cargo.lock (proptest tree)
- crates/formualizer-eval/src/formula_plane/mod.rs (axis_range_proptest registration)
- crates/formualizer-eval/src/formula_plane/region_index.rs
  (AxisRange + AxisKind types, BoundedRange struct, AxisExtent removal,
   RegionPattern::axis_extents -> axis_ranges rename, hot-path #[inline])
- crates/formualizer-eval/src/formula_plane/producer.rs
  (QueryAxisExtent + BoundedAxisExtent removal, BoundedRange newtype,
   query_extents/bounded_extents return AxisRange/BoundedRange,
   hot-path #[inline])
## Why

Phase 2 of the Option E migration replaces SheetRegionIndex's variant-
dispatch insertion (`index_entry`) and the six `collect_*_candidates`
helpers with axis-kind-pair dispatch on `(rows.kind(), cols.kind())`.

The variant dispatch had 9 RegionPattern arms times multiple per-family
walks; the kind-pair dispatch has 9 reachable cells (out of 5x5 = 25)
each routing to exactly one insertion family and one query walk
sequence. This is the architectural cohesion play: the index now keys
its decisions off AxisKind tags, not enum variants. Phase 4's
RegionPattern collapse becomes mechanical against this structure.

## Architecture

### Insertion dispatch (Section 4)

`index_entry` extracts `(rows, cols) = region.axis_ranges()` and
matches on `(rows.kind(), cols.kind())`. The 9 reachable cells route:

  (Point, Point)   -> points
  (Point, Span)    -> row_intervals
  (Point, All)     -> whole_rows
  (Span, Point)    -> col_intervals
  (Span, Span)     -> rect_buckets   (the ONLY arm calling rect_buckets_for_rect)
  (From, All)      -> rows_from
  (All, Point)     -> whole_cols
  (All, From)      -> cols_from
  (All, All)       -> whole_sheets

The 16 unreachable kind pairs panic with
"unsupported SheetRegionIndex insertion kind pair in Phase 2: ({:?}, {:?})".
Phase 4 (RegionPattern collapse) will enable them; until then they
indicate a programmer error.
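
The routing table above can be sketched as a single match on the kind pair. `Family` and `insertion_family` are hypothetical names introduced for illustration; the real `index_entry` inserts into the index structures rather than returning a tag.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AxisKind {
    Point,
    Span,
    From,
    To,
    All,
}

/// Hypothetical tag naming the index family an entry is routed to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Family {
    Points,
    RowIntervals,
    WholeRows,
    ColIntervals,
    RectBuckets,
    RowsFrom,
    WholeCols,
    ColsFrom,
    WholeSheets,
}

/// Route a (rows, cols) kind pair to its insertion family.
/// Only the 9 reachable cells are handled; the other 16 panic.
pub fn insertion_family(rows: AxisKind, cols: AxisKind) -> Family {
    use AxisKind as K;
    match (rows, cols) {
        (K::Point, K::Point) => Family::Points,
        (K::Point, K::Span) => Family::RowIntervals,
        (K::Point, K::All) => Family::WholeRows,
        (K::Span, K::Point) => Family::ColIntervals,
        (K::Span, K::Span) => Family::RectBuckets,
        (K::From, K::All) => Family::RowsFrom,
        (K::All, K::Point) => Family::WholeCols,
        (K::All, K::From) => Family::ColsFrom,
        (K::All, K::All) => Family::WholeSheets,
        (r, c) => panic!(
            "unsupported SheetRegionIndex insertion kind pair in Phase 2: ({:?}, {:?})",
            r, c
        ),
    }
}
```

Keeping the catch-all arm as a panic (rather than a silent fallback) means a newly constructible kind pair fails loudly until it is given a family.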

### Query dispatch (Sections 5-6)

`collect_candidates` is now the single dispatcher. It extracts
`(rows, cols) = query.axis_ranges()` and matches on the kind pair.
Each reachable arm executes the per-family walk sequence specified
by Section 6 of the design doc.

The bucket-explosion guard is enforced at the dispatch level:
- (Span, Span)-bounded queries call `rect_buckets_for_rect` to
  enumerate the finite grid (efficient common-case).
- Any query with From/To/All on either axis iterates POPULATED
  rect_buckets keys filtered by sheet+predicate, never enumerating
  theoretical buckets.
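
A rough sketch of the two walk strategies, under stated assumptions: the bucket edge length (`BUCKET = 64`), the `RectBuckets` struct, and both method names are hypothetical, and the sheet filter is omitted. Both queries return bucket-level candidates that would still need an exact intersection check.

```rust
use std::collections::HashMap;

/// Assumed bucket edge length; the commit does not state the real value.
const BUCKET: u32 = 64;

/// Inclusive (low, high) coordinate bounds on one axis.
type Bounds = (u32, u32);

#[derive(Default)]
pub struct RectBuckets {
    /// (row_bucket, col_bucket) -> ids of entries touching that bucket.
    buckets: HashMap<(u32, u32), Vec<u32>>,
}

impl RectBuckets {
    /// Register a finite rect in every bucket it touches.
    pub fn insert_rect(&mut self, id: u32, rows: Bounds, cols: Bounds) {
        for rb in rows.0 / BUCKET..=rows.1 / BUCKET {
            for cb in cols.0 / BUCKET..=cols.1 / BUCKET {
                self.buckets.entry((rb, cb)).or_default().push(id);
            }
        }
    }

    /// (Span, Span) query: enumerate the finite bucket grid directly.
    pub fn query_bounded(&self, rows: Bounds, cols: Bounds) -> Vec<u32> {
        let mut out = Vec::new();
        for rb in rows.0 / BUCKET..=rows.1 / BUCKET {
            for cb in cols.0 / BUCKET..=cols.1 / BUCKET {
                if let Some(ids) = self.buckets.get(&(rb, cb)) {
                    out.extend_from_slice(ids);
                }
            }
        }
        finish(out)
    }

    /// From/To/All query: scan only populated keys, so cost scales with
    /// the number of stored buckets, not with u32::MAX / BUCKET.
    pub fn query_unbounded(&self, rows: Bounds, cols: Bounds) -> Vec<u32> {
        let (r_lo, r_hi) = (rows.0 / BUCKET, rows.1 / BUCKET);
        let (c_lo, c_hi) = (cols.0 / BUCKET, cols.1 / BUCKET);
        let mut out = Vec::new();
        for (&(rb, cb), ids) in &self.buckets {
            if (r_lo..=r_hi).contains(&rb) && (c_lo..=c_hi).contains(&cb) {
                out.extend_from_slice(ids);
            }
        }
        finish(out)
    }
}

/// Sort and deduplicate candidate ids for a stable result.
fn finish(mut ids: Vec<u32>) -> Vec<u32> {
    ids.sort_unstable();
    ids.dedup();
    ids
}
```

Calling `query_bounded` with a tail such as `(500, u32::MAX)` would loop over tens of millions of row buckets; routing those queries to `query_unbounded` is what keeps the walk proportional to populated keys.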

### Helper deletion (Section 8c)

Six obsolete variant-era helpers deleted:
- collect_point_candidates
- collect_col_interval_candidates
- collect_row_interval_candidates
- collect_rect_candidates
- collect_tail_axis_candidates
- collect_whole_axis_candidates

The dispatcher inlines their logic into kind-pair-specific arms.
Small private utilities (extend_ids, bucket arithmetic) preserved
for mechanical reuse.

### No new index families

Per Section 3 of the design doc, the existing 9 families are
sufficient for Phase 2's 9 reachable kind pairs. The Option E memo's
broader `tail_extents` family is deferred to Phase 4 when expanded
kind pairs become constructible.

## Tests

NEW unit test in `region_index.rs`:
- `axis_kind_dispatch_matrix_returns_correct_intersections`
  - 81-case insert+query matrix (9 insert kinds x 9 query kinds)
  - Each combination asserts: index returns entry IFF
    `RegionPattern::intersects` returns true (ground truth)

NEW property test in `axis_range_proptest.rs`:
- `region_index_query_returns_all_intersecting`
  - Random fixtures of 0-50 indexed regions + random query region
  - Asserts: `{result_ids} == {ground_truth_ids}`
  - This is the SUPERSET INVARIANT TEST: hard correctness gate
  - Strategy: any of 9 currently-constructible RegionPattern shapes
    on sheet 1..3, coords 0..20 to encourage same-sheet intersection
  - ~256 random cases per run cover the 81-pair shape combinations
    plus boundary edges

Existing 1671 formualizer-eval tests continue to pass (excluding
pre-existing test_scalar_arena_float_overflow). Existing Phase 0
bucket-explosion regression tests
(`rows_from_index_does_not_explode`, `cols_from_index_does_not_explode`)
continue to pass — non-negotiable proof that no From/To/All path
enumerates theoretical buckets.

## Performance

Validated at medium scale (2-run avg of recalc p50, scenarios >= 0.5ms baseline):

  Phase 2 (2-run avg) vs Phase 0 baseline (2-run avg):
    Improvements (>5% faster):    42
    Neutrals (within +-5%):       11
    Regressions (>5% slower):      3

Phase 2 closed most Phase 1 regressions and unlocked further wins.
For comparison:
                  Imp    Neutral    Reg
  Phase 1:         29        17       10
  Phase 2:         42        11        3

Top wins (preserved from Phase 1; some accelerated):
  s035-family-with-column-delete             -99.1% (13.3ms -> 0.12ms)
  s039-undo-redo-of-bulk-edit                -86.4%  (2.6ms -> 0.36ms)
  s055-undo-after-mixed-edits                -79.1%  (1.2ms -> 0.25ms)
  s034-family-with-column-insert             -22.9% (22.0ms -> 17.0ms)  NEW
  s032-family-with-row-insert-cycles         -16.1%  (5.6ms -> 4.7ms)   NEW
  ... 37 more in -5% to -25% range

Remaining regressions (all under 100us absolute, on baselines near or below 1ms):
  s077-lookup-with-sparse-empty-cells         +8.0%  (0.53ms -> 0.57ms)
  s049-vlookup-with-relative-key              +7.1%  (1.10ms -> 1.17ms)
  s015-index-match-chain                      +6.0%  (0.54ms -> 0.58ms)

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --release                         1673/1674 pass
                                                                  (test_scalar_arena_float_overflow:
                                                                   pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release                     pass

probe-corpus-parity small full                                   75/78 pass, 0 divergences
probe-corpus-parity medium full                                  75/78 pass, 0 divergences
probe-corpus medium perf 2-run avg vs Phase 0 baseline           42 imp, 11 neutral, 3 reg

Available RAM: ~78 GiB throughout. No run dropped below 20 GiB.

## Files

NEW:
- docs/design/formula-plane/dispatch/axis-range-phase-2-dispatch-table.md
  (planner agent design artifact: 5x5 dispatch tables, per-family walk
   strategies, complexity analysis, migration plan, risk register)

MODIFIED:
- crates/formualizer-eval/src/formula_plane/region_index.rs
  (insertion dispatch rewrite, query dispatch rewrite, 6 helper deletions,
   81-case matrix test)
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs
  (any_currently_constructible_region strategy +
   region_index_query_returns_all_intersecting superset invariant test)
…omain

## Why

Phase 3 of the Option E migration extends producer.rs's dirty-closure
machinery to be first-class on AxisRange. Phases 1 and 2 introduced
the AxisRange type and routed it through SheetRegionIndex, but
DirtyProjectionRule's per-axis projection arithmetic in producer.rs
hadn't been audited or extended for the From(N) and To(N) arms.

This commit closes that gap and consolidates query_extents into
direct axis_ranges() calls.

## Architecture

### Projection arithmetic extensions

DirtyProjectionRule has 5 variants. Per-axis projection work lives
in project_changed_axis (cell-level) and project_changed_range_axis
(range-level), both invoked from project_changed_region.

The variants needing real From/To projection work:
- AffineCell { row, col } — extended both axes for From(N) projection
  with checked_add/checked_sub clamping
- AffineRange { ... } — extended for From(N)/To(N) range projection

The variants that were no-ops for per-axis arithmetic:
- WholeTarget, ConservativeWhole — return whole result, no per-axis math
- WholeColumnRange — operates on column-only range axis; From(N) on
  the row axis is irrelevant to its projection

### Overflow safety

All coordinate arithmetic uses u32::checked_add/checked_sub. From(N)
projected through positive offset that overflows clamps to From(u32::MAX).
From(N) projected through negative offset that underflows broadens to All.
Symmetric for To(N).

The Phase 1 AxisRange::project_through_offset helper provides the
canonical implementation; producer.rs's projection rule logic uses it
where the projection is one-axis-at-a-time. For per-coordinate cases
(e.g. AffineCell projecting a single Point), checked arithmetic is
inlined.
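
The clamping rules above might look roughly like the sketch below. The `From` arms follow the text directly. The `To` arms are one reading of "symmetric" (overflow broadens to `All`, underflow clamps to `To(0)`), and the `Point`/`Span` handling (returning `None` when the whole range leaves the coordinate space) is likewise an assumption, as is the `shift` helper.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AxisRange {
    Point(u32),
    Span(u32, u32),
    From(u32),
    To(u32),
    All,
}

/// Hypothetical helper: shift one coordinate with checked arithmetic.
/// Returns None if the result leaves [0, u32::MAX].
fn shift(coord: u32, offset: i64) -> Option<u32> {
    if offset >= 0 {
        coord.checked_add(u32::try_from(offset).ok()?)
    } else {
        coord.checked_sub(u32::try_from(offset.unsigned_abs()).ok()?)
    }
}

impl AxisRange {
    pub fn project_through_offset(self, offset: i64) -> Option<Self> {
        match self {
            AxisRange::All => Some(AxisRange::All),
            AxisRange::From(start) => Some(match shift(start, offset) {
                Some(s) => AxisRange::From(s),
                // Positive overflow clamps to the last coordinate.
                None if offset > 0 => AxisRange::From(u32::MAX),
                // Negative underflow broadens to the whole axis.
                None => AxisRange::All,
            }),
            AxisRange::To(end) => Some(match shift(end, offset) {
                Some(e) => AxisRange::To(e),
                // Assumed mirror of From: overflow broadens.
                None if offset > 0 => AxisRange::All,
                // Assumed mirror of From: underflow clamps.
                None => AxisRange::To(0),
            }),
            // Assumption: a point pushed off the axis has no image.
            AxisRange::Point(p) => shift(p, offset).map(AxisRange::Point),
            // Assumption: a span is clipped at the boundary it crosses
            // and has no image if it leaves the axis entirely.
            AxisRange::Span(lo, hi) => {
                if offset >= 0 {
                    let lo2 = shift(lo, offset)?;
                    Some(AxisRange::Span(lo2, shift(hi, offset).unwrap_or(u32::MAX)))
                } else {
                    let hi2 = shift(hi, offset)?;
                    Some(AxisRange::Span(shift(lo, offset).unwrap_or(0), hi2))
                }
            }
        }
    }
}
```

No arm performs unchecked addition or subtraction, so the function cannot panic on overflow in debug or wrap in release, whatever offset it is given.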

### query_extents simplification

query_extents was a thin wrapper around pattern.axis_ranges() that
returned Option for compatibility with old QueryAxisExtent semantics.
Post-Phase-1 it always returns Some(pattern.axis_ranges()), so it's
been DELETED in favor of direct axis_ranges() calls at every site.

bounded_extents preserved as the explicit bounded conversion helper
since BoundedRange::from_axis_range can fail (returns None for
From/To/All).
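
To illustrate why the bounded conversion stays fallible, here is a small sketch. `BoundedRange` and `from_axis_range` are named in the commits; the free-function form of `bounded_extents` and the choice to emit `Point` when `low == high` are assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AxisRange {
    Point(u32),
    Span(u32, u32),
    From(u32),
    To(u32),
    All,
}

/// Finite-only subset of AxisRange (Point or Span).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BoundedRange {
    low: u32,
    high: u32,
}

impl BoundedRange {
    pub fn new(low: u32, high: u32) -> Self {
        debug_assert!(low <= high);
        Self { low, high }
    }

    /// Fails for the unbounded variants, which have no finite extent.
    pub fn from_axis_range(range: AxisRange) -> Option<Self> {
        match range {
            AxisRange::Point(p) => Some(Self::new(p, p)),
            AxisRange::Span(lo, hi) => Some(Self::new(lo, hi)),
            AxisRange::From(_) | AxisRange::To(_) | AxisRange::All => None,
        }
    }

    /// Assumption: a single-coordinate range converts back to Point.
    pub fn to_axis_range(self) -> AxisRange {
        if self.low == self.high {
            AxisRange::Point(self.low)
        } else {
            AxisRange::Span(self.low, self.high)
        }
    }
}

/// Hypothetical stand-in for bounded_extents: both axes must be finite.
pub fn bounded_extents(
    rows: AxisRange,
    cols: AxisRange,
) -> Option<(BoundedRange, BoundedRange)> {
    Some((
        BoundedRange::from_axis_range(rows)?,
        BoundedRange::from_axis_range(cols)?,
    ))
}
```

Callers that need finite extents get a `None` to branch on, while callers that only need the axis pair can use `axis_ranges()` directly without an `Option`.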

### Region index overflow normalization

While extending projection arithmetic, an existing region-index
overflow test exposed an exactness issue: From(MAX) intersected with
a point-width result span was producing a Region answer instead of
the expected single Cell. Projection normalization fixed this; the
test now passes with the exact-cell answer it always expected.

## Tests

NEW unit tests in producer.rs:
- dirty_closure_propagates_from_changed_region — From(N) changed +
  AffineCell rule projects to From(N + offset) on result region
- from_projection_no_overflow_in_dirty_closure — From(MAX-10) +
  positive offset clamps without panic
- compute_dirty_closure_handles_unbounded_changed — full closure
  call with unbounded changed region preserves baseline behavior
- dirty_projection_rule_handles_to_axis_range — exercises To axis
  projection directly (no constructible RegionPattern To variant
  yet; Phase 4 enables full integration test)

NEW property tests in axis_range_proptest.rs:
- projection_composition_is_offset_sum — projecting through o1 then
  o2 ≡ projecting through o1 + o2 (within u32 bounds)
- projection_no_panic_for_any_axis_range_and_bounded_offset — no
  panic for any random AxisRange × i64 offset in [-2^31, 2^31]

Existing tests preserved:
- All 1673 formualizer-eval tests pass (excluding pre-existing
  test_scalar_arena_float_overflow)
- All 26 producer unit tests pass
- All 81 Phase 2 axis-kind dispatch matrix cases pass
- All Phase 0 affected-region tests pass
- All dirty-domain-preservation tests pass (s029/s039/s055-style)
- All bucket-explosion regression tests pass

## Performance

Validated at medium scale (2-run avg of recalc p50, scenarios >= 0.5ms):

  Phase 3 vs Phase 0 baseline:
    Improvements (>5% faster):    36
    Neutrals (within +-5%):       15
    Regressions (>5% slower):      5

Critical dirty-closure-fix scenarios (s029/s039/s055):
  s029: base= 1.73ms  phase3= 1.74ms   delta=+0.3%   noise
  s039: base= 2.61ms  phase3= 0.33ms   delta=-87.5%  preserved
  s055: base= 1.18ms  phase3= 0.29ms   delta=-75.8%  preserved

Phase 2 improvements largely preserved; small regressions from added
From/To arms in projection arithmetic:
  s009-heavy-arith-family                     +18.7%  (0.50ms -> 0.59ms)
  s007-fixed-anchor-family                    +14.8%  (0.82ms -> 0.94ms)
  s015-index-match-chain                      +12.4%  (0.54ms -> 0.61ms)
  s071-vlookup-cache-K-equals-N                +9.2%  (0.50ms -> 0.55ms)
  s078-multiple-tables-cache-isolation         +5.5%  (0.96ms -> 1.01ms)

All regressions are ~120us or less in absolute terms. Phase 4 (RegionPattern collapse)
is expected to close them by eliminating the secondary variant match
in projection rule dispatch.

For comparison across phases:
                  Imp    Neutral    Reg
  Phase 1:         29        17       10
  Phase 2:         42        11        3
  Phase 3:         36        15        5

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --release                         1679/1680
                                                                  (test_scalar_arena_float_overflow:
                                                                   pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release                     pass

probe-corpus-parity small full                                   75/78 pass, 0 divergences
probe-corpus-parity medium full                                  75/78 pass, 0 divergences

probe-corpus medium 2-run avg vs Phase 0 baseline                36 imp, 15 neutral, 5 reg
s029/s039/s055 maintained                                        +0.3%, -87.5%, -75.8%

Available RAM: ~78 GiB throughout. No run dropped below 20 GiB.

## Files

MODIFIED:
- crates/formualizer-eval/src/formula_plane/producer.rs
  (DirtyProjectionRule arms extended for From/To, query_extents deletion,
   overflow-safe projection arithmetic, From/To producer unit tests)
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs
  (projection_composition_is_offset_sum, projection_no_panic_for_any_axis_range_and_bounded_offset)
…{ sheet_id, rows, cols }

## Why

Phase 4 of the Option E migration is the architectural cohesion payoff.
The 9-variant RegionPattern enum collapses into a single struct keyed
on AxisRange pairs:

```rust
pub(crate) struct Region {
    pub(crate) sheet_id: SheetId,
    pub(crate) rows: AxisRange,
    pub(crate) cols: AxisRange,
}
```

Phases 1-3 introduced AxisRange and routed it through SheetRegionIndex
and producer.rs while the RegionPattern enum stayed alongside as a
secondary dispatch surface. Phase 4 removes that secondary surface.
Every region representation in the codebase is now (sheet, rows, cols)
where each axis is one of {Point, Span, From, To, All}. No sentinel
u32::MAX as a tail carrier; no parallel enum variants; no hidden
representational ambiguity.

## Architecture

### Hard rename — no alias

The name `RegionPattern` is GONE everywhere. `git grep RegionPattern`
returns 0 matches. There is no `type RegionPattern = Region;` shim.
Future code references `Region` directly.

### Type definition

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub(crate) struct Region {
    pub(crate) sheet_id: SheetId,
    pub(crate) rows: AxisRange,
    pub(crate) cols: AxisRange,
}
```

The struct derives Copy: all three fields (SheetId as u16, plus two
AxisRange values, each a 5-arm enum with at most 2x u32 payload) are
Copy and fit in a small fixed-size representation. This matches the Phase 0
`RegionPattern` Copy semantics and removes one source of allocation
overhead vs the enum (which had to hold the largest variant).

### Constructor methods

All 9 constructors preserved with identical names and signatures:

  Region::point(sheet_id, row, col)
  Region::col_interval(sheet_id, col, row_start, row_end)
  Region::row_interval(sheet_id, row, col_start, col_end)
  Region::rect(sheet_id, row_start, row_end, col_start, col_end)
  Region::rows_from(sheet_id, row_start)
  Region::cols_from(sheet_id, col_start)
  Region::whole_row(sheet_id, row)
  Region::whole_col(sheet_id, col)
  Region::whole_sheet(sheet_id)

Each constructor builds the appropriate (rows, cols) AxisRange pair
per the Phase 1 conversion table. The 291 call sites that used these
constructors continue to work without change beyond the type name.
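
As a sketch of how the constructors map onto the Phase 1 conversion table: the signatures follow the list above, while modelling `SheetId` as a plain `u16` alias and using `pub` visibility are simplifications for this example.

```rust
/// Simplification: the commit describes SheetId as a u16.
pub type SheetId = u16;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum AxisRange {
    Point(u32),
    Span(u32, u32),
    From(u32),
    To(u32),
    All,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Region {
    pub sheet_id: SheetId,
    pub rows: AxisRange,
    pub cols: AxisRange,
}

impl Region {
    fn new(sheet_id: SheetId, rows: AxisRange, cols: AxisRange) -> Self {
        Self { sheet_id, rows, cols }
    }
    pub fn point(sheet_id: SheetId, row: u32, col: u32) -> Self {
        Self::new(sheet_id, AxisRange::Point(row), AxisRange::Point(col))
    }
    pub fn col_interval(sheet_id: SheetId, col: u32, row_start: u32, row_end: u32) -> Self {
        Self::new(sheet_id, AxisRange::Span(row_start, row_end), AxisRange::Point(col))
    }
    pub fn row_interval(sheet_id: SheetId, row: u32, col_start: u32, col_end: u32) -> Self {
        Self::new(sheet_id, AxisRange::Point(row), AxisRange::Span(col_start, col_end))
    }
    pub fn rect(sheet_id: SheetId, row_start: u32, row_end: u32, col_start: u32, col_end: u32) -> Self {
        Self::new(sheet_id, AxisRange::Span(row_start, row_end), AxisRange::Span(col_start, col_end))
    }
    pub fn rows_from(sheet_id: SheetId, row_start: u32) -> Self {
        Self::new(sheet_id, AxisRange::From(row_start), AxisRange::All)
    }
    pub fn cols_from(sheet_id: SheetId, col_start: u32) -> Self {
        Self::new(sheet_id, AxisRange::All, AxisRange::From(col_start))
    }
    pub fn whole_row(sheet_id: SheetId, row: u32) -> Self {
        Self::new(sheet_id, AxisRange::Point(row), AxisRange::All)
    }
    pub fn whole_col(sheet_id: SheetId, col: u32) -> Self {
        Self::new(sheet_id, AxisRange::All, AxisRange::Point(col))
    }
    pub fn whole_sheet(sheet_id: SheetId) -> Self {
        Self::new(sheet_id, AxisRange::All, AxisRange::All)
    }
}
```

Since each former variant is now just a particular `(rows, cols)` pair, call sites such as `Region::rows_from(sheet_id, start_row0)` keep their shape and only the type name changes.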

### Accessor methods

Added only what was needed by the rename:

  Region::sheet_id() -> SheetId
  Region::axis_ranges() -> (AxisRange, AxisRange)
  Region::intersects(&self, other: &Self) -> bool
  Region::contains_key(&self, key: RegionKey) -> bool
  Region::kind_pair() -> (AxisKind, AxisKind)
  Region::as_point() -> Option<RegionKey>

`as_point` was added to replace the one residual variant pattern
match in `dirty_domain_from_region`. No other accessors were added
speculatively.
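
A possible shape for the `as_point` and `kind_pair` accessors, shown on a trimmed-down `Region`. The `RegionKey` field layout is hypothetical (the commits name the type but do not show its fields), and `SheetId` is again modelled as a `u16` alias.

```rust
pub type SheetId = u16;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AxisKind {
    Point,
    Span,
    From,
    To,
    All,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AxisRange {
    Point(u32),
    Span(u32, u32),
    From(u32),
    To(u32),
    All,
}

impl AxisRange {
    /// Payload-free tag used for dispatch.
    pub fn kind(self) -> AxisKind {
        match self {
            AxisRange::Point(_) => AxisKind::Point,
            AxisRange::Span(_, _) => AxisKind::Span,
            AxisRange::From(_) => AxisKind::From,
            AxisRange::To(_) => AxisKind::To,
            AxisRange::All => AxisKind::All,
        }
    }
}

/// Hypothetical field layout for the single-cell key.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RegionKey {
    pub sheet_id: SheetId,
    pub row: u32,
    pub col: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Region {
    pub sheet_id: SheetId,
    pub rows: AxisRange,
    pub cols: AxisRange,
}

impl Region {
    pub fn kind_pair(&self) -> (AxisKind, AxisKind) {
        (self.rows.kind(), self.cols.kind())
    }

    /// Some(key) only when both axes are single points.
    pub fn as_point(&self) -> Option<RegionKey> {
        match (self.rows, self.cols) {
            (AxisRange::Point(row), AxisRange::Point(col)) => Some(RegionKey {
                sheet_id: self.sheet_id,
                row,
                col,
            }),
            _ => None,
        }
    }
}
```

With this accessor, a caller that previously matched on the `Point` variant can write `if let Some(key) = region.as_point()` and fall through to the general path otherwise.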

### Raw variant constructions converted

17 sites were using the raw enum-variant syntax (e.g.
`RegionPattern::Point(key)`, `RegionPattern::WholeSheet { sheet_id: 0 }`).
Each was converted to the appropriate constructor or struct literal.
In addition, the single variant pattern match in `dirty_domain_from_region` was
converted to use `region.as_point()`.

### RegionSet rename

`RegionSet::patterns(&self) -> &[RegionPattern]` renamed to
`RegionSet::regions(&self) -> &[Region]`. The type `RegionSet` itself
kept its name; only the accessor reflects the new type.

## Tests

NEW unit test:
- `region_constructors_produce_expected_axis_ranges` — verifies all
  9 constructor methods produce the expected struct values per the
  Phase 1 conversion table.

Existing tests preserved with mechanical type renames only:
- All 1679 formualizer-eval tests pass (excluding pre-existing
  test_scalar_arena_float_overflow)
- All 81 Phase 2 axis-kind dispatch matrix cases pass
- All Phase 3 producer From/To projection tests pass
- All Phase 0 affected-region tests pass
- All dirty-domain-preservation tests pass
- All bucket-explosion regression tests pass
- All proptest tests pass (with strategy updated to produce Region
  directly)

## Performance

Validated at medium scale (4-run avg of recalc p50, scenarios >= 0.5ms):

  Phase 4 vs Phase 0 baseline:
    Improvements (>5% faster):    22
    Neutrals (within +-5%):       21
    Regressions (>5% slower):     13

Critical scenarios:
  s029-calc-tab-200-complex-cells           +6.5%   (1.73ms -> 1.84ms)
  s039-undo-redo-of-bulk-edit              -89.5%   (2.61ms -> 0.27ms)
  s055-undo-after-mixed-edits              within noise (1.18ms -> 1.28ms, +8.5% initially; settled to neutral)

Top wins (preserved across phases):
  s035-family-with-column-delete           -98.9%  (13.3ms -> 0.15ms)
  s039-undo-redo-of-bulk-edit              -89.5%   (2.6ms -> 0.27ms)
  s063-index-with-table-edit               -18.6%   (0.85ms -> 0.69ms)
  s006-rect-family-10cols                  -18.0%   (8.6ms -> 7.1ms)
  s047-very-deep-chain                     -17.2%   (1.7ms -> 1.4ms)
  s007-fixed-anchor-family                 -16.8%   (0.82ms -> 0.68ms)
  ... 16 more in -5% to -15% range

Regressions (all under ~250us absolute, sub-1.5ms scale):
  s003-finance-anchored-arithmetic-family  +22.8%   (0.98ms -> 1.20ms)
  s049-vlookup-with-relative-key           +20.9%   (1.10ms -> 1.32ms)
  s058-volatile-non-volatile-mix           +16.0%   (0.97ms -> 1.12ms)
  s071-vlookup-cache-K-equals-N            +15.0%   (0.50ms -> 0.58ms)
  s078-multiple-tables-cache-isolation     +14.0%   (0.96ms -> 1.09ms)
  s018-named-ranges-100                     +9.1%   (1.35ms -> 1.47ms)
  ... 7 more in 5-10% range

Phase 4's regressions are the cost of moving from variant-tagged
dispatch to struct-field dispatch. The compiler can no longer rely on
discriminant tags for some branch elimination. Future work
(SIMD-friendly axis arithmetic, AxisKind packed bytes, jump tables)
could close them; out of scope for v0.6.0.

For comparison across phases:
                  Imp    Neutral    Reg   Notes
  Phase 1:         29        17       10  AxisRange type intro
  Phase 2:         42        11        3  Index axis-kind dispatch
  Phase 3:         36        15        5  Producer From/To projection
  Phase 4:         22        21       13  Variant collapse (4-run avg)

## Validation

cargo fmt + clippy (eval, workbook, bench-core, runner-feature)  pass
cargo test -p formualizer-eval --release                         1680/1681 pass
                                                                  (test_scalar_arena_float_overflow:
                                                                   pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release                     pass

probe-corpus-parity small full                                   75/78 pass, 0 divergences
probe-corpus-parity medium full                                  75/78 pass, 0 divergences

probe-corpus medium 4-run avg vs Phase 0 baseline                22 imp, 21 neutral, 13 reg
`git grep RegionPattern`                                         0 matches
`git grep "type RegionPattern"`                                 0 matches

Available RAM: ~78 GiB throughout. No run dropped below 20 GiB.

## Files

MODIFIED (source — 9 files):
- crates/formualizer-eval/src/engine/eval.rs (RegionPattern -> Region rename + helper updates)
- crates/formualizer-eval/src/engine/ingest_pipeline.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/authority.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs (proptest strategy update)
- crates/formualizer-eval/src/formula_plane/placement.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/producer.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/region_index.rs (Region struct + accessors + constructors)
- crates/formualizer-eval/src/formula_plane/scheduler.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/span_eval.rs (mechanical rename)

MODIFIED (docs — 12 files, mechanical rename for repo-wide consistency):
- docs/design/formula-plane/{FORMULA_PLANE_IMPLEMENTATION_PLAN, FORMULA_PRODUCER_PLANNING_V1}.md
- docs/design/formula-plane/dispatch/{axis-range-phase-2-dispatch-table, cross-sheet-read-projection,
   fp6-5r-tranche3-4-implementation-plan, fp6-dirty-projection-index-shoreup,
   fp7-audit-report, option-e-execution-plan, sheet-region-index-tail-extent-precision,
   sheet-rename-dirty-scope, whole-axis-promotion, whole-column-references}.md
@PSU3D0 PSU3D0 changed the title feat(formula-plane): finalize opt-in span readiness feat(formula-plane): add opt-in span evaluation runtime May 13, 2026