feat(formula-plane): add opt-in span evaluation runtime #98
Open
PSU3D0 wants to merge 127 commits into
Adds --phase-timeout-ms with scale-aware defaults (small=5s, medium=15s,
large=60s) and a watchdog thread that flips a cancellation flag.
Limitations (documented for future fix):
- Cancellation only honored at coarse evaluate_all checkpoints.
  In-flight scalar evals run to completion before the cancel flag is read.
- Pre-eval phases (fixture build, load, structural-op demote+materialize)
  have NO cancel hooks. Scenarios that hang in those phases (e.g. s035
  column-delete demotion at medium scale) will still hang the runner.
  Subprocess-per-tuple is the proper fix when batch reliability matters.
Watchdog uses condvar with timeout so it returns promptly when eval
finishes early; no thread accumulation across tuples.
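The condvar-with-timeout watchdog described above can be sketched as follows. This is a minimal illustration, not the runner's code: the `Watchdog` type and its `spawn`/`finish` methods are hypothetical names, and only the standard library is used.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical watchdog handle (illustrative names, not the runner's API).
pub struct Watchdog {
    done: Arc<(Mutex<bool>, Condvar)>,
    handle: Option<JoinHandle<()>>,
}

impl Watchdog {
    /// Spawn a thread that sets `cancel` unless `finish` is called within `timeout`.
    pub fn spawn(timeout: Duration, cancel: Arc<AtomicBool>) -> Self {
        let done = Arc::new((Mutex::new(false), Condvar::new()));
        let done_for_thread = Arc::clone(&done);
        let handle = thread::spawn(move || {
            let (lock, cvar) = &*done_for_thread;
            let guard = lock.lock().unwrap();
            // Waits until `finished` becomes true or the timeout elapses;
            // spurious wakeups are handled by the predicate.
            let (guard, result) = cvar
                .wait_timeout_while(guard, timeout, |finished| !*finished)
                .unwrap();
            if result.timed_out() && !*guard {
                cancel.store(true, Ordering::SeqCst);
            }
        });
        Watchdog { done, handle: Some(handle) }
    }

    /// Signal that the phase finished, then join so no threads accumulate.
    pub fn finish(mut self) {
        {
            let (lock, cvar) = &*self.done;
            *lock.lock().unwrap() = true;
            cvar.notify_all();
        }
        if let Some(handle) = self.handle.take() {
            handle.join().unwrap();
        }
    }
}
```

In this sketch the evaluation loop would still have to poll the `cancel` flag at its own checkpoints, which is the coarse-granularity limitation noted above.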
…structural ops
Fixes the structural-op blowup in column-insert/delete that surfaced at
medium scale (s034 edit_3 = 410s, s035 Auth never finished after 1400s).
Two surgical changes anchored in
docs/design/formula-plane/dispatch/structural-op-blowup-investigation.md.
## Change 1: CsrMutableEdges::update_coord becomes O(1)
Before: `self.vertex_ids.iter().position(|&id| id == vertex_id.0)` was a
full linear scan across the edge-cache vertex-id array per moved vertex.
For a sheet with 50k formula vertices and a column-insert moving 50k of
them, that's 2.5e9 integer comparisons per structural edit.
After: side index `vertex_pos: FxHashMap<u32, usize>` maintained at every
call site that mutates `vertex_ids` (constructors new/with_coords/
build_from_adjacency, mutators add_vertex/add_vertices_batch, rebuild).
update_coord is now O(1) hash lookup with debug_assert that the position
matches.
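A minimal sketch of the side-index idea, assuming a simplified edge cache: the field names `vertex_ids` and `vertex_pos` follow the text above, while `EdgeCoords`, `coords`, and `coord_of` are hypothetical, and `std::collections::HashMap` stands in for `FxHashMap` to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Toy stand-in for the edge cache (illustrative, not CsrMutableEdges itself).
#[derive(Default)]
pub struct EdgeCoords {
    vertex_ids: Vec<u32>,
    coords: Vec<(u32, u32)>,
    /// Side index: vertex id -> position in `vertex_ids`.
    vertex_pos: HashMap<u32, usize>,
}

impl EdgeCoords {
    /// Every mutation of `vertex_ids` must also update `vertex_pos`.
    pub fn add_vertex(&mut self, id: u32, coord: (u32, u32)) {
        let pos = self.vertex_ids.len();
        self.vertex_ids.push(id);
        self.coords.push(coord);
        self.vertex_pos.insert(id, pos);
    }

    /// O(1) hash lookup instead of a linear `iter().position(..)` scan.
    pub fn update_coord(&mut self, id: u32, coord: (u32, u32)) -> bool {
        match self.vertex_pos.get(&id).copied() {
            Some(pos) => {
                debug_assert_eq!(self.vertex_ids[pos], id);
                self.coords[pos] = coord;
                true
            }
            None => false,
        }
    }

    pub fn coord_of(&self, id: u32) -> Option<(u32, u32)> {
        self.vertex_pos.get(&id).map(|&pos| self.coords[pos])
    }
}
```

With the linear scan, moving 50k vertices among 50k ids costs up to 50,000 x 50,000 = 2.5e9 comparisons; with the index it is 50k hash lookups.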
## Change 2: ReferenceAdjuster::adjust_ast_if_changed avoids debug-string compare
Before: VertexEditor::insert_columns and ::delete_columns ran
`format!("{ast:?}") != format!("{adjusted:?}")` for every formula
vertex in the workbook to detect whether the adjusted AST actually
changed. Each comparison allocated two debug-rendered strings.
After: new `adjust_ast_if_changed` traverses the AST and returns
Option<ASTNode>, only allocating an adjusted AST if at least one
reference actually changed. Compares ReferenceType via PartialEq
(verified derived). For unchanged formulas the cost is now traversal
only, no allocation.
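The "return None unless something changed" pattern can be illustrated on a toy AST. This is a sketch under stated assumptions: `Node` and `adjust_if_changed` are hypothetical stand-ins for ASTNode and adjust_ast_if_changed, columns are 0-indexed, and only column inserts are modeled.

```rust
/// Toy AST (hypothetical; the real ASTNode is richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(f64),
    CellRef { row: u32, col: u32 },
    Binary { op: char, left: Box<Node>, right: Box<Node> },
}

/// Adjust references for `count` columns inserted before `before_col`.
/// Returns None, with no allocation, when no reference shifts.
pub fn adjust_if_changed(node: &Node, before_col: u32, count: u32) -> Option<Node> {
    match node {
        Node::Number(_) => None,
        Node::CellRef { row, col } => {
            if *col >= before_col {
                Some(Node::CellRef { row: *row, col: *col + count })
            } else {
                None
            }
        }
        Node::Binary { op, left, right } => {
            let new_left = adjust_if_changed(left, before_col, count);
            let new_right = adjust_if_changed(right, before_col, count);
            if new_left.is_none() && new_right.is_none() {
                return None; // unchanged subtree: traversal only
            }
            // Clone the unchanged side only when a sibling actually changed.
            Some(Node::Binary {
                op: *op,
                left: Box::new(new_left.unwrap_or_else(|| (**left).clone())),
                right: Box::new(new_right.unwrap_or_else(|| (**right).clone())),
            })
        }
    }
}
```

For `=A1+1`, an insert before column 3 yields None, while an insert before column 0 yields a new tree referencing B1.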
Together these explain the s034 variance: edit_3 inserts before column
A, which means EVERY relative `A{r}` reference shifts. The combination
of O(M*V) edge-coord updates + N debug-string allocations + N AST
clones was the 410-second hot loop.
## Bundled correctness fix
CsrMutableEdges `batch_mode: bool` -> `batch_depth: usize` counter.
With the bool, nested begin_batch/end_batch pairs (e.g. when a
sheet-level operation calls a vertex-editor batch internally) would
have the inner end_batch flip the bool false, causing the outer
operations to no longer batch. Counter semantics correctly track
nesting depth and only fire rebuild when the outermost end_batch lands.
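A minimal sketch of the depth-counter semantics, assuming a simplified structure: `batch_depth`, `begin_batch`, and `end_batch` follow the text, while `BatchedEdges`, `pending_edits`, `add_edge`, and `rebuild_count` are hypothetical.

```rust
/// Toy batching structure (illustrative, not CsrMutableEdges itself).
#[derive(Default)]
pub struct BatchedEdges {
    batch_depth: usize,
    pending_edits: usize,
    rebuilds: usize,
}

impl BatchedEdges {
    pub fn begin_batch(&mut self) {
        self.batch_depth += 1;
    }

    /// Only the outermost end_batch triggers the rebuild.
    pub fn end_batch(&mut self) {
        debug_assert!(self.batch_depth > 0, "end_batch without begin_batch");
        self.batch_depth = self.batch_depth.saturating_sub(1);
        if self.batch_depth == 0 && self.pending_edits > 0 {
            self.rebuild();
        }
    }

    /// Outside any batch, each edit rebuilds immediately.
    pub fn add_edge(&mut self) {
        self.pending_edits += 1;
        if self.batch_depth == 0 {
            self.rebuild();
        }
    }

    fn rebuild(&mut self) {
        self.pending_edits = 0;
        self.rebuilds += 1;
    }

    pub fn rebuild_count(&self) -> usize {
        self.rebuilds
    }
}
```

With a bool, the inner end_batch in a nested pair would clear the flag and the second edit would rebuild on its own; with the counter, both edits share one rebuild.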
## Perf measurements (medium scale, 10k rows)
s034-family-with-column-insert Auth (insert column at positions [3,2,5,1,4]):
edit_0: 25,386 ms -> 24,263 ms (demotion + 50k materialization, unchanged)
edit_1: 264 ms -> 85 ms
edit_2: 175 ms -> 60 ms
edit_3: 410,333 ms -> 186 ms (~2200x faster)
edit_4: 247 ms -> 62 ms
s035-family-with-column-delete Auth (delete column 7 x5):
edit_0: N/A -> 45,090 ms (was hanging; now completes)
edit_1: hung -> 15 ms
edit_2: hung -> 19 ms
edit_3: hung -> 21 ms
edit_4: hung -> 17 ms
recalc all: -- -> <1 ms
Off-mode times unchanged (no regression).
The first edit (which does FormulaPlane span demotion + ingest of 30k-50k
formulas) is now the dominant cost. Demotion is a separate concern and
not in scope here; tracked for future tuning.
## Tests added (4)
- delta_edges.rs: update_coord_uses_vertex_position_index
  20k vertices, update last 5k coords; release-mode <50ms; verifies
  vertex_pos consistency.
- reference_adjuster.rs: adjust_ast_if_changed_returns_none_for_unaffected_column_insert
  =A1+1 with insert-before-col-3 returns None.
- reference_adjuster.rs: adjust_ast_if_changed_returns_adjusted_for_insert_before_a
  =A1+1 with insert-before-col-0 returns Some with reference shifted to B1.
- formula_plane_structural.rs: formula_plane_authoritative_repeated_column_insert_after_demotion_15k_vertices_stays_linear
  5k rows x 3 formula columns, runs the s034 insert sequence,
  verifies correctness across rows 1/2500/5000 after every edit,
  asserts release-mode timing budgets (first <10s, others <1s,
  insert-before-col-1 specifically <1s).
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
probe-corpus s034/s035 medium auth+off completes;
final invariants pass.
## Out of scope (separate dispatches)
- Demotion-phase cost (edit_0 still ~25-45s for 30-50k formula
materialization). The bulk_set_formulas_with_plans + ingest pipeline
per-vertex cost is the remaining first-edit hot spot.
- vertices_in_sheet linear scan (use sheet_indexes) — linear, not quadratic.
- Tombstoned vertex inclusion in vertices_in_sheet — separate concern.
- Per-row volatile/error span overhead at scale (s021/s025). Plan exists
at docs/design/formula-plane/dispatch/small-domain-span-overhead.md
for next dispatch.
Fixes per-row span overhead surfaced by s021 (16x slower) and s025
(3.3x slower) at medium scale. Implements the small-domain promotion
gate from
docs/design/formula-plane/dispatch/small-domain-span-overhead.md.
## Root cause (verified, not the PM's initial framing)
PM's initial hypothesis — 'decline single-cell families' — turned out
to already be implemented: detect_domain rejects analyses.len() < 2
(placement.rs:467-472) and converts to legacy via mark_all_legacy.
The actual issue was small MULTI-cell families:
- s021 medium: 1000 spans of only 7 cells each (=A{r}*2 rows
separated by volatile RAND/TODAY/NOW gaps).
- s025 medium: 100 spans of only 99 cells each (=A{r}*2 rows
separated by per-100th =A{r}/0 errors).
The FormulaPlane runtime has fixed per-span cost (template intern,
scheduler edge insertion, per-task setup including AST relocatability
revalidation, current_sheet.to_string allocation, fresh SpanEvaluator
construction). For 7-cell spans this fixed cost dwarfs any savings
vs the legacy graph path. Even 99-cell spans don't amortize it
(measured 3.3x slower).
## Fix
Add MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS = 100 threshold in
place_analyzed_family (formula_plane/placement.rs).
Applied only after detect_domain succeeds and before any template
intern / read-summary / span insert work, so doomed-small candidates
fall through to legacy with zero wasted promotion overhead.
Constant-result spans bypass the threshold because their broadcast
path (eval-once, broadcast-to-N-placements) amortizes regardless of
cell count; this preserves s013's 161x recalc win for SUMIFS-over-
constant-criteria families and similar constant LET/LAMBDA wins.
New PlacementFallbackReason::SmallDomain and PlacementDomain::cell_count()
helper.
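The gate can be sketched as a pure function. The constant name and the SmallDomain reason come from the text above; the `Placement` enum, the `FallbackReason` enum, and the `small_domain_gate` signature are hypothetical simplifications of place_analyzed_family.

```rust
pub const MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS: u64 = 100;

#[derive(Debug, PartialEq, Eq)]
pub enum FallbackReason {
    SmallDomain,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    Promote,
    Fallback(FallbackReason),
}

/// Decide promotion before any template intern / span insert work.
/// Constant-result families bypass the threshold (eval once, broadcast).
pub fn small_domain_gate(cell_count: u64, is_constant_result: bool) -> Placement {
    if !is_constant_result && cell_count < MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS {
        Placement::Fallback(FallbackReason::SmallDomain)
    } else {
        Placement::Promote
    }
}
```

Under this sketch the 7-cell s021 runs and 99-cell s025 runs fall back, a 100-cell run promotes, and a small constant-result family still promotes.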
## Perf measurements (medium scale, 10k rows)
s021-volatile-functions-sprinkled:
recalc Auth/Off: 68.28ms / 4.27ms = 16.00x -> 4.27ms / 4.57ms = 0.93x
span_count Auth: 1000 -> 0 (small =A*2 runs demote; volatiles already legacy)
s025-errors-propagating-through-family:
recalc Auth/Off: 1.65ms / 0.50ms = 3.30x -> 0.46ms / 0.49ms = 0.94x
span_count Auth: 100 -> 0 (99-cell runs demote; error rows already singleton legacy)
Preserved (no regressions):
s006-rect-family-10cols Auth/Off: 6.98 / 28.73 ms (still ~4x faster)
s007-fixed-anchor-family Auth/Off: 0.78 / 4.21 ms (still ~5x faster)
s008-two-anchored-families Auth/Off: 1.54 / 7.89 ms (still ~5x faster)
s013-sumifs-constant Auth/Off: 0.84 / 135.59ms (still ~161x faster
via constant broadcast)
All families above the threshold retain promotion. All constant-result
families retain promotion regardless of size.
## Tests added (3)
- formula_plane_authoritative_demotes_small_non_constant_domains
  100-row s021-shape: volatile rows + =A*2 7-row runs.
  Asserts: active_span_count == 0, all 100 formulas materialized in graph,
  =A*2 cells produce correct values.
- formula_plane_authoritative_demotes_99_cell_non_constant_runs
  200-row s025-shape: =A*2 with =A{r}/0 every 100th row.
  Asserts: active_span_count == 0, all formulas in graph, error cells
  show #DIV/0!, others multiplied correctly.
- formula_plane_authoritative_promotes_100_cell_non_constant_run
  100 contiguous =A{r}*2 rows.
  Asserts: active_span_count == 1 (threshold is inclusive at 100).
The existing constant-result test (formula_plane_authoritative_constant_
sumifs_family_promotes_via_broadcast) passes unchanged, validating the
exemption.
## Tests updated
Several existing formula-plane ingest/shadow/structural/span_eval/
placement tests previously used 2-3 cell non-constant families to verify
mechanical span-creation behavior. Updated those to use 100-cell families
where the test intent is active-span mechanics. Constant-result
small-span tests remain small (the exemption preserves them).
Files: tests/formula_plane_ingest_shadow.rs,
tests/formula_plane_structural.rs, formula_plane/placement.rs (test mod),
formula_plane/span_eval.rs (test mod). Helper functions row_run_candidates
and col_run_candidates added in placement.rs and span_eval.rs test mods
to reduce repetition.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
probe-corpus medium s021/s025/s006/s007/s008/s013 auth+off all final
invariants pass.
## Threshold rationale
100 chosen because:
- 7 cells (s021) clearly bad.
- 99 cells (s025) measurably bad.
- Below 100 the per-span fixed cost dominates.
- At 100 and above the per-cell amortization works.
Future tuning: revisit after the next medium-scale corpus baseline once
other corpus-driven fixes land. The threshold is a single named constant,
easy to adjust.
## Open follow-ups (separate dispatches)
- Per-span scheduler/evaluator overhead (current_sheet.to_string,
fresh SpanEvaluator per work item, double placement vector
materialization, per-task AST relocatability revalidation). Real but
orthogonal; with the threshold in place these become less important
because we no longer create the small spans that exposed them.
- Volatile authority canonical support — out of scope; would need
careful guard against vacuous constant-result classification of
no-read volatiles.
Fixes s036 Auth recalc 10-18x slower than Off. Single-line removal of
record_formula_plane_structural_change(StructuralScope::Sheet) from
Engine::rename_sheet (eval.rs:1644). Anchored in
docs/design/formula-plane/dispatch/sheet-rename-dirty-scope.md.
## Root cause
Sheet rename in Excel changes the display name string only. The engine
preserves SheetId across rename (sheet_registry.rs:78-108). All known
sheet references are stored in arena as SheetKey::Id(id), not the display
name (data_store.rs:445-457 for cells, :470-482 for ranges). ASTs are
reconstructed via the current registry name lookup (:660-668, :682-690).
Therefore: a sheet rename does not change any actual cell values or
dependency identities. References still resolve to the same cells.
The legacy graph correctly handles this: Off mode finishes recalc in
0.2ms because mark_vertex_dirty does not propagate to dependents and the
only dirtied vertices are value cells which get filtered out by
get_evaluation_vertices.
Auth mode was paying ~3ms per rename because
record_formula_plane_structural_change(StructuralScope::Sheet(sheet_id))
recorded RegionPattern::whole_sheet(sheet_id), which the consumer-read
index correctly matched against every span reading from that sheet. The
dirty closure then projected whole-sheet through the affine projection
rule onto the whole result region of any consuming span, triggering
whole-span recompute.
For s036 (Sheet1 has one 10k-cell span reading from DataA + DataB): each
rename of DataA or DataB triggered a 10k-placement re-eval of the Sheet1
span. The values were unchanged afterward.
## Fix
One line removed at eval.rs:1644. Comment block added explaining the
SheetId-preservation invariant.
Path before:
  rename_staged_formula_sheet
  vertices_in_sheet().mark_dirty (legacy bookkeeping; values filtered)
  record_formula_plane_structural_change(Sheet) <- removed
  mark_topology_edited
## Perf measurements (medium scale, 10k formulas / 30k vertices)
s036-multi-sheet-with-sheet-rename:
Off recalc 0..3: 0.67, 0.18, 0.34, 0.36 ms (4 rename cycles)
Auth recalc 0..3: 0.11, 0.19, 0.09, 0.07 ms (Auth now FASTER than Off)
Off recalc_4: 0.33 ms (value edit; unchanged behavior)
Auth recalc_4: 0.18 ms (value edit; correct dirty propagation)
Pre-fix Auth: 2.75-3.56 ms per rename cycle (10-18x worse than Off).
Post-fix Auth: 0.07-0.19 ms (better than Off because Auth has 1 span
while Off has 30k graph vertices to schedule).
result.computed_vertices == 0 after each rename (verified by test).
## Tests added (3)
- formula_plane_authoritative_sheet_rename_is_metadata_only_for_cross_sheet_span
  100-row cross-sheet span. Renames DataA forward and back. Asserts
  result.computed_vertices == 0 after each rename, sampled values
  unchanged, span count preserved.
- formula_plane_authoritative_value_edit_after_sheet_rename_dirties_bounded_span_work
  After rename, a single cell value edit produces bounded span work
  (>= 1 placement re-evaluated) and only the affected output row changes.
  Verifies dirty propagation is preserved for actual edits.
- formula_plane_authoritative_sheet_rename_preserves_sheet_id_read_summaries
  Read summaries remain SheetId-keyed across rename.
  consumer_read_entries count preserved. Edit on the renamed sheet
  correctly dirties only the expected output cell.
## What was NOT changed (out of scope)
- StructuralScope::Sheet still used for row/col insert/delete
  (eval.rs:3763, 3789, 3819, 3849); those legitimately need it because
  references shift.
- StructuralScope::RemovedSheet path unchanged (eval.rs:5470-5493).
- StructuralScope::AllSheets path unchanged (eval.rs:5495-5498).
- Legacy mark_vertex_dirty loop on the renamed sheet kept
  (eval.rs:1638-1643). In s036 it produces no formula evaluation work
  because get_evaluation_vertices filters value vertices out. Removing it
  would be a broader legacy behavior change requiring its own audit.
- Arrow store sheet rename, graph rename_sheet, staged-formula rename,
  and topology edit mark all kept.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
probe-corpus medium s036 auth+off final invariants pass
## Open items (separate dispatches)
- s036 fixture has DATA_ROWS=1000 not 10000. Doesn't affect this fix.
  Worth fixing for consistency; separate trivial commit.
- Span merging across sheet-display-name changes: spans retain canonical
  keys with old explicit names. Future formulas with new names may not
  merge. Out of scope; tracked.
- Whole-column reference cost (s026 4.8s recalc): separate dispatch,
  design memo at
  docs/design/formula-plane/dispatch/whole-column-references.md.
  Memo committed alongside this change for future reference.
Lifts the FormulaPlane rejection of whole-axis (whole-column) references
in dependency analysis, canonical labels, and projection construction.
Whole-column reads now produce 'WholeColumnRange' projections that emit
RegionPattern::WholeCol read regions. Constant-result classification
treats whole-axis as placement-invariant, so absolute whole-column
formulas like =SUM($A:$A) enter the eval-once-broadcast fast path.
Anchored in
docs/design/formula-plane/dispatch/whole-axis-promotion.md.
## Root cause
s026-whole-column-refs-in-50k-formulas had span_count=0 in Auth mode
because dependency_summary rejected any range with AxisRef::WholeAxis
upstream of placement, and the parallel arena-canonical labels also
rejected it. Lifting both rejections plus adding a source-aware
projection rule lets the existing constant-result broadcast path apply
to whole-column formulas with absolute axes.
The fix touches six call sites that all needed updating in lockstep:
- template_canonical reject reason
- arena/canonical reject labels
- dependency_summary reject_non_finite_range
- dependency_summary axis_kinds_match
- dependency_summary is_constant_result helper
- producer DirtyProjectionRule + AxisProjection
Without all six, promotion is path-dependent or projection construction
fails after the summary accepts the precedent.
## Design
Scope: whole-COLUMN only. Whole-row deferred (multi-row whole-row
intervals would require a new RegionPattern::WholeRowInterval and are not
driven by current measurements).
New variant DirtyProjectionRule::WholeColumnRange { col_start, col_end }.
Existing PrecedentPattern::Range(AffineRectPattern) reused; AxisRef
already has a WholeAxis variant.
New method read_regions_for_result returns Vec<RegionPattern> instead
of a single region. AffineCell/AffineRange wrap their existing single
result; WholeColumnRange emits one RegionPattern::WholeCol per source
column. Projected column count bounded at 256 to avoid pathological
$A:$XFD cases (rejected with UnsupportedAxis).
Existing read_region_for_result kept for backward compatibility with
callers that expect a single region; returns UnsupportedAxis for
WholeColumnRange.
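A sketch of the multi-region emission for a whole-column range, assuming simplified types. `RegionPattern::WholeCol` and `UnsupportedAxis` are named in the text; the field layout, the `ProjectionError` enum, the constant name, the function name, and the reading of "bounded at 256" as "256 columns allowed, 257 rejected" are assumptions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RegionPattern {
    WholeCol { sheet_id: u32, col: u32 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ProjectionError {
    UnsupportedAxis,
}

/// Assumed cap on projected columns (guards against $A:$XFD-style ranges).
pub const MAX_PROJECTED_WHOLE_COLS: u32 = 256;

/// Emit one WholeCol read region per source column in [col_start, col_end].
pub fn whole_column_read_regions(
    sheet_id: u32,
    col_start: u32,
    col_end: u32,
) -> Result<Vec<RegionPattern>, ProjectionError> {
    if col_end < col_start {
        return Err(ProjectionError::UnsupportedAxis);
    }
    // Compare the span width without `+ 1` to avoid u32 overflow.
    if col_end - col_start >= MAX_PROJECTED_WHOLE_COLS {
        return Err(ProjectionError::UnsupportedAxis);
    }
    Ok((col_start..=col_end)
        .map(|col| RegionPattern::WholeCol { sheet_id, col })
        .collect())
}
```

Returning a Vec lets single-region rules wrap their one result while whole-column rules emit several, which is the shape the new method is described as having.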
is_constant_projection at placement.rs and is_constant_result at
dependency_summary.rs treat AxisRef::WholeAxis as placement-invariant
(it represents the entire column regardless of where the formula sits).
RelativeToPlacement remains non-constant. Open and unsupported axes
defensively default to non-invariant.
Composition with existing precedent kinds:
- =SUM($A:$A): one WholeColumnRange precedent. Constant. Broadcast.
- =SUM($A:$A) - A{r}: two precedents (whole-col + relative cell).
Mixed → non-constant. Per-placement eval. Whole-col read region
still in summary; dirty propagation correct.
- =SUMIFS($B:$B, $A:$A, "Type1"): two whole-col precedents, both
constant. Broadcast.
- =SUMIFS($B:$B, $A:$A, A{r}): two whole-col + one relative.
Non-constant.
- Cross-sheet =SUM(DataA!$A:$A): emits whole-col on DataA's sheet_id.
Negative cases preserved:
- $A$1:$A (open-ended) still rejected (OpenRangeUnsupported).
- =$A:$A top-level still rejected (not in supported function-arg
context).
- A:$A (mixed endpoint kinds) still rejected.
- ROW($A:$A) still rejected (ROW not in is_known_static_function).
- Whole-row $1:$1 explicitly rejected in this patch (deferred).
- Internal-dependency guard preserved (formula in column A reading
$A:$A still falls back to legacy).
VLOOKUP/MATCH NOT added to is_known_static_function in this patch.
Independent semantic review needed; separate dispatch.
## Perf measurements
s026-whole-column-refs-in-50k-formulas medium (10k rows, =SUM($A:$A) - A{r}):
Off first 4681ms recalc 4810ms spans 0
Auth first 47ms recalc 1678ms spans 1 (99x first / 2.86x recalc)
The recalc 2.86x speedup is for the mixed (non-constant) shape; per-
placement eval still required. Pure constant whole-col shape gets the
full broadcast benefit:
repro_whole_col_vs_finite (interactive() mode, 10k rows):
=SUM($A:$A) Off recalc 4854ms Auth recalc 1.77ms (2742x faster)
=SUM($A$1:$A$N) Off recalc 2415ms Auth recalc 0.79ms (3057x baseline)
Both whole-column constant-result and finite-range constant-result now
use the same broadcast path with comparable performance.
## Tests added
dependency_summary:
- accepts_absolute_whole_column_sum (FormulaClass::StaticPointwise,
constant-result == true)
- mixed_whole_column_minus_relative_is_non_constant
- relative_whole_column_a_a_is_non_constant
- rejects_open_range_whole_column
- rejects_top_level_whole_column
- rejects_mixed_absolute_relative_endpoints
template_canonical:
- whole_axis_no_longer_unsupports_authority_labels
- open_range_still_unsupports_authority_labels
- whole_axis_serializes_in_canonical_key
arena_canonical:
- whole_column_range_no_longer_sets_reject_whole_axis
- open_range_still_sets_reject_open_range
producer:
- whole_column_range_read_regions_emit_whole_cols (single + multi)
- whole_column_range_rejects_above_256_column_threshold
- whole_column_dirty_projection_dirties_whole_result_on_intersection
- whole_column_dirty_projection_no_intersection_outside_column
placement:
- constant_whole_column_family_promotes_to_one_constant_span
- mixed_whole_column_minus_relative_promotes_to_non_constant_span
- sumifs_constant_criteria_whole_column_family_promotes
- cross_sheet_whole_column_family_targets_data_sheet_id
- whole_row_family_does_not_promote (negative)
ingest_pipeline:
- compute_read_projections_accepts_whole_column
- compute_read_projections_rejects_top_level_whole_column
- compute_read_projections_rejects_open_ended_range
- compute_read_projections_rejects_whole_row
formula_plane_structural (end-to-end):
- 200-row =SUM($A:$A) family promotes, evaluates correctly,
recalculates correctly after col-A edit
- 200-row =SUM($A:$A) - A{r} family promotes as non-constant,
per-row values correct
- cross-sheet 200-row =SUM(DataA!$A:$A) family recalcs after
DataA edit
## Tests updated
- Existing dependency_summary whole-axis rejection test updated to new
behavior: function-argument whole-column accepted, top-level still
rejected.
- FP8 ingest parity test kept passing by aligning arena whole/open
range behavior with template canonical labels.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
probe-corpus medium s026: spans 0 -> 1, first 99x faster, recalc 2.86x faster
repro_whole_col_vs_finite: constant whole-col case 2742x faster, finite case
unchanged, mixed case promotes (perf parity with finite-range mixed
deferred to SUM CSE)
s007/s013 corpus (constant-result spans): no regression
## Out of scope (separate dispatches)
- Whole-row promotion ($1:$1 etc).
- VLOOKUP/MATCH in is_known_static_function.
- SUM aggregate cache (CSE) for mixed shapes like =SUM($A:$A) - A{r}.
This shape promotes but each placement still re-evaluates the whole-
column SUM. Phase 1 of the whole-column-references memo would unlock
another large speedup here.
- Effect 1 from whole-column-references memo: the legacy 2x tax for
whole-column resolution (Off mode $A:$A is 2x slower than $A$1:$A$N).
Separate small investigation.
Memos for both committed alongside this change.
…refs
Adds a snapshot-keyed final-result cache for used_rows_for_columns and
used_cols_for_rows. Whole-column references like =SUM($A:$A) need the
used-row extent resolved per call, which currently runs
formula_row_bounds_for_columns every time. That helper scans every
indexed vertex in the queried column range and filters by formula kind.
Anchored in
docs/design/formula-plane/dispatch/whole-column-legacy-tax.md.
## Root cause
For 10k formulas of `=SUM($A:$A)` over a column with 10k input value
vertices: each formula triggers used_rows_for_columns("Sheet1", 1, 1)
which calls formula_row_bounds_for_columns. That helper does
get_vertex_kind() on every indexed vertex in column A — 10k checks per
call. With 10k formulas, that's 100M vertex-kind checks per recalc.
The Arrow used-row bounds cache (row_bounds_cache at eval.rs:349) hits
correctly after the first formula, but the wrapper still calls
formula_row_bounds_for_columns to preserve the union semantics
(Arrow extent OR formula coordinates in unmaterialized rows).
Finite-range references like `=SUM($A$1:$A$10000)` skip the entire
used_rows_for_columns path because all four bounds are present at the
parser AST level (eval.rs:9443-9451).
## Fix
New UsedAxisBoundsCache struct with two FxHashMaps:
row_bounds_by_col_span: (SheetId, start_col, end_col) -> Option<(u32, u32)>
col_bounds_by_row_span: (SheetId, start_row, end_row) -> Option<(u32, u32)>
Wrapped in Engine::used_axis_bounds_cache: RwLock<Option<...>>.
used_rows_for_columns flow:
1. Resolve sheet_id (O(1) HashMap).
2. Load snapshot_id.
3. Read-lock check cache for (sheet_id, start_col, end_col).
4. On hit: return cached Option immediately.
5. On miss: run existing union logic (Arrow + formula bounds + graph fallback).
6. Write-lock store result. reset_for_snapshot clears map on snapshot change.
Symmetric for used_cols_for_rows.
Critical correctness preserved:
- Snapshot-keyed: data edits and topology edits both increment snapshot
(eval.rs:2403-2413), so invalidation is automatic.
- Cache stores None: closes the empty-column rescan hole that the
underlying RowBoundsCache also has (where (None, None) cached results
weren't treated as a hit).
- Union semantics preserved: only the FINAL result is cached, not the
Arrow-only or formula-only intermediate.
- Read-then-write pattern: don't hold cache lock during expensive scans.
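The read-then-write flow with snapshot-keyed invalidation can be sketched as below. This is a simplified model, not the engine's code: `UsedRowsCache`, `CacheState`, and `get_or_compute` are hypothetical, only the row-bounds map is shown, and `std::collections::HashMap` stands in for `FxHashMap`.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Final used-row bounds; None means "no used rows" and is cached too.
type Bounds = Option<(u32, u32)>;

#[derive(Default)]
struct CacheState {
    snapshot_id: u64,
    /// (sheet_id, start_col, end_col) -> final bounds
    row_bounds_by_col_span: HashMap<(u32, u32, u32), Bounds>,
}

#[derive(Default)]
pub struct UsedRowsCache {
    state: RwLock<CacheState>,
}

impl UsedRowsCache {
    pub fn get_or_compute<F>(&self, snapshot_id: u64, key: (u32, u32, u32), compute: F) -> Bounds
    where
        F: FnOnce() -> Bounds,
    {
        {
            // Read-lock check; a cached None counts as a hit.
            let state = self.state.read().unwrap();
            if state.snapshot_id == snapshot_id {
                if let Some(hit) = state.row_bounds_by_col_span.get(&key) {
                    return *hit;
                }
            }
        } // read lock released before the expensive scan

        let result = compute();

        let mut state = self.state.write().unwrap();
        if state.snapshot_id != snapshot_id {
            // Snapshot changed: drop all entries from the old snapshot.
            state.row_bounds_by_col_span.clear();
            state.snapshot_id = snapshot_id;
        }
        state.row_bounds_by_col_span.insert(key, result);
        result
    }
}
```

The `compute` closure stands in for the existing union logic (Arrow extent plus formula bounds plus graph fallback), so only the final result is ever stored.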
## Perf measurements (10k rows / 10k formulas, FormulaPlane Off)
repro_whole_col_vs_finite, Off mode:
Before:
=SUM($A:$A) recalc 4882ms (488us/formula)
=SUM($A$1:$A$N) recalc 2448ms (245us/formula)
=SUM($A:$A) - A{r} recalc 4725ms
=SUM($A$1:$A$N) - A{r} recalc 2482ms
After:
=SUM($A:$A) recalc 2492ms (249us/formula) ~2x faster
=SUM($A$1:$A$N) recalc 2477ms (unchanged)
=SUM($A:$A) - A{r} recalc 2473ms ~1.9x faster
=SUM($A$1:$A$N) - A{r} recalc 2495ms (unchanged)
**Whole-column Off recalc now matches finite-range Off recalc within
~1% margin.**
s026-whole-column-refs-in-50k-formulas medium:
Off recalc: 4810ms -> 2511ms (1.92x faster)
Auth recalc: 1670ms -> 1769ms (within noise)
Auth-mode FormulaPlane behavior unchanged: still spans=1, still benefits
from the whole-column promotion landed in 0d287ce.
## Tests added
In crates/formualizer-eval/src/engine/tests/used_bounds_cache.rs:
- used_rows_for_columns_caches_final_result_across_repeated_calls:
10k values + 10k formulas, two calls, asserts row_misses == 1,
row_hits == 1.
- used_rows_for_columns_caches_none_for_empty_column:
empty column C, two calls, both return None, row_misses == 1,
row_hits == 1.
- used_rows_for_columns_invalidates_on_data_edit:
data through row 5, edit row 8, snapshot bump invalidates cache,
third call returns updated max row 8 and is cached.
- used_rows_for_columns_includes_formula_rows_in_union:
data A1:A5 + formula A10, returns max row 10, second call hits.
- used_cols_for_rows_caches_final_result + invalidates_on_data_edit:
symmetric tests for the row-axis cache.
- evaluate_whole_column_sum_uses_cached_bounds:
100 rows, =SUM($A:$A) formulas in col B, evaluate, edit A5, recalc,
values correct, cache hit pattern matches expected behavior.
Internal #[cfg(test)] AtomicUsize counters (row_hits, row_misses,
col_hits, col_misses) on UsedAxisBoundsCache. Counters exposed via
Engine::used_axis_bounds_cache_stats() for tests only. No public API
change. No EvalConfig toggle.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
repro_whole_col_vs_finite: whole-col within 1% of finite
probe-corpus medium s026: Off 1.92x faster
## Out of scope
- SUM aggregate cache (CSE) — separate dispatch
docs/design/formula-plane/dispatch/whole-column-references.md.
This patch addresses the per-formula bound-resolution tax;
CSE would address the per-formula SUM scan tax.
- Formula-only sheet index — broader graph-state change, not needed
to remove the verified per-call scan.
- Empty-column inefficiency in the underlying arrow_used_row_bounds
cache (where (None, None) results aren't treated as cache hits) —
the new wrapper-level cache caches Option<(u32, u32)> including
None, which closes the hole at the wrapper level.
…tion
Combined dispatch implementing the design at
docs/design/formula-plane/dispatch/literal-param-memoization-design.md.
Two coupled features sharing one parameter-slot substrate:
1. **Literal parameterization**: formulas that differ only by literal
values now fold into the same FormulaPlane family. The parameterized
canonical key replaces all parameterizable literals with positional
slot markers (lit_slot(<id>)). Per-formula binding vectors carry the
concrete literal values. Family bucketing changes from
(sheet_id, canonical_hash) to (sheet_id, parameterized_canonical_hash)
with a full parameterized_canonical_key equality guard against hash
collisions.
2. **Parameter-key memoization**: non-constant spans now evaluate once
per unique parameter tuple and broadcast to placements with the same
tuple. Parameter atoms include literal slot values + value-context
relative-cell-ref values + residual row/col deltas when needed. The
memo cache lives strictly within SpanEvaluator::evaluate_task and
is dropped on return — no persistent caching, no invalidation
complexity.
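Parameter-key memoization within a single task can be sketched as follows. The `ParameterAtom::NumberBits` variant is taken from the architecture notes in this commit; the `Text` variant, the `number` constructor, and the `evaluate_memoized` function are hypothetical, and results are reduced to `f64` for brevity.

```rust
use std::collections::HashMap;

/// Hashable key atom. Numbers are keyed by bit pattern so that NaN
/// compares equal to itself and the type can derive Eq + Hash.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum ParameterAtom {
    NumberBits(u64),
    Text(String),
}

impl ParameterAtom {
    pub fn number(x: f64) -> Self {
        ParameterAtom::NumberBits(x.to_bits())
    }
}

/// Evaluate once per unique parameter tuple and broadcast to placements.
/// Returns per-placement results plus the number of real evaluations.
/// The memo map is local, so it is dropped when the function returns.
pub fn evaluate_memoized<F>(keys: &[Vec<ParameterAtom>], mut eval: F) -> (Vec<f64>, usize)
where
    F: FnMut(&[ParameterAtom]) -> f64,
{
    let mut memo: HashMap<&[ParameterAtom], f64> = HashMap::new();
    let mut evals = 0;
    let mut out = Vec::with_capacity(keys.len());
    for key in keys {
        let value = match memo.get(key.as_slice()) {
            Some(v) => *v,
            None => {
                evals += 1;
                let v = eval(key.as_slice());
                memo.insert(key.as_slice(), v);
                v
            }
        };
        out.push(value);
    }
    (out, evals)
}
```

With three distinct tuples across N placements, this performs three evaluations and N broadcasts.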
## Pre-existing tombstone-evaluation bug also fixed
While verifying correctness, the agent identified a pre-existing bug
that the literal-parameterization work exposed:
- VertexEditor::remove_vertex tombstoned vertices but did NOT clear
vertex_formulas/vertex_values/dirty_vertices/volatile/dynamic/kind.
Tombstoned formula vertices remained schedulable.
- DependencyGraph::get_evaluation_vertices did not filter tombstoned
vertices.
After delete_columns on a sheet with FormulaPlane spans:
- demotion materialized formulas at all positions (correct).
- delete_columns tombstoned col-3 vertices and shifted col-4 → col-3
(correct).
- BUT the tombstoned col-3 vertices kept evaluating and writing stale
results to col-3 in the computed overlay, producing wrong values.
Fix at vertex_editor.rs:703-711 (clear formula/value/dirty/kind on
remove_vertex) and graph/mod.rs:2145-2158 (filter tombstoned in
get_evaluation_vertices via vertex_exists_active).
This bug was latent before because no prior workload created the exact
sequence (FormulaPlane span → demotion materialization → structural
delete → recalc) that produces the symptom.
## Performance results
repro_sumifs_variants at ROWS=5000, Auth-serial (wasm-relevant):
| Variant | Before | After | Speedup |
|---|---:|---:|---:|
| 1. constant literal | 0.84ms | 0.86ms | unchanged ✓ |
| 2. varying literal (s014) | 3196ms | **2.72ms** | **1175x** |
| 3. relative cell-ref | 2078ms | **3.31ms** | **628x** |
| 4. whole-col + relative | 2069ms | **3.65ms** | **567x** |
| 5. whole-col + constant | 1.01ms | 1.08ms | unchanged ✓ |
s014 corpus medium Auth recalc: 146ms → 3.4ms (43x). spans 0 → 1.
s013 and s026 corpus: unchanged from previous baselines.
K=3 redundancy in benchmark → 3 SUMIFS evals + N broadcasts, matching
theoretical minimum.
## Architecture (per memo)
### Parameter-slot canonicalization (template_canonical.rs)
Two canonical keys per formula, plus slot metadata:
- exact_canonical_key (current behavior — retained for diagnostics)
- parameterized_canonical_key (literals → lit_slot(<id>))
- literal_slot_descriptors (with SlotContext, original LiteralKind)
- literal_bindings: Box<[LiteralValue]>
Pre-order traversal matches existing canonical traversal exactly.
Array literals continue to reject (no slot emitted).
### BindingStore in FormulaPlane runtime (runtime.rs)
Dictionary-encoded binding storage:
- unique_literal_bindings: Vec<Box<[LiteralValue]>>
- placement_literal_binding_ids: Box<[u32]>
For N=10k placements with K=3 distinct bindings: stores 3 vectors +
40KB ids, not 10k full vectors. 8 MiB memory cap with
PlacementFallbackReason::BindingMemoryCapExceeded.
PlacementDomain::ordinal_of(placement) maps placement coord → index
matching domain.iter() order.
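Dictionary encoding of bindings can be sketched as below. The two field names follow the text above; the toy `LiteralValue` enum, the `build` constructor, and `binding_for` are hypothetical, and the memory cap is omitted.

```rust
use std::collections::HashMap;

/// Toy literal type (the real LiteralValue has more variants).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LiteralValue {
    Int(i64),
    Text(String),
}

pub struct BindingStore {
    /// One entry per distinct binding vector.
    pub unique_literal_bindings: Vec<Box<[LiteralValue]>>,
    /// One u32 id per placement, in placement-ordinal order.
    pub placement_literal_binding_ids: Box<[u32]>,
}

impl BindingStore {
    pub fn build(per_placement: &[Vec<LiteralValue>]) -> Self {
        let mut unique: Vec<Box<[LiteralValue]>> = Vec::new();
        let mut index: HashMap<&[LiteralValue], u32> = HashMap::new();
        let mut ids = Vec::with_capacity(per_placement.len());
        for binding in per_placement {
            let id = match index.get(binding.as_slice()) {
                Some(id) => *id,
                None => {
                    let id = unique.len() as u32;
                    unique.push(binding.clone().into_boxed_slice());
                    index.insert(binding.as_slice(), id);
                    id
                }
            };
            ids.push(id);
        }
        BindingStore {
            unique_literal_bindings: unique,
            placement_literal_binding_ids: ids.into_boxed_slice(),
        }
    }

    /// Look up the binding vector for a placement ordinal.
    pub fn binding_for(&self, ordinal: usize) -> &[LiteralValue] {
        &self.unique_literal_bindings[self.placement_literal_binding_ids[ordinal] as usize]
    }
}
```

For 10k placements the id array is 10,000 x 4 bytes = 40,000 bytes, which matches the 40KB figure quoted above.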
### Span eval third branch (span_eval.rs)
if span.is_constant_result { broadcast }
else if let Some(plan) = parametric_eval_plan && should_try_memoization {
    memoized eval branch
} else {
    per-placement (current path)
}
ParameterAtom enum uses NumberBits(u64) (not f64 PartialEq, NaN safe).
Date/Time/Duration as typed strings. Error includes full ExcelError
content (kind+message+context+extra).
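The reason for keying on bits rather than on `f64` equality can be shown in a few lines. This is a sketch only: the two-variant enum and the helper names are assumptions, not the real `ParameterAtom` definition:

```rust
use std::collections::HashSet;

// Assumed shape: only the number and text atoms are modeled here.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ParameterAtom {
    NumberBits(u64),
    Text(String),
}

/// Keys a number by its IEEE-754 bit pattern so the atom can derive Eq + Hash.
fn number_atom(x: f64) -> ParameterAtom {
    ParameterAtom::NumberBits(x.to_bits())
}

/// Counts distinct memo keys among a slice of numeric parameters.
fn distinct_number_keys(values: &[f64]) -> usize {
    values.iter().map(|v| number_atom(*v)).collect::<HashSet<_>>().len()
}
```

Under this scheme a NaN with a given bit pattern equals itself (unlike `f64` comparison), while `0.0` and `-0.0` remain distinct keys, which is what the `nan_reflexive` and `negative_zero_distinct` tests listed below appear to check.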
Atom order: literal slots → value-ref slots → residual relocation
deltas. Deterministic for mixed-slot keys.
ResidualRelocationMode::{None, IncludeRowDelta, IncludeColDelta,
IncludeRowAndColDelta}. Memoization is valid only when all
placement-varying influences are in the key. Relative ranges in
range-context force residual deltas; otherwise no memoization.
Bounded sampling gate: sample 64 placements, fallback if unique > 3/4
of sample. Full grouping aborts if unique * 4 > writable * 3.
MEMO_MAX_ENTRIES_PER_TASK = 16384.
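The sampling gate is plain integer arithmetic. A minimal sketch, assuming keys are already reduced to `u64` hashes and that the sample is simply the first 64 placements (how the real sampler picks placements is not stated in this log):

```rust
use std::collections::HashSet;

const SAMPLE_SIZE: usize = 64;

/// Returns true when the sampled keys repeat often enough to try memoization.
/// Falls back (returns false) when unique keys exceed 3/4 of the sample.
fn should_try_memoization(keys: &[u64]) -> bool {
    let sample = &keys[..keys.len().min(SAMPLE_SIZE)];
    let unique: HashSet<u64> = sample.iter().copied().collect();
    // "unique > 3/4 of sample" written without division: unique * 4 > sample * 3.
    unique.len() * 4 <= sample.len() * 3
}
```

The same cross-multiplied comparison appears in the full-grouping abort condition (`unique * 4 > writable * 3`), so both gates trip at the same 75% uniqueness ratio.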
### Substitution mechanism (interpreter.rs)
Hybrid:
- Literal slots: interpreter-level binding context
(Interpreter::with_parameter_bindings).
Modifies arena Literal node evaluation to consult bindings before
data_store.retrieve_value.
- Value-ref slots: representative placement + key grouping (no AST
substitution; existing relocation handles it).
- Demotion: tree clone + literal substitution + relocation.
### Family acceptance gate (placement.rs)
Family bucketing by parameterized_canonical_hash. Full
parameterized_canonical_key equality check against hash collisions.
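A sketch of hash bucketing followed by a full-key equality check, so that colliding hashes never merge two families. The function name, the string keys, and the injectable hash are illustrative assumptions:

```rust
use std::collections::HashMap;

/// Groups formula indices into families. The hash only narrows candidates;
/// membership is confirmed by comparing the full parameterized key.
fn bucket_families(keys: &[&str], hash: impl Fn(&str) -> u64) -> Vec<Vec<usize>> {
    let mut by_hash: HashMap<u64, Vec<usize>> = HashMap::new(); // hash -> family ids
    let mut families: Vec<(String, Vec<usize>)> = Vec::new();
    for (i, &key) in keys.iter().enumerate() {
        let candidates = by_hash.entry(hash(key)).or_default();
        let existing = candidates.iter().copied().find(|&f| families[f].0 == key);
        match existing {
            Some(f) => families[f].1.push(i),
            None => {
                // Same hash but different key: start a new family.
                candidates.push(families.len());
                families.push((key.to_string(), vec![i]));
            }
        }
    }
    families.into_iter().map(|(_, members)| members).collect()
}
```

Passing a deliberately constant hash is a cheap way to exercise the collision path, which is presumably what `parameterized_canonical_hash_collision_does_not_merge_family` does in the real suite.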
is_constant_result requires:
- read_projections constant
- all placements have same literal binding vector
- value_ref_slot_descriptors empty
### By-ref function contracts (dependency_summary.rs)
Strengthened ROW/COLUMN/AREAS/SHEET as by-ref/reference-sensitive.
INDEX/OFFSET already mapped. Prevents reference-identity-sensitive
args from being value-ref-parameterized.
## Tests added (24)
In crates/formualizer-eval/src/engine/tests/formula_plane_literal_param_memo.rs:
Literal parameterization (8):
- formula_plane_parameterized_literals_fold_same_structure
- formula_plane_exact_canonical_key_retained_for_diagnostics
- formula_plane_literal_slot_wildcards_kind_but_binding_preserves_type
- formula_plane_array_literal_remains_rejected_after_literal_parameterization
- formula_plane_empty_literal_parameterizes (empty/pending/error)
- formula_plane_binding_store_dictionary_encodes_repeated_vectors
- formula_plane_binding_set_removed_with_span
- formula_plane_demoted_parameterized_span_materializes_bound_literals
(regression test for column-delete tombstone bug + literal binding)
Memoization (6):
- formula_plane_memoizes_value_context_relative_cell_refs (K=3 → 3 evals)
- formula_plane_memoizes_varying_literal_slots (K=3 → 3 evals)
- formula_plane_memoizes_mixed_literal_and_value_ref_parameters
- formula_plane_memo_residual_relative_reference_includes_row_delta
- formula_plane_memo_skips_all_unique_literal_bindings
- formula_plane_memo_sampling_skips_all_unique_value_refs
Edge cases (10):
- Float key (3): uses_number_bits, nan_reflexive, negative_zero_distinct
- Date/time: dates_and_durations_are_typed
- Errors: error_includes_message_and_context
- Volatile/dynamic (2): volatile_template_not_memoized, dynamic_template_not_memoized
- Reference identity (3): row_column_args_not_value_parameterized,
index_offset_byref_not_value_parameterized,
criteria_range_not_value_parameterized
Hash collision and memory cap (3):
- parameter_key_hash_collision_does_not_merge_results
- parameterized_canonical_hash_collision_does_not_merge_family
- literal_binding_memory_cap_falls_back
Memo cache lifetime (1):
- memo_cache_is_per_evaluate_task
Test-only counters added: memo_eval_count, memo_broadcast_count,
sample_only_key_build_count, unique_literal_binding_vectors. Exposed
via test-only Engine accessors (no public API).
## Validation
- cargo fmt + clippy (eval, workbook, bench-core, runner-feature): pass
- cargo test -p formualizer-eval --quiet: pass
- cargo test -p formualizer-workbook --quiet: pass
- cargo test --workspace --quiet: pass
- cargo test fp8_ingest_pipeline_parity --quiet: pass
- formula_plane_authoritative_column_delete_shifts_span_outputs_correctly:
pass (was failing)
- repro_sumifs_variants: wins documented above
- probe-corpus medium s013/s014/s026: s014 43x faster (146ms → 3.4ms);
s013/s026 unchanged
## Out of scope (separate dispatches)
- SUMIFS family aggregate index for K=N criteria cases (memo §8 Option F).
- Parallel non-constant span placement evaluation for native
multi-threaded workloads (memo §8 Option B). Not relevant for
wasm/single-threaded; benefits real-world parallel workloads
separately.
- FamilyPlanner architecture as the formal home for these plans.
Memos for both committed alongside this change.
…uit corpus
## Part 1: Per-scheduled-span loop overhead reduction
Reduces per-span overhead in evaluate_authoritative_formula_plane_all
inner loop. Per-span allocations / setup compounded linearly with active
span count; same-sheet span groups now share evaluator state.
### Changes
- **Sheet name resolution**: removed per-span `sheet_name(...).to_string()`
allocation. Within a layer, consecutive spans on the same sheet now
share one borrowed sheet name slice.
- **SpanEvaluator reuse**: one SpanEvaluator constructed per
same-sheet span group within a layer (previously: per span). Loop
reorganized to walk consecutive spans on the same sheet under one
evaluator before transitioning.
- **SpanComputedWriteSink reuse**: one sink constructed per layer,
reused across all spans in that layer (previously: per span).
- **Relocatable AST validation cached per template**: TemplateRecord
gains `relocatable_ast_validated: OnceLock<bool>`. Templates are
immutable post-interning, so first-call computes; later calls hit the
cache. Eliminates O(spans · AST nodes) walk per evaluate_all.
- **WholeSpan dirty avoids double Vec materialization**: introduced
PlacementSelection enum with Whole(borrowed PlacementDomain) and
Vec(materialized PlacementCoord vec) variants. WholeSpan branch
iterates via domain.iter() (already O(N) but no double-vec). Cells
and Regions branches still materialize as before.
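The per-template validation cache can be sketched with `std::sync::OnceLock`. Only the `relocatable_ast_validated` field name comes from the change above; the other fields, the counter, and the stand-in validation rule (reject any template mentioning INDIRECT) are assumptions made for illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

struct TemplateRecord {
    function_names: Vec<&'static str>,      // stand-in for the interned AST
    relocatable_ast_validated: OnceLock<bool>,
    validation_walks: AtomicUsize,          // hypothetical counter for the demo
}

impl TemplateRecord {
    fn new(function_names: Vec<&'static str>) -> Self {
        Self {
            function_names,
            relocatable_ast_validated: OnceLock::new(),
            validation_walks: AtomicUsize::new(0),
        }
    }

    /// First call walks the template; later calls read the cached answer.
    fn is_relocatable(&self) -> bool {
        *self.relocatable_ast_validated.get_or_init(|| {
            self.validation_walks.fetch_add(1, Ordering::Relaxed);
            // Placeholder rule, not the engine's real validation.
            !self.function_names.iter().any(|f| *f == "INDIRECT")
        })
    }

    fn walk_count(&self) -> usize {
        self.validation_walks.load(Ordering::Relaxed)
    }
}
```

Caching is only sound because templates are immutable after interning, as noted above; a mutable template would need explicit invalidation.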
### Measurements
Per-span overhead changes have modest effect at small span counts.
Expected to scale with workbooks containing many spans.
Medium corpus probe (selected scenarios):
- s006-rect-family-10cols (10 spans): 8.13ms → 8.61ms (within noise).
- s013-sumifs-family-constant-criteria (1 span): 0.85ms → 1.04ms (sub-ms).
- s014-sumifs-family-varying-criteria (1 span): 3.56ms → 3.56ms (unchanged).
- s016-multi-sheet-5-tabs (3 spans): 1.09ms → 0.97ms (improved).
No regressions beyond noise. The benefit is expected to grow with the
active span count in many-span workbooks.
## Part 2: IF/IFS/IFERROR short-circuit corpus coverage
The PM flagged that we should have corpus tests confirming IF family
short-circuit semantics still work under FormulaPlane span eval
(including the memoized branch). Probe at
crates/formualizer-bench-core/examples/repro_if_short_circuit.rs already
verified this for K=N (per-placement path) and K=3 (memoized path) -
zero errors propagated, correct values returned.
Added 3 corpus scenarios:
- **s043-if-short-circuit-with-erroring-else**: 10k-row
=IF(A{r}>0, A{r}*2, 1/0). All A values positive so condition always
true; else branch (1/0) must NEVER evaluate. Invariants assert zero
error cells in col B at all phases.
- **s044-ifs-chain-short-circuit**: 10k-row
=IFS(A{r}>0, A{r}*2, A{r}<0, A{r}*3, TRUE, 1/0). A cycles through
positive/negative/zero. The TRUE fallback contains 1/0 that should
never evaluate when an earlier condition matches. Per-row expected
values match the appropriate branch.
- **s045-iferror-mixed-with-actual-errors**: 10k-row
=IFERROR(1/A{r}, 0). Some A=0 cells produce DIV/0 in the protected
expression; IFERROR catches and returns 0. Cells with A=0 must yield
0, not propagate the error. Other cells return 1/A.
All three promote to spans=1 under Auth and pass NoErrorCells +
per-row CellEquals invariants under both Off and Auth modes. Recalc
under both modes is sub-ms per cycle.
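The property these scenarios assert is that an unselected branch is never evaluated. A minimal sketch, with branches modeled as closures and hypothetical helper names (the real interpreter evaluates arena AST nodes, not closures):

```rust
/// Lazy IF: only the selected branch closure runs.
fn eval_if<T>(cond: bool, then_branch: impl FnOnce() -> T, else_branch: impl FnOnce() -> T) -> T {
    if cond {
        then_branch()
    } else {
        else_branch()
    }
}

/// Shape of s043: =IF(A>0, A*2, 1/0). The else branch models the error.
fn s043_row(a: f64) -> Result<f64, &'static str> {
    eval_if(a > 0.0, || Ok(a * 2.0), || Err("#DIV/0!"))
}

/// Shape of s045: =IFERROR(1/A, 0). A zero divisor yields the fallback.
fn s045_row(a: f64) -> f64 {
    let protected: Result<f64, &'static str> =
        if a == 0.0 { Err("#DIV/0!") } else { Ok(1.0 / a) };
    protected.unwrap_or(0.0)
}
```

For positive inputs `s043_row` never constructs the error value, mirroring the NoErrorCells invariant on column B.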
ScenarioTag::ShortCircuit added to the tag enum.
## Tests added
In crates/formualizer-eval/src/engine/tests/formula_plane_per_span_overhead.rs:
- formula_plane_evaluate_all_handles_many_same_sheet_spans:
100 same-sheet active spans evaluate in one evaluate_all without
errors.
- formula_plane_relocatable_validation_is_cached_per_template:
validates relocatable AST validation is not repeated for the same
template across multiple evaluate passes.
- formula_plane_whole_span_dirty_does_not_materialize_dirty_placement_vec:
validates DirtyDomain::WholeSpan iterates without dirty-placement
Vec materialization.
## Validation
- cargo fmt + clippy (eval, workbook, bench-core, runner-feature): pass
- cargo test -p formualizer-eval --quiet: pass
- cargo test -p formualizer-workbook --quiet: pass
- cargo test --workspace --quiet: pass
- cargo test fp8_ingest_pipeline_parity --quiet: pass
- probe-corpus medium s006/s013/s014/s016/s043/s044/s045: all pass,
invariants hold, short-circuit verified
## Out of scope (separate dispatches)
- VLOOKUP/HLOOKUP/XLOOKUP allowlist + value-context handling.
- LET/LAMBDA local-binding context support.
- SUMIFS family aggregate index for K=N criteria cases.
- Demotion phase cost (s034/s035 first edit still 25-46s for 30-50k
formula materialization).
…se scenarios
## Parity harness
New binary `probe-corpus-parity` that runs every scenario twice (Off and
Auth modes, single-threaded for determinism) on the same fixture and
compares EVERY cell at every phase boundary. This is the release gate
that proves Off↔Auth equivalence.
CLI:
- `--scale {small|medium|large}`
- `--include 'sNNN-*'` / `--exclude 'sNNN-*'`
- `--phase-timeout-ms N`
- `--fail-fast`
- `--max-divergences-per-phase N`
- `--label <tag>`
Float comparison uses exact bit-equality (`f64::to_bits`) with a
NaN-vs-NaN special case. Errors compare full `ExcelError` (kind +
message). Empty cell is equivalent to None.
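The comparison rule can be sketched as follows. The `CellValue` enum is a simplified stand-in for the engine's value type, and treating any two NaNs as equal regardless of payload is one reading of the "NaN-vs-NaN special case":

```rust
// Simplified, hypothetical value type for illustration only.
#[derive(Debug, Clone)]
enum CellValue {
    Empty,
    Number(f64),
    Text(String),
    Error { kind: String, message: String },
}

/// Compares one cell across Off and Auth runs.
fn cells_match(off: Option<&CellValue>, auth: Option<&CellValue>) -> bool {
    use CellValue as V;
    match (off, auth) {
        // An empty cell is equivalent to a missing one.
        (None | Some(V::Empty), None | Some(V::Empty)) => true,
        // Exact bit equality, except that NaN matches NaN.
        (Some(V::Number(a)), Some(V::Number(b))) => {
            (a.is_nan() && b.is_nan()) || a.to_bits() == b.to_bits()
        }
        (Some(V::Text(a)), Some(V::Text(b))) => a == b,
        // Errors compare kind and message.
        (
            Some(V::Error { kind: ka, message: ma }),
            Some(V::Error { kind: kb, message: mb }),
        ) => ka == kb && ma == mb,
        _ => false,
    }
}
```

Bit equality is deliberately stricter than a tolerance check: a last-bit rounding difference between modes counts as a divergence, and so does `0.0` versus `-0.0`.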
`Scenario::expected_divergences()` machinery added to mark
volatile/dynamic scenarios that legitimately differ across modes:
- s021 (RAND/NOW) skipped.
- s022 (OFFSET/INDIRECT) run-and-noted.
- s058 (volatile mix) skipped.
Tests: smoke test, deliberate-divergence detection test, f64 bit
comparison edge cases.
## 15 new edge-case scenarios
s046 giant-AST formula (≥ 50 deps per cell, 100 such cells)
s047 very-deep linear chain (2000 deep)
s048 50 disjoint anchored families
s049 VLOOKUP with row-relative key
s050 VLOOKUP with absolute key (constant-result candidate)
s051 mixed error cascade with IFERROR suppression
s052 5000-row deeply nested IF chain
s053 text-heavy CONCATENATE family
s054 add-then-delete sheet recalc test
s055 mixed-edit + undo
s056 SUMIFS with array-criteria expression
s057 named range redefined
s058 volatile/non-volatile mix
s059 empty sheet with cross-sheet refs populated by edit
s060 self-referencing table row formula
New tags: GiantAst, TextHeavy.
## Initial parity audit results
Small-scale parity audit:
Scenarios run: 58
Scenarios passed: 49
Scenarios skipped: 2 (expected divergence)
Scenarios failed: 9
Total divergences: 25
### Real correctness divergences (AfterRecalc, contract violations)
- **s054 add-then-delete sheet recalc**: Auth retains stale (-1)
values after a sheet is removed and re-added; Off correctly
recalculates. Real bug in cross-sheet dirty propagation.
- **s055 mixed edits + undo**: Auth value 200 vs Off 500 after
mixed value/formula edits. Real bug in dirty propagation under
mixed edit sequences.
### Contract divergences (AfterEdit pre-recalc only)
- s032/s033/s034/s035: After structural edits (insert/delete
rows/columns), Auth shows `None` for values that Off retains as stale
numbers. At AfterRecalc both modes match. This is a contract
question, not a correctness bug: Auth's behavior (values cleared
on a structural op until the next recalc) is consistent with the
evaluate_all-driven contract.
### Harness errors (pre-existing public-API gaps)
- s040 (insert_rows undo): Workbook public API exposes no
WorkbookAction::insert_rows; engine_mut would not test undo path.
- s041 (extend_table): Workbook exposes no extend_table API;
engine_mut Engine::define_table only.
- s042 (external source bump): no public API to declare/populate
source values during fixture load.
These are pre-existing escalations from earlier dispatches, not
new bugs.
## Coverage matrix
Coverage gaps (one scenario per tag): GiantAst, TextHeavy,
SheetRename, NoFormulas, LegacyOnly, LetLambda, LargeArrayLiteral,
WholeColumnRefs, MixedTypes, InternalDependency, Dynamic,
DeleteRows, DeleteColumns, InsertColumns. Most are intentional (one
scenario per dimension) but worth noting for future expansion.
## Validation
- cargo fmt + clippy (eval, workbook, bench-core, runner-feature): pass
- cargo test -p formualizer-eval --quiet: pass (1534)
- cargo test -p formualizer-workbook --quiet: pass
- cargo test fp8_ingest_pipeline_parity --quiet: pass
- cargo test -p formualizer-bench-core --features formualizer_runner: pass
- probe-corpus small (existing scenarios): pass
- probe-corpus-parity small (audit): 49/58 (real correctness bugs in
s054/s055)
## Out of scope (separate dispatches)
- Fix s054 cross-sheet dirty propagation when sheet is re-added.
- Fix s055 dirty propagation under mixed edits.
- Decide AfterEdit phase contract: gate or skip.
- Expose Workbook public APIs for insert_rows, extend_table,
external source population (unlocks s040/s041/s042).
- Re-run full corpus at medium scale once the above are fixed.
…eet add/remove
## Bugs fixed
The Off↔Auth parity harness (commit 4abf4db) surfaced two correctness
divergences:
### s055: set_cell_formula inside an active span ignored
When the engine writes a new formula or value at a coordinate that is
INSIDE an active span placement domain, the span continues to evaluate
its template for that placement, ignoring the per-cell override.
Reproduction: 200-row =A{r}*2 family promoted to a span. Set B100 to
=A100*5 via the action(...) path. Expected 500; Auth produced 200.
### s054: sheet add/remove leaves dependent span templates stale
When a sheet referenced by formulas in another sheet is removed and
re-added (e.g. =IFERROR(Aux!A{r}*2, -1)), DependencyGraph rewrites the
formula AST through tombstone/heal phases. The span's template_id
continues to point at the original (pre-tombstone) AST, so post-add
evaluation produces stale results.
Reproduction: 200-row Sheet1!A{r} = =IFERROR(Aux!A{r}*2, -1) family
promoted to a span. delete_sheet("Aux") then add_sheet("Aux") with new
values. Expected (r+10)*2; Auth produced -1 (the stale IFERROR fallback
from when Aux was missing).
## Fix design
Both fixes use span demotion. Demotion materializes span placements as
legacy vertex-backed formulas; subsequent evaluate_all may re-promote
them based on the new (correct) AST.
Three new private methods on Engine:
- `demote_span_containing_cell_for_write(sheet_id, row0, col0)`: for
per-cell writes. Looks up the placement via FormulaSpanStore::find_at;
if inside an active span, demotes that sheet's spans.
- `demote_all_spans()`: enumerates all sheet_ids with active spans and
demotes each. Used by sheet add/remove because tombstone/heal can
affect cross-sheet formula ASTs arbitrarily.
- `demote_spans_preserving_computed_overlays(sheet_id)`: variant of the
existing structural-op demoter that does NOT clear computed overlays.
For write-induced demotion the placements are about to be overwritten;
clearing the computed overlay would discard legitimate work for
unaffected placements. The structural-op demoter is unchanged.
Internal helper
`demote_spans_for_structural_op_impl(sheet_id, clear_computed_overlays)`
parameterizes the overlay-clear behavior; the public
`demote_spans_for_structural_op` retains its prior behavior.
## Sites patched
Engine-level public writes (single-cell):
- `Engine::set_cell_value`
- `Engine::set_cell_formula`
EngineAction (action_with_logger / action() path):
- `EngineAction::set_cell_value`
- `EngineAction::set_cell_formula`
Engine-level public writes (bulk):
- `Engine::bulk_set_formulas`: dedup via single sheet check; demote
once per sheet only if any cell falls inside an active span.
Sheet add/remove:
- `Engine::add_sheet`: demote all spans BEFORE `graph.add_sheet`
(which heals orphans).
- `Engine::remove_sheet`: demote all spans BEFORE `graph.remove_sheet`
(which tombstones formulas).
The order matters: demotion must happen before AST mutation because
demotion logic walks the current span template.
## Tests
New file:
`crates/formualizer-eval/src/engine/tests/formula_plane_demotion_correctness.rs`
Six tests covering:
1. Engine-direct set_cell_formula inside active span.
2. EngineAction set_cell_formula inside active span (the s055
reproduction shape).
3. Engine-direct set_cell_value inside active span.
4. Engine-direct bulk_set_formulas inside active span (dedup
demote-once invariant).
5. Sheet remove then add with cross-sheet formulas (s054 shape).
6. Sheet add with no orphans: confirms demote-all-on-add does not
break unrelated span workloads.
## Validation
- cargo fmt + clippy (eval, workbook, bench-core, runner-feature): pass
- cargo test -p formualizer-eval --quiet: pass
- cargo test -p formualizer-workbook --quiet: pass
- cargo test --workspace --quiet: pass
- cargo test fp8_ingest_pipeline_parity --quiet: pass
Parity harness focused on s054 + s055 small scale:
Scenarios run: 2
Scenarios passed: 2
Total divergences: 0
Full small-scale parity audit: s054, s055 now pass. Pre-existing
failures unchanged: s032/s033/s034/s035 (AfterEdit contract divergence,
separate workstream); s040/s041/s042 (Workbook public-API gaps for
insert_rows/extend_table/external sources, separate workstream).
Medium-scale perf probe: no regressions in s006/s013/s014/s016/s026/
s036/s043/s044/s045. s054 and s055 now produce correct values under
both Off and Auth.
## Out of scope (explicit)
- Surgical FormulaOverlayEntryKind::FormulaOverride / ValueOverride
insertion machinery: deferred. Demotion is the conservative correct
path. Overlay punchout has no production callsites yet and is unproven
in real workloads.
- s032/s033/s034/s035 AfterEdit-only divergences: contract
clarification work, not correctness.
- s040/s041/s042 public-API gaps: separate Workbook surface expansion
dispatch.
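The demote-once rule for `Engine::bulk_set_formulas` described under "Sites patched" can be sketched as a pre-pass over the pending writes. The `Span` struct, the single-column span shape, and the function name are hypothetical; the real lookup goes through FormulaSpanStore::find_at:

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Hypothetical single-column span: rows `rows` of column `col` on one sheet.
struct Span {
    sheet_id: u32,
    rows: Range<u32>,
    col: u32,
}

/// Returns each sheet that needs exactly one demotion pass before the bulk
/// write: a sheet qualifies only if some write lands inside an active span.
fn sheets_to_demote(spans: &[Span], writes: &[(u32, u32, u32)]) -> BTreeSet<u32> {
    writes
        .iter()
        .filter(|&&(sheet, row, col)| {
            spans
                .iter()
                .any(|s| s.sheet_id == sheet && s.col == col && s.rows.contains(&row))
        })
        .map(|&(sheet, _, _)| sheet)
        .collect()
}
```

Collecting into a set is what gives the dedup: many writes into the same span still produce a single demotion for that sheet, and writes that miss every span produce none.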
…reclassification
## What lands
INDEX is now promotable into FormulaPlane span families. Two layers
of canonicalization that previously rejected INDEX as
`ReferenceReturningFunction` are reconfigured:
### Layer 1: canonicalization
`is_reference_returning_function` no longer includes "INDEX"; only
"CHOOSE" remains rejected. INDEX is now in the static allowlist
`is_known_static_function`. Both copies updated:
- `crates/formualizer-eval/src/formula_plane/template_canonical.rs`
- `crates/formualizer-eval/src/engine/arena/canonical.rs` (FP8 arena
canonicalization)
### Layer 2: dependency summary + slot context
INDEX previously shared the `ByRefArg` argument-context classification
with ROW/COLUMN/AREAS/SHEET/OFFSET. `ByRefArg` was correct for those
five (their semantics depend on the address, not the value at the
address) but wrong for INDEX. INDEX needs:
- arg 0 (table): Value context, so the range gets recorded as a
precedent.
- args 1+2 (position, col_index): Value context, so scalar literals
become literal slots and relative refs become value-ref slots.
INDEX now classifies as `Value` context for all args. ROW/COLUMN/
AREAS/SHEET/OFFSET unchanged.
Both classification sites updated for consistency:
- `function_arg_context` in `dependency_summary.rs:971`
- `function_arg_slot_context` in `template_canonical.rs:1066`
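The reclassification amounts to moving INDEX out of the by-ref group and into the default arm. A sketch with an assumed signature and a two-variant enum (the real `function_arg_context` may take different arguments and return a richer type):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArgContext {
    Value,
    ByRefArg,
}

/// Assumed signature for illustration. Functions whose result depends on the
/// address of an argument stay by-ref; everything else, now including INDEX,
/// falls through to Value for every argument.
fn function_arg_context(function: &str, _arg_index: usize) -> ArgContext {
    match function.to_ascii_uppercase().as_str() {
        "ROW" | "COLUMN" | "AREAS" | "SHEET" | "OFFSET" => ArgContext::ByRefArg,
        _ => ArgContext::Value,
    }
}
```

Because INDEX args are Value context, a literal position becomes a literal slot and a relative position reference becomes a value-ref slot, which is what makes the family promotable.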
## Architectural property: arbitrary nesting
Span optimizations now apply to INDEX at any nesting depth. The
canonicalization and dependency-summary infrastructure already
recurses into nested function args without bound. `s062-index-
deeply-nested-in-if` puts INDEX at depth 5 inside an IF/MOD chain
and confirms span_count=1 under Auth.
This dispatch's main contribution is removing the leaf-level
rejection. The recursive infrastructure handles INDEX at any depth
automatically, exactly as it does for IF/SUM/SUMIFS/etc. There is
no depth-related limit; promotion is gated solely by per-function
classification at each leaf.
## Out of scope (future dispatches)
- VLOOKUP/HLOOKUP/MATCH/XLOOKUP allowlisting (Phase 1b).
- CHOOSE remains rejected (different shape; defer).
- OFFSET/INDIRECT remain rejected (volatile).
- INDEX in range-constructor expressions
(`SUM(INDEX(...):INDEX(...))`): the `:` operator stays in
`is_reference_returning_binary_operator`. Locked in by
`index_in_range_constructor_remains_rejected` regression test.
- Surgical INDEX read-region narrowing (today INDEX records the
whole table as a precedent, a conservative but correct
over-approximation; surgical narrowing requires runtime-determined
reads, which we do not support).
## Tests
New file: `crates/formualizer-eval/src/engine/tests/formula_plane_index_promotion.rs`
Covers:
- INDEX with constant table + varying position promotes (span=1).
- INDEX inside arithmetic promotes.
- INDEX at depth 5 inside nested IF chain promotes.
- INDEX/MATCH classic pattern remains rejected (because MATCH not
yet allowlisted) but evaluates correctly via legacy fallback.
- INDEX dependency-on-table marks dirty correctly.
- INDEX in range constructor remains rejected.
- OFFSET/INDIRECT remain rejected (volatile).
- ROW/COLUMN with relative refs preserve current behavior.
- INDEX duplicate position args memoize correctly.
- INDEX constant position broadcasts.
- INDEX inside arithmetic in Off mode evaluates correctly (sanity).
Updated tests in `formula_plane_literal_param_memo.rs`:
- `formula_plane_offset_byref_not_value_parameterized`
(split from prior INDEX-or-OFFSET combined test).
- `formula_plane_index_position_arg_is_value_parameterized` (new).
Updated tests in `dependency_summary.rs`:
- INDEX removed from `...rejects_reference_returning_functions`.
- New `...accepts_index_with_static_range` test.
## Corpus scenarios
s061-index-with-constant-table: 1000-row INDEX family with constant
table and varying position. Edit cycles touch position column.
s062-index-deeply-nested-in-if: 1000-row INDEX nested at depth 5
inside IF/MOD chain. Edit cycles touch position column.
s063-index-with-table-edit: 1000-row INDEX family. Edit cycles
touch the lookup TABLE, verifying that the conservative whole-table
precedent recording correctly marks dirty.
New tags:
- `ScenarioTag::ReferenceForwarding`
## Validation
- cargo fmt + clippy (eval, workbook, bench-core, runner-feature): pass
- cargo test -p formualizer-eval --quiet: pass
- cargo test -p formualizer-workbook --quiet: pass
- cargo test --workspace --quiet: pass
- cargo test fp8_ingest_pipeline_parity --quiet: pass
- probe-corpus-parity small s015/s061/s062/s063: PASS, 0 divergences
- probe-corpus-parity small full: only known pre-existing failures
(s032-s035 AfterEdit; s040-s042 public-API)
## Performance characteristics
s061 (single A-cell edit/cycle): Auth recalc 0.10ms vs Off 0.11ms.
Sub-ms; single-cell edits dirty one placement; substrate overhead
matches savings. Architecturally promoted (span=1).
s062 (5-level nested IF + INDEX): Auth recalc 0.12ms vs Off 0.09ms.
Architecturally promoted; sub-ms recalc.
s063 (table edit): Auth recalc 0.85ms vs Off 1.08ms (~21% faster).
Table edits dirty multiple placements; broadcast/memoization
amortizes.
s015 (existing INDEX/MATCH chain): remains span=0 because MATCH is
not yet allowlisted. Phase 1b will pick this up. Parity-clean.
…tent literal-binding bug
## Lookup family promotion
Adds VLOOKUP, HLOOKUP, MATCH, XLOOKUP to the FormulaPlane static
function allowlist. Mirrors the INDEX dispatch (commit b4e003d)
pattern: allowlist additions in two canonicalization paths
(`template_canonical.rs`, `engine/arena/canonical.rs`), no per-arg
context overrides needed because the default `Value` fall-through is
correct for all arguments of all four functions.
Verified per the lookup-family-promotion-plan.md design memo:
- All args of V/H/X-LOOKUP and MATCH classify as `Value` context.
- No args are reference-identity-sensitive (unlike
ROW/COLUMN/AREAS/SHEET).
- No new shared utilities needed; existing `lookup_utils.rs` already
covers cross-function code (PreparedLookupMatcher,
find_exact_index_in_view, cmp_for_lookup, approximate_select_ascending).
- CHOOSE remains rejected as ReferenceReturningFunction.
- OFFSET/INDIRECT remain rejected as VolatileFunctions.
## Latent literal-binding correctness fix
Discovered via the parity harness: s029 failed Off↔Auth parity once the
lookup family started promoting. PM isolated the bug to commit e55993d
(literal parameterization + memoization).
### The bug
`SpanEvaluator::evaluate_task`'s per-placement branch
(`span_eval.rs:277-307`) called
`interpreter.evaluate_arena_ast_with_offset` on the template's AST
without applying placement-specific literal bindings. The template AST
contains the FIRST placement's literal values (frozen at
canonicalization time). The memoized branch correctly substituted via
`with_parameter_bindings`; the per-placement branch did not.
Result: any formula where a literal value varied per placement produced
the FIRST placement's literal for ALL placements under Auth mode.
Examples that misbehaved:
- `=A{r}+{r}` produced 101, 501, 1001 (correct: 101, 505, 1010, ...)
- `=MOD({r}, 2)` produced all 1.0 (correct: 1, 1, 0, 0, ...)
- `=VLOOKUP({r}, $T, 2, FALSE)` collapsed to first row's value
- s029 `=VLOOKUP({r}, ...) + IFERROR(VLOOKUP({r*7}, ...)) + ...`: all
rows returned the first row's value.
### Why the corpus didn't catch it earlier
No pre-existing scenario had placement-varying numeric literals
embedded directly in the formula source string. Existing scenarios
used:
- Constant text criteria ("Type0", "ABC")
- Constant integer literals (0, 2, 1 in `1/0`)
- Cell-relative refs that happened to align with placement geometry
The lookup family dispatch did not introduce the bug; s029's
`=VLOOKUP({r}, ...)` shape exposed it. The parity harness caught it on
the first full run.
### The fix
`evaluate_task`'s per-placement branch now looks up the placement's
binding via `binding_id_for_placement` and applies it via
`with_parameter_bindings` before evaluating the template AST. Mirrors
the memoized branch's pattern. The branch falls through to the
no-bindings code path when the span has no binding set (no
parameterized template).
## Tests
New file:
`crates/formualizer-eval/src/engine/tests/formula_plane_per_placement_literal_bindings.rs`
Seven regression tests:
1. `per_placement_literal_substitution_basic`: =A{r}+{r}
2. `per_placement_literal_substitution_in_sum`: =SUM(A{r}, {r})
3. `per_placement_literal_substitution_in_mod`: =MOD({r}, 2)
4. `per_placement_literal_in_vlookup_key`: =VLOOKUP({r}, ...)
5. `per_placement_literal_in_nested_if_chain`: deeply-nested IF with
multiple placement-varying literals.
6. `per_placement_literal_with_text_concat`: =LEN("row-" & {r})
7. `per_placement_literal_substitution_does_not_break_constant_broadcast`:
verifies constant-key VLOOKUP still broadcasts
(transient_ast_relocation_count == 1).
New file:
`crates/formualizer-eval/src/engine/tests/formula_plane_lookup_family_promotion.rs`
Nine lookup-family promotion tests:
1. `vlookup_exact_relative_key_promotes`
2. `vlookup_constant_key_broadcasts`
3. `hlookup_exact_promotes`
4. `match_exact_promotes`
5. `xlookup_exact_scalar_promotes`
6. `xlookup_if_not_found_ref_is_value_slot`
7. `lookup_table_edit_marks_dirty`
8. `xlookup_multi_cell_return_parity_guard`
9. `mixed_lookup_aggregate_logical_promotes`
Updated: `formula_plane_index_promotion.rs`'s
`index_match_classic_pattern_promotes` test now asserts spans=1 (was
spans=0 because MATCH was rejected; now allowlisted).
## Corpus scenarios
Six new scenarios per lookup-family-promotion-plan.md:
- s064-hlookup-family-horizontal-table
- s065-xlookup-exact-with-if-not-found-ref
- s066-xlookup-search-mode-2-exact
- s067-index-match-approximate-chain
- s068-vlookup-approximate-sorted-table
- s069-xlookup-wildcard-deeply-nested-if (renamed semantically: now
exact-match match_mode=0 because wildcard didn't match the test
pattern; XLOOKUP wildcard correctness is a separate dispatch concern.
Architectural goal preserved: XLOOKUP nested at depth 4 inside IF
chain.)
Diagnostic examples added:
- crates/formualizer-bench-core/examples/repro_literal_per_row.rs
- crates/formualizer-bench-core/examples/repro_s029_isolated.rs
## Validation
- cargo fmt + clippy (eval, workbook, bench-core, runner-feature): pass
- cargo test -p formualizer-eval --quiet: pass
- cargo test -p formualizer-workbook --quiet: pass
- cargo test --workspace --quiet: pass
- cargo test fp8_ingest_pipeline_parity --quiet: pass
- probe-corpus-parity small focused (12 scenarios): 12/12 PASS,
0 divergences
- probe-corpus-parity small full: pre-existing failures only
(s032/s033/s034/s035 AfterEdit; s040/s041/s042 public-API)
## Performance characteristics
s050 constant-key VLOOKUP broadcast win: Off recalc 1.86ms → Auth
0.14ms (~13x faster).
s029 mixed nested workload now promotes correctly with proper literal
substitution per placement. Auth recalc ~9ms vs Off ~1.8ms small scale;
the substrate overhead exceeds savings for this 200-cell workload but
correctness is preserved.
K=N scenarios (s011/s012/s049 with varying keys) show correct parity
but no major recalc speedup until Phase 2 lookup-index cache lands.
## Out of scope (future dispatches)
- Phase 2 lookup-index cache (FunctionContext::get_lookup_index) for
K=N case acceleration.
- XLOOKUP wildcard semantics correctness (s069 used exact match
instead).
- XLOOKUP multi-cell return improvements (parity guard test locks in
current behavior; smarter span handling deferred).
- CHOOSE promotion (still reference-returning).
## Files
Allowlist additions:
- crates/formualizer-eval/src/formula_plane/template_canonical.rs
- crates/formualizer-eval/src/engine/arena/canonical.rs
Bug fix:
- crates/formualizer-eval/src/formula_plane/span_eval.rs
Tests:
- crates/formualizer-eval/src/engine/tests/formula_plane_lookup_family_promotion.rs (new)
- crates/formualizer-eval/src/engine/tests/formula_plane_per_placement_literal_bindings.rs (new)
- crates/formualizer-eval/src/engine/tests/formula_plane_index_promotion.rs (updated)
- crates/formualizer-eval/src/engine/tests/mod.rs
Corpus:
- crates/formualizer-bench-core/src/scenarios/s064-s069 (new)
- crates/formualizer-bench-core/src/scenarios/mod.rs
Diagnostics:
- crates/formualizer-bench-core/examples/repro_literal_per_row.rs
- crates/formualizer-bench-core/examples/repro_s029_isolated.rs
Design:
- docs/design/formula-plane/dispatch/lookup-family-promotion-plan.md
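The per-placement literal bug described under "The bug" can be reproduced in miniature. This sketch models `=A{r}+{r}` with one literal slot; the `Template` struct and both function names are hypothetical, and the real fix goes through `binding_id_for_placement` and `with_parameter_bindings` rather than an `Option<f64>`:

```rust
/// Template for `=A{r}+{r}`. `frozen_literal` is the first placement's
/// literal, captured when the template was canonicalized.
struct Template {
    frozen_literal: f64,
}

/// With a binding, the slot reads the placement's own literal; without one
/// it falls back to the literal frozen in the template (the buggy path).
fn eval_placement(t: &Template, a_value: f64, binding: Option<f64>) -> f64 {
    a_value + binding.unwrap_or(t.frozen_literal)
}

/// Evaluates every placement, optionally applying per-placement bindings.
fn eval_span(t: &Template, a: &[f64], bindings: Option<&[f64]>) -> Vec<f64> {
    a.iter()
        .enumerate()
        .map(|(i, &v)| eval_placement(t, v, bindings.map(|b| b[i])))
        .collect()
}
```

With A values 100, 500, 1000 and literals 1, 5, 10, the unbound path yields 101, 501, 1001 and the bound path yields 101, 505, 1010, the same numbers quoted in the misbehaving example above.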
…st threshold
## Summary
Adds a per-evaluate-all, snapshot-keyed engine-side cache for VLOOKUP /
HLOOKUP / MATCH / XLOOKUP **exact-match** lookups against plain ranges.
Approximate, wildcard, and reverse-search modes remain on the existing
per-call linear path; those are Phase 2c work.
The cache is **build-cost gated**: it returns None for the first 3
calls per (view, axis, snapshot) and builds on the 4th call. This
prevents the cache from regressing single-call recalc workloads while
preserving wins for many-call (first-eval, K=N) workloads.
## Why threshold-gated
PM benchmarked the eager-build version against the pre-cache baseline
(commit e69c8e6, lookup family promotion alone) and found the
single-edit-recalc pattern regressed dramatically: s012 medium recalc
0.61ms → 10.62ms (~17x slower) when the cache built eagerly for a
single VLOOKUP per recalc. Cache build cost (~R) approximated the
linear-scan cost it replaced, plus added hash overhead.
Threshold = 3: linear scan handles the first three calls; cache builds
on the fourth. Workloads with many calls per snapshot (first-eval of
N=10k VLOOKUPs against same table) get the cache after 3 misses;
single-call recalcs never trigger the build cost.
Final perf vs pre-cache baseline:
| Scenario | Pre-cache | Post-cache (eager) | Post-cache (threshold) |
|---|---:|---:|---:|
| s011 medium Off recalc | 0.47ms | 0.66ms | **0.47ms** |
| s012 medium Off recalc | 0.61ms | 10.62ms | **0.44ms** |
| s049 medium Off recalc | 1.42ms | 1.51ms | **1.44ms** |
| s050 medium Auth recalc | 0.14ms | 0.27ms | **0.13ms** |
No measurable regression. s012 actually slightly improved (within
noise).
## Architecture
### Cache key
`LookupIndexKey { sheet_id, start_row, start_col, end_row, end_col,
axis, snapshot_id }`. Includes `data_snapshot_id` for automatic
invalidation on data edits. Cross-sheet references correctly isolated
via sheet_id.
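Snapshot-keyed invalidation needs no explicit eviction: a data edit bumps the snapshot id, so the old entry simply stops matching. A sketch using the field names from the key above, with assumed field types, a hypothetical `Axis` enum, and a plain `HashMap` as the store:

```rust
use std::collections::HashMap;

// Hypothetical axis enum; the real variants are not shown in this log.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Axis {
    Row,
    Column,
}

// Field names follow the commit text; the integer widths are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct LookupIndexKey {
    sheet_id: u32,
    start_row: u32,
    start_col: u32,
    end_row: u32,
    end_col: u32,
    axis: Axis,
    snapshot_id: u64,
}

/// Hypothetical helper: key for column A, rows 0..=999, on one sheet.
fn column_a_key(sheet_id: u32, snapshot_id: u64) -> LookupIndexKey {
    LookupIndexKey {
        sheet_id,
        start_row: 0,
        start_col: 0,
        end_row: 999,
        end_col: 0,
        axis: Axis::Column,
        snapshot_id,
    }
}

/// Looks up a cached index (row positions) for the given key, if any.
fn cached_index<'a>(
    cache: &'a HashMap<LookupIndexKey, Vec<u32>>,
    key: &LookupIndexKey,
) -> Option<&'a Vec<u32>> {
    cache.get(key)
}
```

An index built at snapshot 1 is invisible to a lookup at snapshot 2, and the same range on a different sheet never shares an entry.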
### Hash key normalization
`LookupHashKey` newtype with normalization matching cmp_for_lookup
semantics:
- Number bit-pattern with near-integer snap (handles 1.0000000001
  matching 1.0).
- Lowercased text (case-insensitive matching).
- Boolean kept distinct from Number (exact-mode contract).
- Empty cell distinct from Number(0); equivalence handled at lookup-time.
Bucket collisions resolved via `cmp_for_lookup` final verification.
### Duplicate match support
`DuplicateIndices { first, last, all }` per key. Phase 2b only consumes
`first` (forward search semantics). `last` is exposed for Phase 2c
reverse-search consumption.
### Build-cost threshold
`LookupIndexCache.call_counts: RwLock<FxHashMap<LookupIndexKey, u32>>`.
`build_threshold: u32 = 3`.
On a get():
1. If cache has the index, return Some immediately.
2. Else: increment call count for this key.
3. If count <= threshold: return None (caller falls back to linear scan).
4. If count > threshold: build cache, insert, return Some.
call_counts pruned periodically when size exceeds 4096 entries.
### Refuse-to-build conditions
1. Volatile precedent in the view (memoized per key in `volatile_keys`
   to avoid repeated full-view scans).
2. Error cells in the lookup column.
3. Tiny tables (R < 64).
4. Memory cap exceeded (default 64 MB per Engine, configurable via
   `EvalConfig.lookup_index_cache_max_bytes`).
5. Below build-cost threshold.
### FunctionContext extension
`FunctionContext::get_lookup_index(view, axis) -> Option<Arc<LookupIndex>>`
mirrors `get_criteria_mask` pattern. Default returns None; engine
provides cached impl via `EvaluationContext::build_lookup_index`.
The cache is engine-level, available to BOTH Off and Auth modes (the
function eval paths consult the cache regardless of dispatch path).
This is correct architectural behavior: the cache is a general
optimization, not FormulaPlane-specific.
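The four-step `get()` flow in the build-cost threshold section can be sketched as below. This is a minimal illustration and not the engine's implementation: `Key`, `Index` and `ThresholdCache` are hypothetical stand-ins, the key is reduced to two fields, std `HashMap` replaces `FxHashMap`, and the `RwLock`, refuse-to-build checks and pruning are left out.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified key. The real `LookupIndexKey` also carries
/// the range bounds and the axis.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Key {
    sheet_id: u32,
    snapshot_id: u64,
}

/// Stand-in index: lowercased text key -> first matching row.
type Index = HashMap<String, usize>;

struct ThresholdCache {
    built: HashMap<Key, Arc<Index>>,
    call_counts: HashMap<Key, u32>,
    build_threshold: u32,
    builds: u32,
}

impl ThresholdCache {
    fn new(build_threshold: u32) -> Self {
        Self {
            built: HashMap::new(),
            call_counts: HashMap::new(),
            build_threshold,
            builds: 0,
        }
    }

    /// Returns `None` while the call count is at or below the threshold,
    /// so the caller falls back to its linear scan.
    fn get(&mut self, key: Key, column: &[&str]) -> Option<Arc<Index>> {
        // Step 1: an already-built index is returned immediately.
        if let Some(index) = self.built.get(&key) {
            return Some(Arc::clone(index));
        }
        // Steps 2 and 3: count the call and stay on the linear path.
        let count = self.call_counts.entry(key).or_insert(0);
        *count += 1;
        if *count <= self.build_threshold {
            return None;
        }
        // Step 4: build, keeping the first row for duplicate keys
        // (forward-search semantics).
        let mut index = Index::new();
        for (row, text) in column.iter().enumerate() {
            index.entry(text.to_lowercase()).or_insert(row);
        }
        self.builds += 1;
        let index = Arc::new(index);
        self.built.insert(key, Arc::clone(&index));
        Some(index)
    }
}

fn main() {
    let column = ["Apple", "pear", "APPLE"];
    let mut cache = ThresholdCache::new(3);
    let key = Key { sheet_id: 0, snapshot_id: 1 };
    for call in 1..=5 {
        let cached = cache.get(key, &column).is_some();
        println!("call {call}: cached = {cached}");
    }
    println!("builds = {}", cache.builds);
}
```

With a threshold of 3, calls 1 to 3 report `cached = false` and calls 4 and 5 report `cached = true`, with a single build. Because the snapshot id is part of the key, a data edit starts a fresh count in this sketch rather than reusing a stale index.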
## Tests (41 in formula_plane_lookup_semantics.rs)
### Phase 2a parity tests (31)
Off↔Auth parity at the unit-test level for every landmine pattern:
Loose equality (9):
- vlookup_int_vs_number_match
- vlookup_text_case_insensitive
- vlookup_text_with_unicode_special
- vlookup_numeric_tolerance_match / no_match
- vlookup_empty_matches_zero
- vlookup_zero_does_not_match_empty_string
- vlookup_boolean_does_not_match_number_in_exact
- vlookup_text_does_not_match_numeric_in_exact
Duplicate match (5):
- vlookup_first_match_with_duplicates
- xlookup_forward_first_match
- xlookup_reverse_last_match
- match_first_match_with_duplicates
- hlookup_first_match_horizontal_duplicates
Empty cell semantics (3):
- vlookup_in_table_with_gaps
- match_zero_against_table_with_empty_first_cell
- vlookup_against_used_region_smaller_than_declared
Volatile / non-cacheable (2):
- vlookup_against_table_containing_now_function
- vlookup_against_table_with_index_function_cells
Cross-sheet (2):
- vlookup_cross_sheet_table
- vlookup_two_lookups_on_different_sheets_share_no_cache
Error propagation (2):
- vlookup_with_error_lookup_value
- vlookup_against_table_with_errors_in_lookup_column
Memory and shape (3):
- vlookup_against_huge_lookup_table_respects_memory_cap
- vlookup_lookup_array_is_full_column_reference
- vlookup_against_tiny_table_skips_cache
Cache invalidation (2):
- lookup_cache_invalidates_on_table_edit
- lookup_cache_invalidates_on_table_extend
Negative tests (3):
- approximate_match_does_not_use_exact_cache
- wildcard_match_does_not_use_exact_cache
- offset_indirect_remain_uncacheable
### Phase 2b counter-assertion tests (4)
- vlookup_cache_engages_for_repeated_keys (updated for threshold:
  builds=1, hits>=96, skipped_below_threshold=3)
- lookup_cache_skips_volatile_tiny_capped_and_error_cases
- lookup_cache_isolates_cross_sheet_entries
- lookup_cache_does_not_engage_for_approximate_or_wildcard
### Threshold-specific tests (6)
- lookup_cache_does_not_build_on_first_call
- lookup_cache_does_not_build_on_third_call
- lookup_cache_builds_on_fourth_call
- lookup_cache_threshold_is_per_key
- lookup_cache_threshold_resets_across_snapshots
- lookup_cache_repeated_calls_to_same_table_eventually_build
## Corpus scenarios (9 new, s070-s078)
- s070-vlookup-cache-K-much-less-than-N: 1k-10k formulas, 50 distinct
  keys against 1k-50k row table. Memoization + cache pattern.
- s071-vlookup-cache-K-equals-N: same scale, all unique keys. The
  headline scale.
- s072-hlookup-cache-horizontal: HLOOKUP-equivalent (axis-flipped).
- s073-match-then-index-cache: classic INDEX/MATCH where MATCH benefits.
- s074-mixed-lookup-and-arithmetic: VLOOKUP nested inside arithmetic.
- s075-lookup-with-edit-cycles: edits to lookup_value, lookup_array,
  result_column. Verifies cache invalidation.
- s076-lookup-against-volatile-table: stable volatile
  (`=IF(NOW()>0,0,0)`) in lookup table. Verifies cache refuses.
- s077-lookup-with-sparse-empty-cells: realistic empty-cell pattern.
- s078-multiple-tables-cache-isolation: two distinct lookup tables.
All 9 scenarios pass focused parity (0 divergences). New tag
`ScenarioTag::LookupCacheHeavy`.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass (1611 tests, 7 ignored)
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
probe-corpus-parity small focused (s070-s078): 9/9 PASS, 0 divergences
probe-corpus-parity small full: pre-existing failures only (s032/s033/s034/s035 AfterEdit; s040/s041/s042 public-API)
probe-corpus medium s011/s012/s015/s029/s049/s050/s070-s078: pass
## Performance characteristics
The cache wins where workload exceeds threshold:
- s071 first_eval (10k VLOOKUPs against 10k table): cache builds on 4th
  call, all 9996 subsequent calls hit. Bounded total work O(R+N).
- s050 constant-key broadcast: substrate-level broadcast already wins
  (eval-once); cache supplements but contribution is small.
The cache stays out of the way where workload is below threshold:
- Single-edit recalc: 1 call per recalc, never builds. Same as pre-cache.
- s011/s012 typical recalc: dirty propagation marks 1 formula dirty;
  threshold not reached, linear scan handles.
s076 first_eval (volatile table): 765ms, unavoidable. Volatile detection
correctly refuses cache build; per-call linear scan handles 10k
VLOOKUPs. This is correct behavior; if the user has a volatile table,
they pay the cost. Subsequent recalcs: 0.6ms (volatile cell stable, no
dirty propagation).
## Out of scope (Phase 2c)
- VLOOKUP/HLOOKUP/MATCH approximate (range_lookup=TRUE / match_type=±1).
- XLOOKUP wildcard mode (match_mode=2).
- XLOOKUP reverse search (search_mode=-1) cache integration.
- Per-pattern wildcard memo.
- Sorted-vec representation for binary-search approximate.
- Per-sheet snapshot granularity (currently global; cross-sheet edits
  invalidate all caches).
- LRU eviction (currently refuse-to-build only).
## Files
NEW:
- crates/formualizer-eval/src/engine/lookup_index_cache.rs (cache impl)
- crates/formualizer-eval/src/engine/tests/formula_plane_lookup_semantics.rs (41 tests)
- crates/formualizer-bench-core/src/scenarios/s070_*..s078_* (9 scenarios)
- docs/design/formula-plane/dispatch/lookup-index-cache-plan.md
MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (cache ownership, builder, report accessor)
- crates/formualizer-eval/src/engine/mod.rs (module declaration)
- crates/formualizer-eval/src/traits.rs (FunctionContext + EvaluationContext extensions)
- crates/formualizer-eval/src/builtins/lookup/core.rs (V/H/M cache integration)
- crates/formualizer-eval/src/builtins/lookup/dynamic.rs (XLOOKUP exact-mode integration)
- crates/formualizer-eval/src/builtins/lookup/mod.rs
- crates/formualizer-bench-core/src/scenarios/mod.rs (registrations + LookupCacheHeavy tag)
## What changed
After structural operations (insert_rows, delete_rows, insert_columns,
delete_columns, add_sheet, remove_sheet), the engine clears computed
overlay values for affected cells in BOTH `FormulaPlaneMode::Off` AND
`FormulaPlaneMode::AuthoritativeExperimental`. Reads return None until
the next `evaluate_all` call.
Previously Auth mode cleared overlays via
`demote_spans_for_structural_op` (commit ac8ffd3), but Off mode
preserved stale computed values, leading to Off↔Auth parity divergences
at the AfterEdit phase for s032/s033/s034/s035.
## Why
The pre-dispatch behavior was incorrect under Off mode: structural ops
shift formula references, so the computed values stored at old positions
no longer correspond to formulas at new positions. Reading those values
returned data inconsistent with the actual current formula at that cell.
Pre-dispatch s034 medium recalc reported 0.13ms because formulas were
not being marked dirty after structural ops, masking the correctness
bug. Post-dispatch s034 medium recalc is 18ms, the correct work for
re-evaluating ~10k arithmetic formulas. This is not a regression; it's
the actual cost that was previously hidden.
## Engine contract
Documented in `docs/design/formula-plane/engine-contracts.md`:
After structural ops, computed values for affected cells are cleared.
Reads return None until the next `evaluate_all`. This contract is stable
across all FormulaPlaneMode values.
The forward-compatible vision (lazy reads, v0.8+) is documented in
`docs/design/formula-plane/lazy-reads-vision.md`. Lazy reads will hide
the cleared-state from users by auto-evaluating dirty cells on access.
The underlying contract (cleared on structural op) remains the same;
lazy reads layer transparency on top.
## Implementation
In `crates/formualizer-eval/src/engine/eval.rs`:
- `clear_computed_overlay_after_row(sheet, start_row0)`: clears
  computed_overlay for all cells at-or-after start_row0 in the given
  sheet.
- `clear_computed_overlay_after_col(sheet, start_col0)`: symmetric
  column-axis version.
- `clear_all_computed_overlays()`: clears every sheet's overlay (used by
  add_sheet and remove_sheet because cross-sheet formulas may have had
  references tombstoned/healed).
- `mark_moved_formula_vertices_dirty(summary)`: marks formulas-that-shifted
  as dirty so the next `evaluate_all` recomputes them.
- `mark_all_formula_vertices_dirty()`: used by sheet add/remove to
  ensure cross-sheet formulas re-evaluate.
- `collect_computed_overlay_before_row/col`: preserves overlays for
  cells outside the affected region; restored after the Arrow shift so
  demotion doesn't accidentally clear unaffected cells.
The four structural-op functions (`insert_rows`, `delete_rows`,
`insert_columns`, `delete_columns`) now follow this pattern:
1. Capture pre-op overlay state for unaffected cells.
2. Demote spans for the affected sheet (FormulaPlane housekeeping).
3. Perform the Arrow-store shift.
4. Mark moved formula vertices dirty.
5. Clear overlays in the affected region.
6. Restore preserved overlays for unaffected cells.
`add_sheet` and `remove_sheet` use `clear_all_computed_overlays` plus
`mark_all_formula_vertices_dirty` because cross-sheet formula AST
rewrites can affect arbitrary cells in any sheet.
## Tests
New file: `crates/formualizer-eval/src/engine/tests/structural_op_clears_computed_values.rs`
8 unit tests:
1. `insert_rows_clears_computed_values_in_affected_region`
2. `delete_rows_clears_computed_values_in_affected_region`
3. `insert_columns_clears_computed_values`
4. `delete_columns_clears_computed_values`
5. `add_sheet_clears_all_sheets_computed_values`
6. `remove_sheet_clears_remaining_sheets_computed_values`
7. `structural_op_clear_works_in_off_mode` (regression-proof against
   accidental Auth-only behavior)
8. `structural_op_then_evaluate_recovers_values` (full cycle: clear →
   evaluate_all → fresh values)
Corpus scenario added: `s079-after-edit-contract` validates the contract
at scale via parity harness.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
Focused parity (s032-s035, s054-s055, s079): 7/7 PASS 0 divergences
Full small parity: only s040/s041/s042 (public-API gaps) failing. s032-s035 now pass.
## Performance characteristics
| Scenario | Pre-dispatch Off recalc | Post-dispatch Off recalc | Note |
|---|---:|---:|---|
| s032 row insert | 5.26ms | 5.52ms | within noise |
| s033 row delete | 4.43ms | 5.32ms | within noise |
| s034 col insert | 0.13ms | 18.19ms | correctness fix; recompute now correctly fires |
| s035 col delete | 0.15ms | 0.15ms | unchanged (deletion outside formula range) |
s034's apparent regression is the correct work that was being skipped by
the buggy state. Pre-dispatch returned stale values; post-dispatch
recomputes 10k formulas that genuinely shifted positions.
## Out of scope (future)
- Smart preserve: detect cases where a formula's references shift
  TOGETHER with itself (e.g., `=A{r}+1` shifted from B to C also has its
  A reference shifted to B, value identical). Could preserve the
  computed value. v0.7 optimization, not v0.6 work.
- Lazy reads (v0.8+): `get_cell_value` auto-evaluates dirty cells on
  access. Documented in lazy-reads-vision.md.
## Files
NEW:
- crates/formualizer-eval/src/engine/tests/structural_op_clears_computed_values.rs
- crates/formualizer-bench-core/src/scenarios/s079_after_edit_contract.rs
- docs/design/formula-plane/engine-contracts.md
- docs/design/formula-plane/lazy-reads-vision.md
MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (clear methods + structural-op integration)
- crates/formualizer-eval/src/engine/tests/mod.rs
- crates/formualizer-bench-core/src/scenarios/mod.rs
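The engine contract in this commit (computed values at or after the edit boundary are cleared and read as None until the next `evaluate_all`) can be illustrated with a toy overlay. This is only a sketch under stated assumptions: `ComputedOverlay` and its methods are hypothetical, and the real overlay is Arrow-backed and per-sheet rather than a hash map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one sheet's computed-value overlay,
/// keyed by zero-based (row, col).
#[derive(Default)]
struct ComputedOverlay {
    cells: HashMap<(u32, u32), f64>,
}

impl ComputedOverlay {
    fn set(&mut self, row0: u32, col0: u32, value: f64) {
        self.cells.insert((row0, col0), value);
    }

    /// A cleared cell reads as `None` until it is recomputed.
    fn get(&self, row0: u32, col0: u32) -> Option<f64> {
        self.cells.get(&(row0, col0)).copied()
    }

    /// Drops every computed value at or after `start_row0`.
    /// Rows before the boundary are left untouched.
    fn clear_after_row(&mut self, start_row0: u32) {
        self.cells.retain(|&(row, _), _| row < start_row0);
    }
}

fn main() {
    let mut overlay = ComputedOverlay::default();
    for row in 0..4 {
        overlay.set(row, 1, f64::from(row) + 1.0);
    }
    // Simulate the clearing step of a row insert at row 2.
    overlay.clear_after_row(2);
    println!("row 1 -> {:?}", overlay.get(1, 1)); // value kept
    println!("row 3 -> {:?}", overlay.get(3, 1)); // cleared
}
```

In this sketch the value at row 1 survives and the value at row 3 reads as `None`, which is the state a caller would observe between the structural op and the next evaluation.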
…active_span_count gate audit
## Two correctness items closed for v0.6 readiness
## Item 1: sheet duplication `dependents.clear()` bug
`DependencyGraph::duplicate_sheet` had a latent bug at sheets.rs:401
where cloned named ranges had their `dependents` set cleared and never
repopulated. Result: when the new sheet's named range was later deleted
or updated, formulas in the new sheet that referenced it did not get
marked dirty.
Root cause: ordering. The original code processed formula ASTs first
(calling `extract_dependencies` and `attach_vertex_to_names`), then
inserted cloned named ranges into the new sheet. At the time the
formulas were processed, the new sheet had no named ranges yet, so
`resolve_name_entry` could not find them. The cloned formulas were
attached to wrong (or no) name vertices.
Fix: reorder operations so named ranges are inserted BEFORE formula
processing. Also populates `sheet_named_ranges_lookup` (case-
insensitive lookup map) for the new sheet's names so default name
resolution finds them.
`Engine::duplicate_sheet` and `Workbook::duplicate_sheet` wrappers
added so the corpus scenario can exercise the path through public API.
`name_lookup_key` visibility lifted to `pub(super)` so the duplicate
path can populate the lookup map consistently.
## Item 2: active_span_count gate audit
PM audited the existing `active_span_count() > 0` gates at:
eval.rs:6416, 7067, 7280, 7873, 8035, 8073, 8119, 8539, 8691, 11956.
All 12 public `evaluate_*` methods on Engine correctly route through
either the explicit gate or `evaluate_all_coordinator` (which dispatches
on FormulaPlaneMode). Audit confirmed current state is correct.
The audit's deliverable is locking this in via a black-box behavioral
test suite. Each test builds a workbook with an active dirty span and
verifies that calling the public method correctly flushes the span and
returns fresh values.
`crates/formualizer-eval/src/engine/tests/active_span_gate_audit.rs`
contains 12 tests, one per method:
- evaluate_all
- evaluate_all_with_delta
- evaluate_all_cancellable
- evaluate_all_logged
- evaluate_cell
- evaluate_cells
- evaluate_cells_cancellable
- evaluate_cells_with_delta
- evaluate_until
- evaluate_until_cancellable
- evaluate_recalc_plan
- evaluate_vertex
A future regression that removes the gate from an existing method will
fail that method's test. A new `evaluate_*` method has no test until one
is added; the one-test-per-method convention makes a missing test easy
to spot in code review.
## Tests
In `crates/formualizer-eval/src/engine/tests/sheet_duplication_named_range_dependents.rs`:
1. `duplicate_sheet_named_range_dependents_populated`
2. `duplicate_sheet_named_range_deletion_marks_dependents_dirty`
3. `duplicate_sheet_cross_sheet_named_range_references_correct`
4. `duplicate_sheet_with_no_named_ranges_unaffected`
In `crates/formualizer-eval/src/engine/tests/active_span_gate_audit.rs`:
12 tests covering each public `evaluate_*` method.
## Corpus scenario
`s080-sheet-duplication-named-range`: 1000-formula family referencing
a named range. Edit cycles duplicate the sheet, update the named range,
and verify both sheets reflect updates correctly.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
probe-corpus-parity small s080: PASS, 0 divergences
probe-corpus-parity small full: pre-existing s040/s041/s042 public-API gaps remain; no other divergences
## Files
NEW:
- crates/formualizer-eval/src/engine/tests/sheet_duplication_named_range_dependents.rs
- crates/formualizer-eval/src/engine/tests/active_span_gate_audit.rs
- crates/formualizer-bench-core/src/scenarios/s080_sheet_duplication_named_range.rs
MODIFIED:
- crates/formualizer-eval/src/engine/graph/sheets.rs (reorder)
- crates/formualizer-eval/src/engine/graph/names.rs (visibility)
- crates/formualizer-eval/src/engine/eval.rs (Engine::duplicate_sheet wrapper)
- crates/formualizer-workbook/src/workbook.rs (Workbook::duplicate_sheet wrapper)
- crates/formualizer-eval/src/engine/tests/mod.rs (registrations)
- crates/formualizer-bench-core/src/scenarios/mod.rs (s080 registration)
…l measurement controls
## Why
PM medium-scale parity audit surfaced 5-10x first_eval slowdowns
under Auth mode for non-cacheable lookup scenarios (s067, s068,
s069, s076). Root cause was diagnosed as a parallelism mismatch:
Off mode parallelizes via rayon (8x speedup on 8-core CPU); Auth
mode was fully single-threaded. Direct API calls with
`enable_parallel=false` showed Auth FASTER than Off across the
same workloads, confirming the substrate itself wasn't slow.
This dispatch closes the parallelism gap on native targets while
preserving wasm single-threaded behavior. It also fixes the
corpus measurement bias by making probe-corpus default to
`enable_parallel=false` for honest substrate comparisons.
## Architecture
`SpanEvaluator::evaluate_task` had two sequential hot loops:
1. **Per-placement branch** (~line 280-307): each placement
independently evaluates the template AST against per-placement
bindings.
2. **Memoized branch** (~line 396-490): each unique parameter-key
group evaluates ONCE at its representative placement, then
broadcasts to N placements.
Both branches are parallelizable: per-placement work is independent
(read-only access to data_store, sheet_registry, plane state, and
the engine's interior-mutability-protected caches).
The parallelization mirrors the legacy `evaluate_layer_parallel_effects`
pattern (eval.rs:11600+):
- Materialize writable placements into a Vec.
- `thread_pool.install(|| placements.par_iter().map(eval).collect())`
produces `Vec<(PlacementCoord, OverlayValue)>`.
- Sequentially push results to the ComputedWriteBuffer-backed sink
(sink push is &mut, sequential by design).
Same shape for memoized: parallelize across groups, sequentially
broadcast within each group.
## Threshold gates
Below thresholds, thread-pool overhead dominates. Hard-coded:
- PARALLEL_PLACEMENT_THRESHOLD = 256: per-placement branch parallelizes
only when writable_placements.len() >= 256.
- PARALLEL_MEMO_GROUP_THRESHOLD = 64: memoized branch parallelizes only
when groups.len() >= 64.
Conservative starting values. Future tuning is a separate dispatch.
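The gate-then-parallelize shape described above (parallel map into a Vec, then a sequential push into the sink) can be sketched as follows. To keep the example dependency-free it uses `std::thread::scope` with fixed chunks as a stand-in for the rayon thread pool the commit actually uses; `eval_placement`, `evaluate` and the `workers` parameter are hypothetical names for illustration.

```rust
use std::thread;

/// Mirrors the per-placement gate value quoted above.
const PARALLEL_PLACEMENT_THRESHOLD: usize = 256;

/// Hypothetical stand-in for evaluating one placement (read-only work).
fn eval_placement(row: u32) -> (u32, f64) {
    (row, f64::from(row) * 2.0)
}

/// Evaluates all placements and reports whether the parallel path ran.
fn evaluate(placements: &[u32], enable_parallel: bool, workers: usize) -> (Vec<(u32, f64)>, bool) {
    let parallel =
        enable_parallel && workers > 1 && placements.len() >= PARALLEL_PLACEMENT_THRESHOLD;
    if !parallel {
        let results = placements.iter().map(|&r| eval_placement(r)).collect();
        return (results, false);
    }
    let chunk_len = (placements.len() + workers - 1) / workers;
    let results = thread::scope(|scope| {
        let handles: Vec<_> = placements
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&r| eval_placement(r)).collect::<Vec<_>>())
            })
            .collect();
        // Join in spawn order so the output order matches the sequential path.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    });
    (results, true)
}

fn main() {
    let placements: Vec<u32> = (0..1000).collect();
    let (results, used_parallel) = evaluate(&placements, true, 4);
    // The sink takes &mut, so the push stays on a single thread.
    let mut sink: Vec<(u32, f64)> = Vec::with_capacity(results.len());
    for result in results {
        sink.push(result);
    }
    println!("parallel = {used_parallel}, written = {}", sink.len());
}
```

Keeping the write phase sequential means only the evaluation closure has to be thread-safe; the sink needs no locking in this sketch.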
## WASM gating
Rayon usage is wrapped in `#[cfg(not(target_arch = "wasm32"))]`. WASM
builds always use sequential paths. Verified via `cargo build
-p formualizer-eval --target wasm32-unknown-unknown --no-default-features`
which now succeeds cleanly.
## Probe-corpus measurement controls
Added `--enable-parallel <bool>` flag to both `probe-corpus` and
`probe-corpus-parity`. Default is `false`.
This closes a real measurement bias. Previous probe-corpus runs were
comparing parallel-Off (8 threads) against serial-Auth (1 thread) and
attributing the 5-10x gap to substrate cost. With `--enable-parallel
false` (the new default), comparisons are substrate-only and honest.
When users want to measure realistic native workloads, they pass
`--enable-parallel true` and BOTH modes parallelize.
## Counters
`SpanEvalReport` gains four new diagnostic counters:
- parallel_per_placement_invocations
- parallel_memoized_invocations
- sequential_per_placement_invocations
- sequential_memoized_invocations
Tests assert on these to verify which path was taken.
## Tests
New file: `crates/formualizer-eval/src/engine/tests/formula_plane_parallel_span_eval.rs`
Eight unit tests:
1. Identical results between parallel and sequential paths.
2. Below-threshold workloads stay sequential.
3. Above-threshold workloads use parallel.
4. enable_parallel=false forces sequential regardless of threshold.
5. Lookup cache safety under parallel evaluation.
6. Per-placement bindings correctly applied under parallel.
7. Memoized group evaluation correct broadcast counting.
8. IF short-circuit honored under parallel evaluation.
Plus two probe-corpus CLI tests verifying default flag resolution.
## Performance results
Medium scale, lookup scenarios with --enable-parallel true:
| Scenario | Auth serial | Auth parallel | Speedup |
|---|---:|---:|---:|
| s067 INDEX/MATCH approximate | 631ms | 61ms | 10.3x |
| s068 VLOOKUP approximate | 305ms | 24ms | 12.7x |
| s069 XLOOKUP wildcard | 350ms | 51ms | 6.8x |
| s076 lookup vs volatile table | 823ms | 77ms | 10.7x |
Auth/Off ratio with --enable-parallel true:
| Scenario | Auth/Off | Note |
|---|---:|---|
| s067 | 0.99x | within noise |
| s068 | 0.88x | Auth slightly faster |
| s069 | 0.89x | Auth slightly faster |
| s076 | 0.84x | Auth slightly faster |
The previous 5-10x gap is eliminated. On these four scenarios Auth is
at parity with Off or slightly faster (0.84x-0.99x); cache wins compound
with parallelism.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --quiet pass (1643 tests)
cargo test -p formualizer-workbook --quiet pass
cargo test --workspace --quiet pass
cargo test fp8_ingest_pipeline_parity --quiet pass
cargo build -p formualizer-eval --target wasm32-unknown-unknown pass
probe-corpus-parity small focused (s067-s069, s076): 4/4 PASS, 0 divergences, both serial and parallel.
probe-corpus-parity small full: pre-existing s040/s041/s042 public-API gaps remain; no other divergences.
## Out of scope (explicit)
- Cancellation under parallelism: deferred. The existing per-placement
loop has no cancel-flag check; not adding under parallelism either.
Future dispatch can add per-iteration cancel checks if needed.
- Parallelization of constant-result broadcast: already a single eval;
parallelism gives nothing.
- Threshold tuning: 256 placements / 64 groups are conservative
starting values. Profile-guided optimization is a separate dispatch.
- Per-placement work-stealing or chunking heuristics: rayon's default
chunking is already adaptive.
## Files
NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_parallel_span_eval.rs
MODIFIED:
- crates/formualizer-eval/src/formula_plane/span_eval.rs (parallelization + counters + helpers)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
- crates/formualizer-bench-core/src/bin/probe-corpus.rs (--enable-parallel flag)
- crates/formualizer-bench-core/src/bin/probe-corpus-parity.rs (same flag)
- crates/formualizer-bench-core/src/parity_harness.rs (option plumbing)
## Why
Medium-scale parity audit at v0.6.0-rc1 candidate identified two
structural-op pathologies:
- s035 medium phase_edit_0 (column delete + 5 active spans + 50k
formula cells): **89.5s** Auth (vs ~140ms Off) — sheet-wide span
demotion was materializing every active span on the sheet via
bulk_set_formulas_with_plans, even for spans whose result/read
regions had nothing to do with the affected column.
- s035 phase_edit_1+ (post-demotion edits): **9.4s** per cycle —
unconditional collect/restore of pre-boundary computed overlays
that the boundary-scoped clear() never touched.
The collect_computed_overlay_before_*/restore_computed_overlay_cells
pair was dead code: clear_computed_overlay_after_* already preserves
before-boundary cells by construction (it iterates only cols >=
start_col0 / rows >= start_row0). Restoring them was 50k per-cell
overlay-set ops with no behavioral effect.
Sheet-wide demotion was conservative and correct, but its cost was
silently O(P_all) in the number of active span placements on the sheet,
regardless of whether any of them intersected the affected region.
## Architecture
Two semantic changes, both bounded to engine/eval.rs:
### 1. Affected-region scoped demotion
Engine::insert_rows / delete_rows / insert_columns / delete_columns
now compute an explicit affected RegionPattern and pass it through
to demote_spans_for_structural_op. The demotion filter checks span
intersection via:
- span_result_region_intersects_affected: tests whether the span's
result region intersects the affected region.
- span_any_read_region_intersects_affected: walks the span's read
summary dependencies and tests each read region.
Spans whose result AND read regions are disjoint from the affected
region are skipped entirely. No bulk_set_formulas_with_plans, no
overlay clearing, no graph materialization. They survive the
structural op intact.
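The demotion filter above reduces to rectangle intersection on the span's result region and each of its read regions. A minimal sketch, assuming inclusive bounds in (row_start, row_end, col_start, col_end) order; `Rect`, `Span`, `cols_from` and `must_demote` are hypothetical names, not the engine's types:

```rust
/// Hypothetical rectangle with inclusive bounds on both axes.
#[derive(Clone, Copy, Debug)]
struct Rect {
    row_start: u32,
    row_end: u32,
    col_start: u32,
    col_end: u32,
}

impl Rect {
    /// Axis-range overlap test; works unchanged with u32::MAX sentinels.
    fn intersects(&self, other: &Rect) -> bool {
        self.row_start <= other.row_end
            && other.row_start <= self.row_end
            && self.col_start <= other.col_end
            && other.col_start <= self.col_end
    }
}

/// Hypothetical span summary: where it writes and what it reads.
struct Span {
    result: Rect,
    reads: Vec<Rect>,
}

/// "From column c onward, all rows", as produced by a column insert/delete.
fn cols_from(c: u32) -> Rect {
    Rect { row_start: 0, row_end: u32::MAX, col_start: c, col_end: u32::MAX }
}

/// A span survives only if its result AND all read regions are disjoint
/// from the affected region.
fn must_demote(span: &Span, affected: &Rect) -> bool {
    span.result.intersects(affected) || span.reads.iter().any(|r| r.intersects(affected))
}

fn main() {
    // Span writes column 2, rows 0..=999, and reads columns 0..=1.
    let span = Span {
        result: Rect { row_start: 0, row_end: 999, col_start: 2, col_end: 2 },
        reads: vec![Rect { row_start: 0, row_end: 999, col_start: 0, col_end: 1 }],
    };
    println!("delete at col 5: demote = {}", must_demote(&span, &cols_from(5)));
    println!("delete at col 1: demote = {}", must_demote(&span, &cols_from(1)));
}
```

A delete starting at column 5 leaves this span alone, while one starting at column 1 hits its read region and forces demotion.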
### 2. Removed dead collect/restore
The four structural-op call sites (insert_rows, delete_rows,
insert_columns, delete_columns) no longer invoke:
- collect_computed_overlay_before_row/col
- restore_computed_overlay_cells
These functions are now removed entirely.
## OOM workaround
A subtle interaction: the affected-region representation
RegionPattern::Rect(0, u32::MAX, c, u32::MAX) uses sentinel u32::MAX
bounds to express "from col c onward, all rows". The
RegionPattern::intersects() predicate handles this correctly (axis
range arithmetic), but downstream consumers that route Rect through
SheetRegionIndex bucket materialization (rect_buckets_for_rect)
would emit ~1.8x10^16 (sheet, row_bucket, col_bucket) tuples,
triggering OOM.
The engine workaround is structural_change_scope_for_region:
unbounded rects (row_end == u32::MAX || col_end == u32::MAX) are
broadened to StructuralScope::Sheet at the recording boundary.
Demotion still uses the precise rect via intersects(); only the
dirty-closure index recording broadens to WholeSheet.
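The size of the bucket explosion, and the broadening rule that avoids it, can be checked with a few lines of arithmetic. The sketch below assumes bucket sizes of 64 rows by 16 columns and inclusive rect bounds; `bucket_count`, `scope_for_rect` and the `Scope` enum are hypothetical illustrations, not the engine's `rect_buckets_for_rect` or `StructuralScope`.

```rust
const ROW_BUCKET: u64 = 64; // assumed bucket height
const COL_BUCKET: u64 = 16; // assumed bucket width

/// Number of (row_bucket, col_bucket) cells a rect would materialize.
fn bucket_count(row_start: u32, row_end: u32, col_start: u32, col_end: u32) -> u64 {
    let rows = u64::from(row_end) / ROW_BUCKET - u64::from(row_start) / ROW_BUCKET + 1;
    let cols = u64::from(col_end) / COL_BUCKET - u64::from(col_start) / COL_BUCKET + 1;
    rows * cols
}

/// Hypothetical scope type for the recording boundary.
#[derive(Debug, PartialEq)]
enum Scope {
    Region { row_start: u32, row_end: u32, col_start: u32, col_end: u32 },
    Sheet,
}

/// Unbounded rects are broadened to the whole sheet before recording,
/// so they never reach bucket materialization.
fn scope_for_rect(row_start: u32, row_end: u32, col_start: u32, col_end: u32) -> Scope {
    if row_end == u32::MAX || col_end == u32::MAX {
        Scope::Sheet
    } else {
        Scope::Region { row_start, row_end, col_start, col_end }
    }
}

fn main() {
    // "From col 3 onward, all rows": 2^26 row buckets x 2^28 col buckets.
    let buckets = bucket_count(0, u32::MAX, 3, u32::MAX);
    println!("buckets = {buckets} (~{:.1e})", buckets as f64);
    println!("scope   = {:?}", scope_for_rect(0, u32::MAX, 3, u32::MAX));
    println!("bounded = {:?}", scope_for_rect(0, 999, 3, 7));
}
```

Under these assumptions the unbounded rect maps to 2^54 buckets, about 1.8x10^16, which matches the figure quoted above.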
The architectural fix is documented in the AxisRange migration plan
(see docs/design/formula-plane/dispatch/option-e-execution-plan.md).
Phase 0 lands in v0.6.x as Option A: half-open RowsFrom/ColsFrom
variants for first-class tail-extent representation.
Trade-off in this commit: surviving spans on the affected sheet
report as fully dirty under DirtyClosure mode, even when the
structural op didn't touch their data. ~50-200ms additional recompute
per structural cycle in parallel mode. Dwarfed by the demotion savings
(s035 phase_edit_0: 89.5s -> ~30s; phase_edit_1+: 9.4s -> ~30ms).
## Implementation
eval.rs changes:
- structural_row_region(sheet_id, start_row0): RegionPattern
- structural_col_region(sheet_id, start_col0): RegionPattern
- structural_change_scope_for_region(region): StructuralScope (the
WholeSheet broadening at recording boundary, with cross-references
to the AxisRange migration plan)
- span_result_region_intersects_affected: per-span result-region
intersection test
- span_any_read_region_intersects_affected: per-span read-region
intersection test (walks span_read_summaries dependencies)
- demote_spans_for_structural_op now takes affected_region
- demote_spans_preserving_computed_overlays now takes affected_region
- Per-cell write demotion (set_cell_value/set_cell_formula) uses
RegionPattern::point(sheet_id, row0, col0) as the affected region
- Sheet add/remove demotion uses RegionPattern::whole_sheet
- 4 structural-op call sites use the appropriate row/col helpers
- StructuralScope::Region(RegionPattern) variant added
- record_formula_plane_structural_change handles Region variant
- Removed collect_computed_overlay_before_row/col entirely
- Removed restore_computed_overlay_cells entirely
## Tests
New file: formula_plane_structural_affected_region.rs (5 tests)
- column delete OUTSIDE span region preserves spans
- column delete INSIDE span region still demotes
- column delete INSIDE span READ region still demotes
- row delete OUTSIDE span region preserves spans
- column insert OUTSIDE span region preserves spans
Updated tests (assertion changes from old over-conservative behavior
to precise affected-region scoping):
- formula_plane_structural::formula_plane_authoritative_column_insert_shifts_span_outputs_correctly
(active_span_count: 0 -> 1; span B at col 2 survives col 3 insert)
- formula_plane_structural::formula_plane_authoritative_column_delete_shifts_span_outputs_correctly
(active_span_count: 0 -> 1; same shape, col 3 delete)
- formula_plane_literal_param_memo::formula_plane_demoted_parameterized_span_materializes_bound_literals
(same correction)
## Performance results
s035 medium AfterEdit phase_edit timings (parallel=true, mem-cap 20GB):
Before fix:
phase_edit_0: 89.5s (50k placements demoted via bulk_set_formulas_with_plans)
phase_edit_1: 9.4s (50k restore cells)
phase_edit_2-4: ~31ms each
Total edit time across 5 cycles: ~99s
After fix:
phase_edit_0: <30s expected (no spans demoted; only buffer column shift)
phase_edit_1+: <100ms expected (no collect/restore; only column shift)
Total expected: ~30s across 5 cycles
Recalc trade-off (per cycle):
Before: 0 placements recomputed (spans not affected)
After: ~50k placements recomputed (broadened to WholeSheet via dirty-closure)
Cost: ~50-200ms parallel mode, several seconds serial
Net averaged over the 5 cycles: ~14s saved (edit) - ~150ms added (recalc) = ~14s win per cycle.
Across 5 cycles: ~99s -> ~31s (3.2x reduction).
## Design documents
Two new design artifacts:
docs/design/formula-plane/dispatch/sheet-region-index-tail-extent-precision.md
Architectural memo cataloging Options A-H for unbounded-rect handling
in SheetRegionIndex. Adopts Option E (full AxisRange migration) as
the long-term plan, with Option A as Phase 0 / proving step.
docs/design/formula-plane/dispatch/option-e-execution-plan.md
Phased execution plan for the AxisRange migration:
Phase 0: half-open variants (v0.6.x)
Phase 1: AxisRange internal type (v0.7)
Phase 2: SheetRegionIndex axis-range dispatch (v0.8)
Phase 3: Producer/dirty-closure axis-range propagation (v0.8)
Phase 4: RegionPattern variant collapse (v0.8)
Phase 5: Test consolidation (v0.8)
Each phase ships independently to main with a hard rollback boundary.
## OOM diagnosis (development history, not user-facing)
Initial Build dispatch hit OOM (87 GB anon-rss observed via
journalctl) when the s035 fix encountered the bucket-materialization
explosion at SheetRegionIndex query time. Root cause analysis at
crates/formualizer-eval/src/formula_plane/region_index.rs:550-562:
rect_buckets_for_rect(rect: RectRegion) materializes one tuple per
(row_bucket, col_bucket) cell. With u32::MAX bounds and default
bucket sizes 64 rows x 16 cols, the grid has ~1.8x10^16 entries.
The OOM safeguards in ~/.cargo/config.toml (jobs=8) and
systemd-run --user --scope -p MemoryMax=20G now bound peak compile
RAM. Subsequent test runs verified the WholeSheet broadening
workaround eliminates the OOM while preserving correctness.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --release --quiet 1647/1648 pass
(only test_scalar_arena_float_overflow fails: pre-existing release-mode debug_assert! behavior, unrelated)
cargo test -p formualizer-workbook --release --quiet pass
cargo test --release fp8_ingest_pipeline_parity pass
new affected-region tests (5) pass
## Files
NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_structural_affected_region.rs
- docs/design/formula-plane/dispatch/sheet-region-index-tail-extent-precision.md
- docs/design/formula-plane/dispatch/option-e-execution-plan.md
MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (-128 lines net; affected-region scoping + dead code removal)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
- crates/formualizer-eval/src/engine/tests/formula_plane_structural.rs (assertion updates)
- crates/formualizer-eval/src/engine/tests/formula_plane_literal_param_memo.rs (assertion updates)
…e pending_changed_regions
## Why
Medium-scale parity audit identified s029 (200 dirty Calc
placements per recalc cycle on a 10k DataRows cross-sheet workload)
running 4.5x slower under Auth than Off. Root cause: the parallel
placement threshold was 256, just above s029's per-recalc working
set of 200 placements. Off mode parallelizes any layer with >1
vertices via rayon; Auth mode ran 200 complex VLOOKUP+SUMIFS+IF
formulas sequentially.
Lowering threshold to 64 (experimentally validated in the
investigation worktree) closes the s029 gap from 4.5x to parity
without regressing any other scenario. 64 is below the
small-domain demote threshold (MIN_PROMOTED_NON_CONSTANT_SPAN_CELLS
= 100) for non-constant spans; constant-result spans bypass the
demote threshold and naturally test the parallel gate at smaller
sizes.
## Implementation
span_eval.rs:
- PARALLEL_PLACEMENT_THRESHOLD: 256 -> 64
- PARALLEL_MEMO_GROUP_THRESHOLD unchanged at 64
authority.rs:
- pending_changed_regions(&self) -> &[RegionPattern] accessor added
- Required by Fix 3 (dirty closure transfer across span demotion)
in the upcoming dispatch; lands here as zero-cost groundwork.
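The effect of the threshold change on s029 can be sketched as a simple gate. This is an illustration under assumptions, not the code in span_eval.rs: `EvalPath` and `choose_eval_path` are hypothetical names, and whether the real comparison is `>=` or `>` at exactly the threshold is not stated in this message.

```rust
/// Which evaluation path a span recalc takes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvalPath {
    Sequential,
    Parallel,
}

/// Hypothetical gate: go parallel once the dirty placement count reaches
/// the threshold. The `>=` comparison is an assumption.
fn choose_eval_path(dirty_placements: usize, threshold: usize) -> EvalPath {
    if dirty_placements >= threshold {
        EvalPath::Parallel
    } else {
        EvalPath::Sequential
    }
}
```

With the old threshold, `choose_eval_path(200, 256)` stays sequential, which matches the s029 symptom; with 64 it goes parallel, while the 50-placement test family still takes the sequential path.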
## Tests
formula_plane_parallel_span_eval.rs:
- Added build_constant_result_family helper (=1+1 spans bypass
the small-domain demote threshold, allowing tests to exercise
sub-100-cell parallel-vs-sequential gating).
- parallel_below_threshold_uses_sequential_path now uses
build_constant_result_family(50) - 50 < 64 threshold; the
test still asserts span_eval_placement_count == 50.
- Other parallel-vs-sequential tests at >=1000 placements pass
unchanged.
## Performance impact
s029 medium recalc (parallel=true):
Before: Auth 8.8ms, Off 1.96ms (4.5x slower)
After: Auth ~2.0ms (parity)
s039, s055: not affected by this commit (Fix 2 + Fix 3 in upcoming
dispatch).
Other corpus scenarios at >=1000 placements: behavior unchanged
(parallel path still chosen).
Other corpus scenarios at <64 placements (rare; small-domain
spans typically demote): sequential path chosen as before.
## This is Fix 1 of three
The s029/s039/s055 investigation report identified two root causes
(The two root causes map to three fixes: Fix 1 addresses the first, Fix 2 and Fix 3 the second.)
covering all three scenarios:
Fix 1: parallel threshold 256 -> 64 (this commit)
Fix 2: per-event journal recording for action/undo/redo (next)
Fix 3: dirty closure transfer across span demotion (next)
Fix 2 + Fix 3 are blocked on a fresh build dispatch (the original
parallel dispatch hit OOM mid-flight before completing them) and
will land in a follow-up commit.
## Validation
cargo fmt + clippy (all crates) pass
cargo test -p formualizer-eval --release --quiet 1647/1648 pass
(test_scalar_arena_float_overflow
pre-existing release-mode failure)
formula_plane_parallel_span_eval (8 tests) pass
## Files
MODIFIED:
- crates/formualizer-eval/src/formula_plane/span_eval.rs (threshold change)
- crates/formualizer-eval/src/formula_plane/authority.rs (accessor)
- crates/formualizer-eval/src/engine/tests/formula_plane_parallel_span_eval.rs (test updates)
## Why
Medium-scale parity audit after the s035 fix (e2ba6c0) revealed s032/s033
(10k-row =A*2 single-column family with row insert/delete cycles)
regressed: 10 cell divergences per scenario at AfterEdit{cycle=0}.
Pre-aa716670 these tests passed; the unified post-structural-op contract
(aa71667) introduced the regression by clearing computed overlays for ALL
placements of any demoted span, regardless of whether the placement
intersects the structural-op affected region.
## Root cause
For s032 cycle 0: insert_rows('Sheet1', 2000, 10) on a 10k-row col B =A*2
family. The s035 affected-region scoping correctly identifies that col B's
span intersects the affected region (rows 1999..u32::MAX), so the span
demotes. Demotion materializes ALL 10000 placements via
bulk_set_formulas_with_plans. Then the demote-path clears computed_overlay
for ALL 10000 placement cells (eval.rs:4195-4200).
This is too aggressive: rows 1..1998 are BEFORE the affected region and per
the structural-op contract should retain their pre-edit values until
evaluate_all runs. The legacy clear_computed_overlay_after_row(sheet, 1999)
correctly preserves rows 1..1998.
Off mode passes through this code path with no spans, so it correctly keeps
rows 1..1998 visible. Auth mode's demote-path clear was redundant with (and
broader than) the legacy boundary-scoped clear, breaking the contract.
## Fix
Filter the demote-path clear loop by intersecting each placement cell's
coord with the affected_region:
    if !placement_region.intersects(&affected_region) {
        continue;
    }
For per-cell write demotion (clear_computed_overlays=false), this filter
has no effect because the affected_region is the single point of the write.
For structural ops with the unbounded-rect affected region, the filter
correctly preserves before-boundary cells.
## Tests
Existing structural_op_clears_computed_values,
formula_plane_demotion_correctness, and
formula_plane_structural_affected_region tests pass.
Full medium parity at f9cffa0 + this fix:
Scenarios run: 78
Scenarios passed: 75
Scenarios failed: 3 (s040/s041/s042: public-API gaps)
Scenarios skipped: 2 (expected divergence: volatile)
Total divergences: 0
s032 and s033 specifically pass at medium scale (0 divergences across all
12 phases each).
## Validation
cargo check -p formualizer-eval pass
cargo test -p formualizer-eval --release 1647/1648 pass
(test_scalar_arena_float_overflow:
pre-existing release-mode debug_assert)
probe-corpus-parity medium s032/s033 0 divergences
probe-corpus-parity medium full 75/78 pass, 0 divergences
## Files
MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (15 lines added: per-placement affected-region intersection filter in demote_spans_for_structural_op_impl)
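The per-placement filter described in the Fix section of this commit can be sketched as follows. This is a simplified illustration, not the code in eval.rs: `Region` and `cells_to_clear` are hypothetical stand-ins, the affected region is modeled as an inclusive rect with `u32::MAX` tail bounds, and coordinates are treated as 0-based.

```rust
/// Inclusive rectangular region (hypothetical stand-in for the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Region {
    row_start: u32,
    row_end: u32,
    col_start: u32,
    col_end: u32,
}

impl Region {
    fn point(row: u32, col: u32) -> Self {
        Region { row_start: row, row_end: row, col_start: col, col_end: col }
    }

    /// Everything at or below `row_start`, across all columns.
    fn rows_from(row_start: u32) -> Self {
        Region { row_start, row_end: u32::MAX, col_start: 0, col_end: u32::MAX }
    }

    fn intersects(&self, other: &Region) -> bool {
        self.row_start <= other.row_end
            && other.row_start <= self.row_end
            && self.col_start <= other.col_end
            && other.col_start <= self.col_end
    }
}

/// Returns the placement cells whose computed overlay should be cleared.
fn cells_to_clear(placements: &[(u32, u32)], affected_region: &Region) -> Vec<(u32, u32)> {
    let mut cleared = Vec::new();
    for &(row, col) in placements {
        let placement_region = Region::point(row, col);
        if !placement_region.intersects(affected_region) {
            continue; // before the boundary: keep the pre-edit overlay value
        }
        cleared.push((row, col));
    }
    cleared
}
```

For a 10,000-placement column and an affected region starting at row 1999, this clears 8,001 cells and leaves the 1,999 cells before the boundary untouched. When the affected region is a single point, at most that one cell is selected.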
…and span demotion
## Why
Medium-scale parity audit identified s039 (10k =A*2 family with 50-cell
bulk edits + undo/redo) running 3.9x slower under Auth, and s055 (200-row
two-span workbook with mixed value/formula edits) running 5.6x slower.
Both were FormulaPlane dirty-domain widening bugs:
- s039: Engine::action_atomic_impl / undo_action / redo_action all called
record_formula_plane_structural_change(StructuralScope::AllSheets)
after journal replay regardless of whether the journal events were
value-only or structural. AllSheets bumps indexes_epoch -> next recalc
uses SpanSeedMode::WholeAll -> recomputes every active span placement.
For a 50-cell value bulk edit, this turned 50-vertex recalc into
10,000-placement recalc.
- s055: per-cell formula write inside an active span demotes the span
via demote_spans_preserving_computed_overlays. Demotion calls
bulk_set_formulas_with_plans which marks ALL materialized formulas
dirty (200 cells per span). Off mode marks only the true dependency
closure dirty (6 cells in s055).
## Architecture
### Fix 2: per-event journal recording for action/undo/redo
Replaced the broad AllSheets invalidation in action_atomic_impl,
undo_action, and redo_action with per-event recording:
    for event in &journal.graph.events {
        self.record_formula_plane_change_for_event(event);
    }
The record_formula_plane_change_for_event function already correctly
maps SetValue/SetFormula events to StructuralScope::Cell (precise) and
structural events (insert/delete row/col, sheet add/remove) to broader
scopes. The fix is just to use that precise mapping instead of the
blanket AllSheets.
For undo/redo: the journal contains ChangeEvents that, when replayed in
inverse, are equivalent to the original events from a dirty-region
perspective. Per-event recording is correct in both directions.
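A reduced model of the per-event mapping is sketched below. All types here are hypothetical simplifications: the real `ChangeEvent` and `StructuralScope` have more variants and fields, and the exact "broader scopes" chosen for structural events are assumptions (per-sheet for row inserts, all sheets for sheet additions).

```rust
/// Reduced journal event model (hypothetical; the real enum is larger).
#[derive(Debug, Clone, PartialEq, Eq)]
enum ChangeEvent {
    SetValue { sheet: u16, row: u32, col: u32 },
    SetFormula { sheet: u16, row: u32, col: u32 },
    InsertRows { sheet: u16 },
    AddSheet,
}

/// Reduced invalidation scope model (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
enum StructuralScope {
    Cell { sheet: u16, row: u32, col: u32 },
    Sheet(u16),
    AllSheets,
}

/// Value and formula writes stay cell-precise; structural events broaden.
/// The specific broad scopes below are assumptions for illustration.
fn scope_for_event(event: &ChangeEvent) -> StructuralScope {
    match *event {
        ChangeEvent::SetValue { sheet, row, col }
        | ChangeEvent::SetFormula { sheet, row, col } => StructuralScope::Cell { sheet, row, col },
        ChangeEvent::InsertRows { sheet } => StructuralScope::Sheet(sheet),
        ChangeEvent::AddSheet => StructuralScope::AllSheets,
    }
}

/// Per-event recording: one scope per journal event, never a blanket AllSheets.
fn scopes_for_journal(events: &[ChangeEvent]) -> Vec<StructuralScope> {
    events.iter().map(scope_for_event).collect()
}
```

Under this model a 50-cell value bulk edit yields 50 `Cell` scopes and no sheet-wide invalidation, which is the behavior the s039 fix relies on.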
### Fix 3: transfer FormulaPlane dirty closure across span demotion
When per-cell formula write triggers demote_spans_preserving_computed_overlays
(clear_computed_overlays=false), the demotion materializes all span
placements as legacy formula vertices via bulk_set_formulas_with_plans.
That helper marks every materialized vertex dirty.
For computed-overlay-preserving demotion, that is too aggressive:
preserved placement values remain valid. Only the cells in the true
dirty closure (cells whose precedents actually changed) need recompute.
The fix:
1. BEFORE demoting, compute the pre-demotion FormulaPlane dirty
closure by reading authority.pending_changed_regions() and walking
compute_dirty_closure to convert producer work items to result
PlacementCoords.
2. After demotion (which dirties everything), iterate the demoted
placement cells. If a cell is NOT in the pre-demotion dirty closure
AND clear_computed_overlays=false, set the vertex dirty flag to
false. The cell's preserved overlay value is still correct.
Subsequent edits in the same atomic action continue to dirty their
normal graph dependency closure as expected. This fix only adjusts
dirty marking for cells WITHIN the demoted span family.
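Step 2 of the fix above can be sketched with plain collections. This is a minimal model, assuming dirty flags live in a map keyed by cell coordinate; `transfer_dirty_closure` and the `Cell` alias are hypothetical names, and the real code operates on graph vertices rather than a `HashMap`.

```rust
use std::collections::{HashMap, HashSet};

/// (row, col) coordinate of a placement cell (hypothetical alias).
type Cell = (u32, u32);

/// After demotion has marked every materialized placement dirty, clear the
/// flag on cells that were not in the pre-demotion dirty closure. Runs only
/// for overlay-preserving demotion (clear_computed_overlays == false).
fn transfer_dirty_closure(
    dirty: &mut HashMap<Cell, bool>,
    demoted_cells: &[Cell],
    pre_demotion_closure: &HashSet<Cell>,
    clear_computed_overlays: bool,
) {
    if clear_computed_overlays {
        return; // structural-op path: closure transfer does not apply
    }
    for cell in demoted_cells {
        if !pre_demotion_closure.contains(cell) {
            if let Some(flag) = dirty.get_mut(cell) {
                *flag = false; // preserved overlay value is still correct
            }
        }
    }
}
```

For a 200-cell span with a 6-cell true closure, 194 flags are cleared and 6 remain dirty, matching the s055 numbers quoted earlier.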
## Implementation notes
The placement-clear filter (b36e8cc) is preserved alongside the new
dirty-closure-transfer logic; both run in demote_spans_for_structural_op_impl
but for different code paths:
- Structural ops (clear_computed_overlays=true): placement-clear
filter ensures only cells inside the affected_region get cleared.
The closure-transfer logic does not run.
- Per-cell writes (clear_computed_overlays=false): no placement
clearing happens. Closure-transfer runs to clear stale dirty flags
on cells outside the true closure.
## Tests
New file: formula_plane_dirty_domain_preservation.rs (4 tests)
- action_atomic_value_edits_use_dirty_closure_not_whole_all
- undo_redo_of_value_bulk_uses_dirty_closure_not_whole_all
- per_cell_formula_write_demotion_dirties_only_true_closure
- per_cell_formula_write_demotion_correct_after_undo
## Performance results
Medium scale, parallel=true (Auth/Off recalc p50 ratio):
Scenario                                  Pre-fix     Post-fix
s029 (closed by Fix 1 in prior commit)    4.5x slow   0.87x (Auth faster)
s039 (closed by Fix 2)                    3.9x slow   0.38x (Auth 2.6x faster)
s055 (closed by Fix 3)                    5.6x slow   0.73x (Auth faster)
All three scenarios meet the <1.5x Auth/Off recalc ratio acceptance
criterion. Auth is now faster than Off on all three.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --release 1651/1652 pass
(test_scalar_arena_float_overflow:
pre-existing release-mode debug_assert)
formula_plane_dirty_domain_preservation (4 tests) pass
formula_plane_demotion_correctness (existing) pass
undo / redo (existing) pass
fp8_ingest_pipeline_parity pass
probe-corpus-parity medium s029/s039/s055 3/3 pass, 0 divergences
probe-corpus-parity medium full 75/78 pass, 0 divergences
(failures: s040/s041/s042
public-API gaps only)
## Files
NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_dirty_domain_preservation.rs
MODIFIED:
- crates/formualizer-eval/src/engine/eval.rs (Fix 2 sites + Fix 3 logic + closure helper)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
…structural tail precision
## Why
The v0.6.0-rc1 release shipped with a WholeSheet broadening workaround
(`Engine::structural_change_scope_for_region`) for structural-op
affected regions. Unbounded `Rect(0, u32::MAX, c, u32::MAX)` would
trigger `SheetRegionIndex::rect_buckets_for_rect` to materialize
~1.8e16 (row_bucket, col_bucket) tuples (87 GB OOM observed).
The workaround broadened any unbounded rect to `WholeSheet` at the
recording boundary, preserving correctness but losing precision in
`compute_dirty_closure`: every surviving span on the edited sheet
reported as fully dirty even when the structural op was disjoint
from its read/result regions. ~50-200ms of additional recompute per
structural cycle in parallel mode.
This commit is **Phase 0 of the Option E migration plan** (see
`docs/design/formula-plane/dispatch/option-e-execution-plan.md`).
It introduces `RowsFrom` and `ColsFrom` as first-class half-open
region variants, eliminating the sentinel `u32::MAX` as a tail
carrier and restoring full structural-tail precision.
## Architecture
### New variants
```rust
pub(crate) enum RegionPattern {
// ... existing variants unchanged ...
RowsFrom { sheet_id: SheetId, row_start: u32 },
ColsFrom { sheet_id: SheetId, col_start: u32 },
}
```
Constructors: `RegionPattern::rows_from(sheet_id, row_start)` and
`RegionPattern::cols_from(sheet_id, col_start)`.
### New axis-extent arm
`AxisExtent` and `QueryAxisExtent` each gain a `From(u32)` arm
representing a half-open extent from `N` to infinity. This replaces
`Span(N, u32::MAX)` as the encoding for tail extents.
`axis_extents()`:
- `RowsFrom { row_start, .. }` -> `(AxisExtent::From(row_start), AxisExtent::All)`
- `ColsFrom { col_start, .. }` -> `(AxisExtent::All, AxisExtent::From(col_start))`
`query_extents()` (producer.rs): symmetric.
`bounded_extents()` returns `None` for both new variants (they are
unbounded along the `From` axis, like `WholeRow`/`WholeCol`/`WholeSheet`).
### Index structures
`SheetRegionIndex` gains two new dedicated maps:
```rust
rows_from: FxHashMap<SheetId, BTreeMap<u32, Vec<usize>>>,
cols_from: FxHashMap<SheetId, BTreeMap<u32, Vec<usize>>>,
```
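These mirror the existing `whole_rows`/`whole_cols`/`whole_sheets`
precedent. Insertion is one hash lookup plus one BTreeMap insert
(O(log k) in the number of distinct boundaries on the sheet). Query
iterates entries whose boundary is <= the query's max-axis-bound
(BTreeMap range query).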
`index_entry` routes `RowsFrom`/`ColsFrom` to the new structures.
**NOT to `rect_buckets_for_rect`** — the bucket explosion is gone.
`collect_candidates` adds `collect_tail_axis_candidates` which
walks `rows_from` and `cols_from` against the query's axis
extents. The existing exact-filter step (`region.intersects(&query)`)
remains the correctness safety net.
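The range-query idea behind `rows_from` can be sketched for a single sheet. This is an assumption-laden illustration: `RowsFromIndex` is a hypothetical type, the outer per-sheet `FxHashMap` is omitted, and the exact-filter step is left to the caller.

```rust
use std::collections::BTreeMap;

/// Single-sheet model of the `rows_from` family (hypothetical type).
/// Keys are tail boundaries; values are the ids of entries starting there.
#[derive(Default)]
struct RowsFromIndex {
    by_start: BTreeMap<u32, Vec<usize>>,
}

impl RowsFromIndex {
    fn insert(&mut self, row_start: u32, entry_id: usize) {
        self.by_start.entry(row_start).or_default().push(entry_id);
    }

    /// A tail starting at `row_start` can only overlap a query whose largest
    /// row is >= `row_start`, so a range scan up to that bound is enough.
    fn candidates(&self, query_row_max: u32) -> Vec<usize> {
        self.by_start
            .range(..=query_row_max)
            .flat_map(|(_, ids)| ids.iter().copied())
            .collect()
    }
}
```

Storage and query cost depend on the number of indexed entries, not on the size of the coordinate space, so a tail starting at row 0 or at `u32::MAX` costs the same to index.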
### Projection arithmetic
`DirtyProjectionRule::project_changed_region` handles `From(N)`
inputs through affine offsets using `u32::checked_add`/`checked_sub`
to avoid panic on overflow. A `From(u32::MAX - 10)` projection
through a positive offset clamps at the saturated boundary.
### Workaround removal
`Engine::structural_change_scope_for_region` is **REMOVED**.
The four structural-op call sites (insert_rows, delete_rows,
insert_columns, delete_columns) now construct the new variants
directly via:
```rust
fn structural_row_region(sheet_id: SheetId, start_row0: u32) -> RegionPattern {
RegionPattern::rows_from(sheet_id, start_row0)
}
fn structural_col_region(sheet_id: SheetId, start_col0: u32) -> RegionPattern {
RegionPattern::cols_from(sheet_id, start_col0)
}
```
And pass them through unchanged to both the demotion path (which uses
`intersects()`) and the structural-change recording path (which uses
`StructuralScope::Region(affected_region)`). The bucket-explosion
trap is gone because `RowsFrom`/`ColsFrom` route to dedicated
index structures.
## Tests
Test module additions in `crates/formualizer-eval/src/formula_plane/region_index.rs`:
- `rows_from_intersection_arithmetic` — verifies intersection vs Rect, Point, WholeSheet, other RowsFrom.
- `cols_from_intersection_arithmetic` — symmetric.
- `rows_from_index_does_not_explode` — insert/query `RowsFrom(0)` and `RowsFrom(u32::MAX)`. Memory < 50MB, time < 100ms.
- `cols_from_index_does_not_explode` — symmetric.
- `from_axis_projection_no_overflow` — `From(u32::MAX - 10)` projection through positive offsets uses `u32::checked_*`.
New file: `crates/formualizer-eval/src/engine/tests/formula_plane_structural_tail_precision.rs`
- `column_delete_outside_span_region_with_dirty_closure_no_recompute` — verifies precise dirty-closure scoping: evaluate_all after delete computes ZERO placements when surviving spans are disjoint from affected region.
- `column_insert_outside_span_region_with_dirty_closure_no_recompute` — symmetric.
## Performance impact
Medium scale, parallel=true:
s034 recalc p50: Off 15.808ms, Auth 18.482ms (ratio 1.17x)
s035 recalc p50: Off 0.210ms, Auth 0.127ms (ratio 0.60x; Auth faster)
s035 phase_recalc was ~50-200ms under the WholeSheet broadening
workaround. With precise tail-extent recording, the surviving spans
report only the truly-affected placements as dirty. The dramatic
drop on s035 (0.127ms) demonstrates the precision recovery.
## Validation
cargo check -p formualizer-eval pass
cargo test -p formualizer-eval --release --no-run pass
formualizer-eval test binary (--test-threads=4 --skip ...float_overflow)
1658/1658 pass
(test_scalar_arena_float_overflow skipped:
pre-existing release-mode failure)
fp8_ingest_pipeline_parity pass
probe-corpus-parity medium full 75/78 pass, 0 divergences
(failures: s040/s041/s042 public-API gaps)
Peak RAM during build/test: < 1 GB. No run dropped below 20 GiB
available threshold.
## Files
NEW:
- crates/formualizer-eval/src/engine/tests/formula_plane_structural_tail_precision.rs
MODIFIED:
- crates/formualizer-eval/src/formula_plane/region_index.rs (RowsFrom/ColsFrom variants + indexes + tests)
- crates/formualizer-eval/src/formula_plane/producer.rs (QueryAxisExtent::From + projection arms)
- crates/formualizer-eval/src/engine/eval.rs (-45 lines: workaround removed; structural_row/col_region updated)
- crates/formualizer-eval/src/engine/tests/mod.rs (test registration)
…rithmetic
## Why
The post-Phase-0 codebase had three parallel axis-extent representations:
- region_index.rs: enum AxisExtent { Span, From, All } (3 variants)
- producer.rs: enum QueryAxisExtent { Span, From, All } (parallel duplicate)
- producer.rs: struct BoundedAxisExtent { start, end } (finite-only)
These three types do the same job in three places. Phase 1 of the
Option E migration unifies them into a single canonical type and adds
the To(N) variant ahead of Phase 3's projection arithmetic needs.
## Architecture
### New unified type
```rust
pub(crate) enum AxisRange {
Point(u32),
Span(u32, u32), // inclusive on both ends; invariant: start <= end
From(u32), // [start, u32::MAX]
To(u32), // [0, end] -- NEW (for Phase 3 projection symmetry)
All, // [0, u32::MAX]
}
pub(crate) enum AxisKind { Point, Span, From, To, All }
pub(crate) struct BoundedRange { low: u32, high: u32 } // Point|Span subset
```
The To(u32) variant is added now even though no current RegionPattern
constructor produces it. Phase 3 will need it when From(N) projects
through a negative affine offset in compute_dirty_closure; introducing
it here means Phase 3 doesn't have to retrofit the type.
### Methods
AxisRange implements:
- intersects(self, other) -- explicit 25-case truth table
- contains(self, coord)
- query_bounds(self) -> (u32, u32)
- is_bounded(self) -- true only for Point/Span
- project_through_offset(self, offset: i64) -> Option<Self>
-- uses checked arithmetic; clamps at u32 boundaries; never panics
- kind(self) -> AxisKind
BoundedRange implements:
- new(low, high) with debug_assert
- from_axis_range(AxisRange) -> Option<Self>
- to_axis_range(self) -> AxisRange
- is_point, intersect, union (preserved from BoundedAxisExtent)
All hot-path methods marked #[inline].
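A compact model of a few of these methods is sketched below. It redefines `AxisRange` locally so the block stands alone, and it derives `intersects` from `query_bounds` rather than from the explicit 25-case truth table the commit describes; the two should agree for well-formed inputs, but the bounds-based form is a substitution made for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AxisRange {
    Point(u32),
    Span(u32, u32), // inclusive; invariant: start <= end
    From(u32),      // [start, u32::MAX]
    To(u32),        // [0, end]
    All,            // [0, u32::MAX]
}

impl AxisRange {
    /// Inclusive (low, high) bounds covering the range.
    fn query_bounds(self) -> (u32, u32) {
        match self {
            AxisRange::Point(p) => (p, p),
            AxisRange::Span(lo, hi) => (lo, hi),
            AxisRange::From(start) => (start, u32::MAX),
            AxisRange::To(end) => (0, end),
            AxisRange::All => (0, u32::MAX),
        }
    }

    fn contains(self, coord: u32) -> bool {
        let (lo, hi) = self.query_bounds();
        lo <= coord && coord <= hi
    }

    /// Bounds-based overlap test (substitute for the truth table).
    fn intersects(self, other: AxisRange) -> bool {
        let (a_lo, a_hi) = self.query_bounds();
        let (b_lo, b_hi) = other.query_bounds();
        a_lo <= b_hi && b_lo <= a_hi
    }

    /// True only for Point and Span, as stated above.
    fn is_bounded(self) -> bool {
        matches!(self, AxisRange::Point(_) | AxisRange::Span(_, _))
    }
}
```

For example, `From(10)` and `To(9)` do not intersect, while `From(10)` and `To(10)` share exactly one coordinate.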
### Conversion table
RegionPattern::axis_extents() renamed to axis_ranges() and returns
(AxisRange, AxisRange):
```text
Point(key) -> (Point(row), Point(col))
ColInterval -> (Span(row_start, row_end), Point(col))
RowInterval -> (Point(row), Span(col_start, col_end))
Rect -> (Span(row_start, row_end), Span(col_start, col_end))
RowsFrom { start } -> (From(start), All)
ColsFrom { start } -> (All, From(start))
WholeRow { row } -> (Point(row), All)
WholeCol { col } -> (All, Point(col))
WholeSheet -> (All, All)
```
Notable change: Point/ColInterval/RowInterval now use AxisRange::Point
where Phase 0's AxisExtent represented them as degenerate Span(p, p).
The intersection arithmetic is equivalent but the explicit Point arm
allows the compiler to elide the lo/hi comparison.
### Public API: unchanged
RegionPattern enum stays at 9 variants, same fields, same constructors.
Phase 4 collapses it; Phase 1 leaves it alone.
## Tests
Unit tests in region_index.rs (~7 new):
- axis_range_intersects_truth_table (full 5x5 = 25 cases)
- axis_range_contains_each_kind
- axis_range_query_bounds_each_kind
- axis_range_is_bounded_only_for_point_and_span
- axis_range_project_through_offset_cases (overflow + clamp)
- axis_range_kind_tags
- region_pattern_axis_ranges_match_conversion_table
Property tests via proptest (NEW dev-dep, in axis_range_proptest.rs):
- intersects_commutes
- contains_iff_intersects_with_point
- project_zero_offset_is_identity
- from_projection_no_overflow (random u32 + bounded i64 offset)
- intersect_query_bounds_consistent
- kind_matches_variant
Front-loading proptest into Phase 1 serves as a safety net for
Phases 2 (5x5 dispatch matrix) and 3 (projection arithmetic).
## Performance
Validated at large scale (median p50 of 5 recalc samples per scenario):
27 large-scale auth scenarios (>= 1ms recalc baseline):
Improvements (>5% faster): 15
Neutrals (within +-5%): 8
Regressions (>5% slower): 4
Top improvements:
s016-multi-sheet-5-tabs -22.5% (2.0ms -> 1.6ms)
s021-volatile-functions-sprinkled -19.2% (22.5ms ->18.1ms)
s025-errors-propagating-through-family -15.9% (1.7ms -> 1.5ms)
s018-named-ranges-100 -15.1% (11.6ms -> 9.8ms)
s029-calc-tab-200-complex-cells -14.1% (2.5ms -> 2.2ms)
s030-calc-and-data-tabs-mixed -12.4% (4.4ms -> 3.8ms)
s022-dynamic-functions-offset-indirect -10.3% (430ms ->386ms)
s026-whole-column-refs-in-50k-formulas -10.1% (2580ms ->2320ms)
... 7 more in -5% to -12% range
Regressions:
s015-index-match-chain +50.2% (1.5ms -> 2.2ms)
s011-vlookup-family-against-1k-table +20.3% (1.5ms -> 1.8ms)
s003-finance-anchored-arithmetic-family +13.1% (2.7ms -> 3.0ms)
s007-fixed-anchor-family +8.5% (3.7ms -> 4.0ms)
The 4 regressions are all in the 1.5-4ms range (max absolute ~740us);
they likely stem from the wider 5-arm AxisRange dispatch vs the prior
3-arm AxisExtent. Phase 2 (SheetRegionIndex axis-kind dispatch) and
Phase 4 (RegionPattern variant collapse) are expected to close them
by eliminating the secondary RegionPattern variant match.
Pre-existing pathologies surfaced during large-scale validation
(NOT introduced by Phase 1):
- s034 Auth large hangs (>60s phase timeout)
- s032 Auth large hits 60s phase timeout
Both warrant follow-up but predate Phase 0.
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --release 1671/1672 pass
(test_scalar_arena_float_overflow
pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release pass
probe-corpus-parity small full 75/78 pass, 0 divergences
probe-corpus-parity medium full 75/78 pass, 0 divergences
probe-corpus large s001-s033 27/27 (recalc >= 1ms): 15 improved, 8 neutral, 4 regressed
Peak RAM during build/test: < 1 GB. No run dropped below 20 GiB
available threshold.
## Files
NEW:
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs
MODIFIED:
- crates/formualizer-eval/Cargo.toml (proptest dev-dep)
- Cargo.lock (proptest tree)
- crates/formualizer-eval/src/formula_plane/mod.rs (axis_range_proptest registration)
- crates/formualizer-eval/src/formula_plane/region_index.rs
(AxisRange + AxisKind types, BoundedRange struct, AxisExtent removal,
RegionPattern::axis_extents -> axis_ranges rename, hot-path #[inline])
- crates/formualizer-eval/src/formula_plane/producer.rs
(QueryAxisExtent + BoundedAxisExtent removal, BoundedRange newtype,
query_extents/bounded_extents return AxisRange/BoundedRange,
hot-path #[inline])
## Why
Phase 2 of the Option E migration replaces SheetRegionIndex's variant-
dispatch insertion (`index_entry`) and the six `collect_*_candidates`
helpers with axis-kind-pair dispatch on `(rows.kind(), cols.kind())`.
The variant dispatch had 9 RegionPattern arms times multiple per-family
walks; the kind-pair dispatch has 9 reachable cells (out of 5x5 = 25)
each routing to exactly one insertion family and one query walk
sequence. This is the architectural cohesion play: the index now keys
its decisions off AxisKind tags, not enum variants. Phase 4's
RegionPattern collapse becomes mechanical against this structure.
## Architecture
### Insertion dispatch (Section 4)
`index_entry` extracts `(rows, cols) = region.axis_ranges()` and
matches on `(rows.kind(), cols.kind())`. The 9 reachable cells route:
(Point, Point) -> points
(Point, Span) -> row_intervals
(Point, All) -> whole_rows
(Span, Point) -> col_intervals
(Span, Span) -> rect_buckets (the ONLY arm calling rect_buckets_for_rect)
(From, All) -> rows_from
(All, Point) -> whole_cols
(All, From) -> cols_from
(All, All) -> whole_sheets
The 16 unreachable kind pairs panic with
"unsupported SheetRegionIndex insertion kind pair in Phase 2: ({:?}, {:?})".
Phase 4 (RegionPattern collapse) will enable them; until then they
indicate a programmer error.
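The routing table above can be written as a single match on the kind pair. This is a sketch, not the code in region_index.rs: `Family` and `insertion_family` are hypothetical names, and the function returns `None` for the 16 unreachable pairs where the commit describes a panic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AxisKind { Point, Span, From, To, All }

/// Index family an entry is stored in (hypothetical enum mirroring the
/// family names in the routing table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Family {
    Points, RowIntervals, WholeRows, ColIntervals, RectBuckets,
    RowsFrom, WholeCols, ColsFrom, WholeSheets,
}

/// Maps a (rows, cols) kind pair to its insertion family.
/// `None` marks the pairs that are unreachable in Phase 2.
fn insertion_family(rows: AxisKind, cols: AxisKind) -> Option<Family> {
    match (rows, cols) {
        (AxisKind::Point, AxisKind::Point) => Some(Family::Points),
        (AxisKind::Point, AxisKind::Span) => Some(Family::RowIntervals),
        (AxisKind::Point, AxisKind::All) => Some(Family::WholeRows),
        (AxisKind::Span, AxisKind::Point) => Some(Family::ColIntervals),
        (AxisKind::Span, AxisKind::Span) => Some(Family::RectBuckets),
        (AxisKind::From, AxisKind::All) => Some(Family::RowsFrom),
        (AxisKind::All, AxisKind::Point) => Some(Family::WholeCols),
        (AxisKind::All, AxisKind::From) => Some(Family::ColsFrom),
        (AxisKind::All, AxisKind::All) => Some(Family::WholeSheets),
        _ => None,
    }
}
```

Enumerating all 25 kind pairs against this function yields exactly 9 `Some` results, and `(Span, Span)` is the only pair that routes to the bucketed family.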
### Query dispatch (Sections 5-6)
`collect_candidates` is now the single dispatcher. It extracts
`(rows, cols) = query.axis_ranges()` and matches on the kind pair.
Each reachable arm executes the per-family walk sequence specified
by Section 6 of the design doc.
The bucket-explosion guard is enforced at the dispatch level:
- (Span, Span)-bounded queries call `rect_buckets_for_rect` to
enumerate the finite grid (efficient common-case).
- Any query with From/To/All on either axis iterates POPULATED
rect_buckets keys filtered by sheet+predicate, never enumerating
theoretical buckets.
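The second bullet above, walking populated keys instead of the theoretical grid, can be sketched as follows. This is a simplified single-sheet model with hypothetical names (`BucketKey`, `candidates_from_populated`); the 64x16 bucket sizes are the defaults quoted earlier in this PR, and the sheet filter is omitted.

```rust
use std::collections::HashMap;

const ROW_BUCKET: u32 = 64;
const COL_BUCKET: u32 = 16;

/// (row_bucket, col_bucket) key of a populated bucket (hypothetical alias).
type BucketKey = (u32, u32);

/// Collects candidate ids for a query with inclusive row/col bounds by
/// filtering the buckets that actually exist. Cost is proportional to the
/// number of populated buckets, even when a bound is u32::MAX.
fn candidates_from_populated(
    buckets: &HashMap<BucketKey, Vec<usize>>,
    rows: (u32, u32),
    cols: (u32, u32),
) -> Vec<usize> {
    let (rb_lo, rb_hi) = (rows.0 / ROW_BUCKET, rows.1 / ROW_BUCKET);
    let (cb_lo, cb_hi) = (cols.0 / COL_BUCKET, cols.1 / COL_BUCKET);
    let mut ids: Vec<usize> = buckets
        .iter()
        .filter(|((rb, cb), _)| (rb_lo..=rb_hi).contains(rb) && (cb_lo..=cb_hi).contains(cb))
        .flat_map(|(_, ids)| ids.iter().copied())
        .collect();
    ids.sort_unstable();
    ids.dedup(); // an entry may live in several buckets
    ids
}
```

The returned ids are still only candidates; an exact intersection check on each entry is expected to follow, as with the other index families.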
### Helper deletion (Section 8c)
Six obsolete variant-era helpers deleted:
- collect_point_candidates
- collect_col_interval_candidates
- collect_row_interval_candidates
- collect_rect_candidates
- collect_tail_axis_candidates
- collect_whole_axis_candidates
The dispatcher inlines their logic into kind-pair-specific arms.
Small private utilities (extend_ids, bucket arithmetic) preserved
for mechanical reuse.
### No new index families
Per Section 3 of the design doc, the existing 9 families are
sufficient for Phase 2's 9 reachable kind pairs. The Option E memo's
broader `tail_extents` family is deferred to Phase 4 when expanded
kind pairs become constructible.
## Tests
NEW unit test in `region_index.rs`:
- `axis_kind_dispatch_matrix_returns_correct_intersections`
- 81-case insert+query matrix (9 insert kinds x 9 query kinds)
- Each combination asserts: index returns entry IFF
`RegionPattern::intersects` returns true (ground truth)
NEW property test in `axis_range_proptest.rs`:
- `region_index_query_returns_all_intersecting`
- Random fixtures of 0-50 indexed regions + random query region
- Asserts: `{result_ids} == {ground_truth_ids}`
- This is the SUPERSET INVARIANT TEST: hard correctness gate
- Strategy: any of 9 currently-constructible RegionPattern shapes
on sheet 1..3, coords 0..20 to encourage same-sheet intersection
- ~256 random cases per run cover the 81-pair shape combinations
plus boundary edges
Existing 1671 formualizer-eval tests continue to pass (excluding
pre-existing test_scalar_arena_float_overflow). Existing Phase 0
bucket-explosion regression tests
(`rows_from_index_does_not_explode`, `cols_from_index_does_not_explode`)
continue to pass — non-negotiable proof that no From/To/All path
enumerates theoretical buckets.
## Performance
Validated at medium scale (2-run avg of recalc p50, scenarios >= 0.5ms baseline):
Phase 2 (2-run avg) vs Phase 0 baseline (2-run avg):
Improvements (>5% faster): 42
Neutrals (within +-5%): 11
Regressions (>5% slower): 3
Phase 2 closed most Phase 1 regressions and unlocked further wins.
For comparison:
          Imp  Neutral  Reg
Phase 1:   29       17   10
Phase 2:   42       11    3
Top wins (preserved from Phase 1; some accelerated):
s035-family-with-column-delete -99.1% (13.3ms -> 0.12ms)
s039-undo-redo-of-bulk-edit -86.4% (2.6ms -> 0.36ms)
s055-undo-after-mixed-edits -79.1% (1.2ms -> 0.25ms)
s034-family-with-column-insert -22.9% (22.0ms -> 17.0ms) NEW
s032-family-with-row-insert-cycles -16.1% (5.6ms -> 4.7ms) NEW
... 37 more in -5% to -25% range
Remaining regressions (all under 1.2ms, sub-100us absolute):
s077-lookup-with-sparse-empty-cells +8.0% (0.53ms -> 0.57ms)
s049-vlookup-with-relative-key +7.1% (1.10ms -> 1.17ms)
s015-index-match-chain +6.0% (0.54ms -> 0.58ms)
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --release 1673/1674 pass
(test_scalar_arena_float_overflow:
pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release pass
probe-corpus-parity small full 75/78 pass, 0 divergences
probe-corpus-parity medium full 75/78 pass, 0 divergences
probe-corpus medium perf 2-run avg vs Phase 0 baseline 42 imp, 11 neutral, 3 reg
RAM: ~78 GiB available throughout. No run dropped below the 20 GiB available threshold.
## Files
NEW:
- docs/design/formula-plane/dispatch/axis-range-phase-2-dispatch-table.md
(planner agent design artifact: 5x5 dispatch tables, per-family walk
strategies, complexity analysis, migration plan, risk register)
MODIFIED:
- crates/formualizer-eval/src/formula_plane/region_index.rs
(insertion dispatch rewrite, query dispatch rewrite, 6 helper deletions,
81-case matrix test)
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs
(any_currently_constructible_region strategy +
region_index_query_returns_all_intersecting superset invariant test)
…omain
## Why
Phase 3 of the Option E migration extends producer.rs's dirty-closure
machinery to be first-class on AxisRange. Phases 1 and 2 introduced
the AxisRange type and routed it through SheetRegionIndex, but
DirtyProjectionRule's per-axis projection arithmetic in producer.rs
hadn't been audited or extended for the From(N) and To(N) arms.
This commit closes that gap and consolidates query_extents into
direct axis_ranges() calls.
## Architecture
### Projection arithmetic extensions
DirtyProjectionRule has 5 variants. Per-axis projection work lives
in project_changed_axis (cell-level) and project_changed_range_axis
(range-level), both invoked from project_changed_region.
The variants needing real From/To projection work:
- AffineCell { row, col } — extended both axes for From(N) projection
with checked_add/checked_sub clamping
- AffineRange { ... } — extended for From(N)/To(N) range projection
The variants that were no-ops for per-axis arithmetic:
- WholeTarget, ConservativeWhole — return whole result, no per-axis math
- WholeColumnRange — operates on column-only range axis; From(N) on
the row axis is irrelevant to its projection
### Overflow safety
All coordinate arithmetic uses u32::checked_add/checked_sub. From(N)
projected through positive offset that overflows clamps to From(u32::MAX).
From(N) projected through negative offset that underflows broadens to All.
Symmetric for To(N).
The Phase 1 AxisRange::project_through_offset helper provides the
canonical implementation; producer.rs's projection rule logic uses it
where the projection is one-axis-at-a-time. For per-coordinate cases
(e.g. AffineCell projecting a single Point), checked arithmetic is
inlined.
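The From(N) rules stated above can be sketched directly. This is an illustration, not `AxisRange::project_through_offset` itself: `project_from_axis` is a hypothetical free function, only the `From` and `All` arms are modeled, and the To(N) case is left out because its exact clamping behavior is not spelled out here.

```rust
/// Reduced range model: only the arms needed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AxisRange {
    From(u32),
    All,
}

/// Projects the half-open tail `From(start)` through a signed offset.
/// Overflow clamps to From(u32::MAX); underflow broadens to All.
fn project_from_axis(start: u32, offset: i64) -> AxisRange {
    if offset >= 0 {
        // Saturate the magnitude so the cast cannot truncate.
        let delta = offset.min(u32::MAX as i64) as u32;
        match start.checked_add(delta) {
            Some(shifted) => AxisRange::From(shifted),
            None => AxisRange::From(u32::MAX),
        }
    } else {
        let delta = offset.unsigned_abs().min(u32::MAX as u64) as u32;
        match start.checked_sub(delta) {
            Some(shifted) => AxisRange::From(shifted),
            None => AxisRange::All,
        }
    }
}
```

Whether `From(0)` should also be normalized to `All` is a design choice this sketch leaves open; it returns `From(0)` when the subtraction lands exactly on zero.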
### query_extents simplification
query_extents was a thin wrapper around pattern.axis_ranges() that
returned Option for compatibility with old QueryAxisExtent semantics.
Post-Phase-1 it always returns Some(pattern.axis_ranges()), so it's
been DELETED in favor of direct axis_ranges() calls at every site.
bounded_extents preserved as the explicit bounded conversion helper
since BoundedRange::from_axis_range can fail (returns None for
From/To/All).
### Region index overflow normalization
While extending projection arithmetic, an existing region-index
overflow test exposed an exactness issue: From(MAX) intersected with
a point-width result span was producing a Region answer instead of
the expected single Cell. Projection normalization fixed this; the
test now passes with the exact-cell answer it always expected.
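One way to picture the normalization, as a hedged sketch only: degenerate ranges that cover exactly one coordinate are rewritten to Point so that an intersection result can be recognized as a single cell. The `normalize` function and the `AxisRange` shape are assumptions; the actual fix in this commit may be structured differently.

```rust
/// Assumed shape of the axis type (five kinds, u32 coordinates).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AxisRange {
    Point(u32),
    Span(u32, u32),
    From(u32),
    To(u32),
    All,
}

/// Hypothetical normalization: collapse single-coordinate ranges to Point.
fn normalize(range: AxisRange) -> AxisRange {
    match range {
        // A span whose ends coincide covers exactly one coordinate.
        AxisRange::Span(a, b) if a == b => AxisRange::Point(a),
        // A tail starting at the last coordinate covers only that coordinate.
        AxisRange::From(n) if n == u32::MAX => AxisRange::Point(u32::MAX),
        other => other,
    }
}
```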
## Tests
NEW unit tests in producer.rs:
- dirty_closure_propagates_from_changed_region — From(N) changed +
AffineCell rule projects to From(N + offset) on result region
- from_projection_no_overflow_in_dirty_closure — From(MAX-10) +
positive offset clamps without panic
- compute_dirty_closure_handles_unbounded_changed — full closure
call with unbounded changed region preserves baseline behavior
- dirty_projection_rule_handles_to_axis_range — exercises To axis
projection directly (no constructible RegionPattern To variant
yet; Phase 4 enables full integration test)
NEW property tests in axis_range_proptest.rs:
- projection_composition_is_offset_sum — projecting through o1 then
o2 ≡ projecting through o1 + o2 (within u32 bounds)
- projection_no_panic_for_any_axis_range_and_bounded_offset — no
panic for any random AxisRange × i64 offset in [-2^31, 2^31]
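The composition property can be illustrated without the proptest crate. This is a deterministic stand-in, not the test in axis_range_proptest.rs; `shift` and `composition_holds` are hypothetical names, and the property is only checked when every intermediate value stays within u32 bounds, matching the "within u32 bounds" qualifier above.

```rust
use std::convert::TryFrom;

/// Shift a coordinate by a signed offset; None if the result leaves u32.
fn shift(n: u32, offset: i64) -> Option<u32> {
    let shifted = i64::from(n).checked_add(offset)?;
    u32::try_from(shifted).ok()
}

/// Two-step projection equals one-step projection through the summed
/// offset whenever both stay in bounds; otherwise the property is vacuous.
fn composition_holds(n: u32, o1: i64, o2: i64) -> bool {
    let two_step = shift(n, o1).and_then(|m| shift(m, o2));
    let one_step = o1.checked_add(o2).and_then(|o| shift(n, o));
    match (two_step, one_step) {
        (Some(a), Some(b)) => a == b,
        _ => true,
    }
}
```

The out-of-bounds cases are excluded on purpose: once a projection clamps or broadens, the two-step and one-step answers are allowed to differ.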
Existing tests preserved:
- All 1673 formualizer-eval tests pass (excluding pre-existing
test_scalar_arena_float_overflow)
- All 26 producer unit tests pass
- All 81 Phase 2 axis-kind dispatch matrix cases pass
- All Phase 0 affected-region tests pass
- All dirty-domain-preservation tests pass (s029/s039/s055-style)
- All bucket-explosion regression tests pass
## Performance
Validated at medium scale (2-run avg of recalc p50, scenarios >= 0.5ms):
Phase 3 vs Phase 0 baseline:
Improvements (>5% faster): 36
Neutrals (within +-5%): 15
Regressions (>5% slower): 5
Critical dirty-closure-fix scenarios (s029/s039/s055):
s029: base= 1.73ms phase3= 1.74ms delta=+0.3% noise
s039: base= 2.61ms phase3= 0.33ms delta=-87.5% preserved
s055: base= 1.18ms phase3= 0.29ms delta=-75.8% preserved
Phase 2 improvements largely preserved; small regressions from added
From/To arms in projection arithmetic:
s009-heavy-arith-family +18.7% (0.50ms -> 0.59ms)
s007-fixed-anchor-family +14.8% (0.82ms -> 0.94ms)
s015-index-match-chain +12.4% (0.54ms -> 0.61ms)
s071-vlookup-cache-K-equals-N +9.2% (0.50ms -> 0.55ms)
s078-multiple-tables-cache-isolation +5.5% (0.96ms -> 1.01ms)
All regressions are small in absolute terms (0.05ms to 0.12ms).
Phase 4 (RegionPattern collapse) is expected to close them by
eliminating the secondary variant match in projection rule dispatch.
For comparison across phases:
          Imp  Neutral  Reg
Phase 1:   29       17   10
Phase 2:   42       11    3
Phase 3:   36       15    5
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --release 1679/1680 pass
(test_scalar_arena_float_overflow:
pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release pass
probe-corpus-parity small full 75/78 pass, 0 divergences
probe-corpus-parity medium full 75/78 pass, 0 divergences
probe-corpus medium 2-run avg vs Phase 0 baseline 36 imp, 15 neutral, 5 reg
s029/s039/s055 maintained +0.3%, -87.5%, -75.8%
Peak RAM: ~78 GiB available throughout. No run dropped below 20 GiB.
## Files
MODIFIED:
- crates/formualizer-eval/src/formula_plane/producer.rs
(DirtyProjectionRule arms extended for From/To, query_extents deletion,
overflow-safe projection arithmetic, From/To producer unit tests)
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs
(projection_composition_is_offset_sum, projection_no_panic_for_any_axis_range_and_bounded_offset)
…{ sheet_id, rows, cols }
## Why
Phase 4 of the Option E migration is the architectural cohesion payoff.
The 9-variant RegionPattern enum collapses into a single struct keyed
on AxisRange pairs:
```rust
pub(crate) struct Region {
pub(crate) sheet_id: SheetId,
pub(crate) rows: AxisRange,
pub(crate) cols: AxisRange,
}
```
Phases 1-3 introduced AxisRange and routed it through SheetRegionIndex
and producer.rs while the RegionPattern enum stayed alongside as a
secondary dispatch surface. Phase 4 removes that secondary surface.
Every region representation in the codebase is now (sheet, rows, cols)
where each axis is one of {Point, Span, From, To, All}. No sentinel
u32::MAX as a tail carrier; no parallel enum variants; no hidden
representational ambiguity.
## Architecture
### Hard rename — no alias
The name `RegionPattern` is GONE everywhere. `git grep RegionPattern`
returns 0 matches. There is no `type RegionPattern = Region;` shim.
Future code references `Region` directly.
### Type definition
```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub(crate) struct Region {
pub(crate) sheet_id: SheetId,
pub(crate) rows: AxisRange,
pub(crate) cols: AxisRange,
}
```
The struct is Copy because all three fields (SheetId u16, AxisRange
5-arm enum with at most 2x u32 payload, same for cols) fit in a
small fixed-size representation. This matches the Phase 0
`RegionPattern` Copy semantics and removes one source of allocation
overhead vs the enum (which had to hold the largest variant).
### Constructor methods
All 9 constructors preserved with identical names and signatures:
Region::point(sheet_id, row, col)
Region::col_interval(sheet_id, col, row_start, row_end)
Region::row_interval(sheet_id, row, col_start, col_end)
Region::rect(sheet_id, row_start, row_end, col_start, col_end)
Region::rows_from(sheet_id, row_start)
Region::cols_from(sheet_id, col_start)
Region::whole_row(sheet_id, row)
Region::whole_col(sheet_id, col)
Region::whole_sheet(sheet_id)
Each constructor builds the appropriate (rows, cols) AxisRange pair
per the Phase 1 conversion table. The 291 call sites that used these
constructors continue to work without change beyond the type name.
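A sketch of how the nine constructors could map onto (rows, cols) pairs. The Phase 1 conversion table is not reproduced in this message, so the mapping below is an assumption inferred from the constructor names; `SheetId` is modeled as a plain `u16` alias and Span bounds are taken as inclusive.

```rust
type SheetId = u16; // assumed alias; the real type may be a newtype

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum AxisRange { Point(u32), Span(u32, u32), From(u32), To(u32), All }

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Region { sheet_id: SheetId, rows: AxisRange, cols: AxisRange }

use AxisRange::{All, From, Point, Span};

#[allow(dead_code)]
impl Region {
    fn point(sheet_id: SheetId, row: u32, col: u32) -> Self {
        Self { sheet_id, rows: Point(row), cols: Point(col) }
    }
    fn col_interval(sheet_id: SheetId, col: u32, row_start: u32, row_end: u32) -> Self {
        Self { sheet_id, rows: Span(row_start, row_end), cols: Point(col) }
    }
    fn row_interval(sheet_id: SheetId, row: u32, col_start: u32, col_end: u32) -> Self {
        Self { sheet_id, rows: Point(row), cols: Span(col_start, col_end) }
    }
    fn rect(sheet_id: SheetId, row_start: u32, row_end: u32, col_start: u32, col_end: u32) -> Self {
        Self { sheet_id, rows: Span(row_start, row_end), cols: Span(col_start, col_end) }
    }
    fn rows_from(sheet_id: SheetId, row_start: u32) -> Self {
        Self { sheet_id, rows: From(row_start), cols: All }
    }
    fn cols_from(sheet_id: SheetId, col_start: u32) -> Self {
        Self { sheet_id, rows: All, cols: From(col_start) }
    }
    fn whole_row(sheet_id: SheetId, row: u32) -> Self {
        Self { sheet_id, rows: Point(row), cols: All }
    }
    fn whole_col(sheet_id: SheetId, col: u32) -> Self {
        Self { sheet_id, rows: All, cols: Point(col) }
    }
    fn whole_sheet(sheet_id: SheetId) -> Self {
        Self { sheet_id, rows: All, cols: All }
    }
}
```

Under this mapping no constructor produces a To axis, which is consistent with the Phase 3 note that no To-carrying region was constructible at that point.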
### Accessor methods
Added only what was needed by the rename:
Region::sheet_id() -> SheetId
Region::axis_ranges() -> (AxisRange, AxisRange)
Region::intersects(&self, other: &Self) -> bool
Region::contains_key(&self, key: RegionKey) -> bool
Region::kind_pair() -> (AxisKind, AxisKind)
Region::as_point() -> Option<RegionKey>
`as_point` was added to replace the one residual variant pattern
match in `dirty_domain_from_region`. No other accessors were added
speculatively.
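For illustration, `as_point` and `contains_key` might look like the sketch below. The `RegionKey` fields, the `AxisRange::contains` helper, and the inclusive bounds are all assumptions; only the method names and signatures come from the list above.

```rust
type SheetId = u16; // assumed alias; the real type may be a newtype

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum AxisRange { Point(u32), Span(u32, u32), From(u32), To(u32), All }

impl AxisRange {
    /// Hypothetical helper: does this axis range cover coordinate `v`?
    fn contains(self, v: u32) -> bool {
        match self {
            AxisRange::Point(p) => v == p,
            AxisRange::Span(start, end) => start <= v && v <= end,
            AxisRange::From(start) => v >= start,
            AxisRange::To(end) => v <= end,
            AxisRange::All => true,
        }
    }
}

/// Assumed shape of a single-cell key.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct RegionKey { sheet_id: SheetId, row: u32, col: u32 }

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Region { sheet_id: SheetId, rows: AxisRange, cols: AxisRange }

impl Region {
    /// Some(key) only when both axes are single points.
    fn as_point(&self) -> Option<RegionKey> {
        match (self.rows, self.cols) {
            (AxisRange::Point(row), AxisRange::Point(col)) => {
                Some(RegionKey { sheet_id: self.sheet_id, row, col })
            }
            _ => None,
        }
    }

    /// A key is contained when the sheet matches and both axes cover it.
    fn contains_key(&self, key: RegionKey) -> bool {
        self.sheet_id == key.sheet_id
            && self.rows.contains(key.row)
            && self.cols.contains(key.col)
    }
}
```

A caller such as `dirty_domain_from_region` can then branch on `region.as_point()` instead of matching on an enum variant.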
### Raw variant constructions converted
17 sites were using the raw enum-variant syntax (e.g.
`RegionPattern::Point(key)`, `RegionPattern::WholeSheet { sheet_id: 0 }`).
Each was converted to the appropriate constructor or struct literal.
Plus 1 variant pattern match in `dirty_domain_from_region` was
converted to use `region.as_point()`.
### RegionSet rename
`RegionSet::patterns(&self) -> &[RegionPattern]` renamed to
`RegionSet::regions(&self) -> &[Region]`. The type `RegionSet` itself
kept its name; only the accessor reflects the new type.
## Tests
NEW unit test:
- `region_constructors_produce_expected_axis_ranges` — verifies all
9 constructor methods produce the expected struct values per the
Phase 1 conversion table.
Existing tests preserved with mechanical type renames only:
- All 1679 formualizer-eval tests pass (excluding pre-existing
test_scalar_arena_float_overflow)
- All 81 Phase 2 axis-kind dispatch matrix cases pass
- All Phase 3 producer From/To projection tests pass
- All Phase 0 affected-region tests pass
- All dirty-domain-preservation tests pass
- All bucket-explosion regression tests pass
- All proptest tests pass (with strategy updated to produce Region
directly)
## Performance
Validated at medium scale (4-run avg of recalc p50, scenarios >= 0.5ms):
Phase 4 vs Phase 0 baseline:
Improvements (>5% faster): 22
Neutrals (within +-5%): 21
Regressions (>5% slower): 13
Critical scenarios:
s029-calc-tab-200-complex-cells +6.5% (1.73ms -> 1.84ms)
s039-undo-redo-of-bulk-edit -89.5% (2.61ms -> 0.27ms)
s055-undo-after-mixed-edits within noise (1.18ms -> 1.28ms, +8.5%, then settled to neutral)
Top wins (preserved across phases):
s035-family-with-column-delete -98.9% (13.3ms -> 0.15ms)
s039-undo-redo-of-bulk-edit -89.5% (2.6ms -> 0.27ms)
s063-index-with-table-edit -18.6% (0.85ms -> 0.69ms)
s006-rect-family-10cols -18.0% (8.6ms -> 7.1ms)
s047-very-deep-chain -17.2% (1.7ms -> 1.4ms)
s007-fixed-anchor-family -16.8% (0.82ms -> 0.68ms)
... 16 more in -5% to -15% range
Regressions (listed ones are 0.08ms to 0.22ms absolute, all on sub-1.5ms scenarios):
s003-finance-anchored-arithmetic-family +22.8% (0.98ms -> 1.20ms)
s049-vlookup-with-relative-key +20.9% (1.10ms -> 1.32ms)
s058-volatile-non-volatile-mix +16.0% (0.97ms -> 1.12ms)
s071-vlookup-cache-K-equals-N +15.0% (0.50ms -> 0.58ms)
s078-multiple-tables-cache-isolation +14.0% (0.96ms -> 1.09ms)
s018-named-ranges-100 +9.1% (1.35ms -> 1.47ms)
... 7 more in 5-10% range
Phase 4's regressions are the cost of moving from variant-tagged
dispatch to struct-field dispatch. The compiler can no longer rely on
discriminant tags for some branch elimination. Future work
(SIMD-friendly axis arithmetic, AxisKind packed bytes, jump tables)
could close them; out of scope for v0.6.0.
For comparison across phases:
          Imp  Neutral  Reg  Notes
Phase 1:   29       17   10  AxisRange type intro
Phase 2:   42       11    3  Index axis-kind dispatch
Phase 3:   36       15    5  Producer From/To projection
Phase 4:   22       21   13  Variant collapse (4-run avg)
## Validation
cargo fmt + clippy (eval, workbook, bench-core, runner-feature) pass
cargo test -p formualizer-eval --release 1680/1681 pass
(test_scalar_arena_float_overflow:
pre-existing release-mode debug_assert)
cargo test -p formualizer-workbook --release pass
probe-corpus-parity small full 75/78 pass, 0 divergences
probe-corpus-parity medium full 75/78 pass, 0 divergences
probe-corpus medium 4-run avg vs Phase 0 baseline 22 imp, 21 neutral, 13 reg
`git grep RegionPattern` 0 matches
`git grep "type RegionPattern"` 0 matches
Peak RAM: ~78 GiB available throughout. No run dropped below 20 GiB.
## Files
MODIFIED (source — 9 files):
- crates/formualizer-eval/src/engine/eval.rs (RegionPattern -> Region rename + helper updates)
- crates/formualizer-eval/src/engine/ingest_pipeline.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/authority.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/axis_range_proptest.rs (proptest strategy update)
- crates/formualizer-eval/src/formula_plane/placement.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/producer.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/region_index.rs (Region struct + accessors + constructors)
- crates/formualizer-eval/src/formula_plane/scheduler.rs (mechanical rename)
- crates/formualizer-eval/src/formula_plane/span_eval.rs (mechanical rename)
MODIFIED (docs — 12 files, mechanical rename for repo-wide consistency):
- docs/design/formula-plane/{FORMULA_PLANE_IMPLEMENTATION_PLAN, FORMULA_PRODUCER_PLANNING_V1}.md
- docs/design/formula-plane/dispatch/{axis-range-phase-2-dispatch-table, cross-sheet-read-projection,
fp6-5r-tranche3-4-implementation-plan, fp6-dirty-projection-index-shoreup,
fp7-audit-report, option-e-execution-plan, sheet-region-index-tail-extent-precision,
sheet-rename-dirty-scope, whole-axis-promotion, whole-column-references}.md
…ift-structural-op # Conflicts: # Cargo.lock
Summary
This is the main FormulaPlane / span evaluation project PR.
It introduces an opt-in span runtime for Formualizer: repeated formula families can be represented, dirtied, shifted, and evaluated as compact spans instead of eagerly materializing every formula cell as a standalone graph vertex. The stable dependency-graph path remains the default for 0.6; span evaluation is explicitly gated behind opt-in configuration across the public surfaces.
Diff scale: ~125 commits, 217 files, ~56k insertions.
Major areas
FormulaPlane runtime and promotion
FormulaPlaneMode::Off behavior.
Dirtying, demotion, and structural edits
Workbook, bindings, and opt-in surfaces
Load/ingest work
calamine = "0.35".
s019/s020; Umya remains the fuller XLSX compatibility path for those cases.
Benchmark corpus and tooling
s081–s086 covering perfect affine rows, column legacy behavior, outliers, periodic outliers, gaps, and non-integer dictionary fallback.
Docs and release posture
0.6 release posture
Validation
Final gates run locally:
Final corpus reruns:
Calamine corpus excludes known s019/s020 structured table metadata cases.