Replace Hashtable/ArrayList with generic collections in formula eval hot path #1742
…ation hot path

Eliminates boxing overhead and improves type safety in the most frequently called code during formula evaluation.

- PlainCellCache: Hashtable → Dictionary<Loc, PlainValueCellCacheEntry>
- FormulaCellCache: Hashtable → Dictionary<object, FormulaCellCacheEntry>
- Also fixes bug: Remove() was keying on cell instead of cell.IdentityKey
- OperationEvaluatorFactory: Hashtable → Dictionary<OperationPtg, Function>
- FormulaUsedBlankCellSet: Hashtable → Dictionary<BookSheetKey, BlankCellSheetGroup>
- Ptg.ReadTokens: ArrayList → List<Ptg>
- FormulaParser.Arguments/ParseArrayRow: ArrayList → List<ParseNode>/List<object>

All lookups converted to TryGetValue to avoid double-lookup patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When both operands are very large doubles (> ~7.9e28), casting to decimal throws OverflowException. Fall back to double arithmetic, matching Excel behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ken-swyfft left a comment
Code Review
Overall: Approve with minor suggestions. The collection replacements are mechanically correct, no public API signatures change, and no thread-safety regression.
Good catch: FormulaCellCache.Remove() bug fix
The old code was calling _formulaEntriesByCell.Remove(cell) instead of Remove(cell.IdentityKey) — meaning entries were never actually removed, causing a memory leak during cell mutations. The fix is correct.
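A minimal sketch of this class of bug, assuming hypothetical stand-in types (`ICell`, `DemoCell`, `DemoFormulaCache` are illustrative only and are not the library's actual classes). Entries are stored under `cell.IdentityKey`, so removing by the cell object itself finds nothing whenever the identity key is a different object:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; the real types have more members.
public interface ICell
{
    object IdentityKey { get; }
}

public sealed class DemoCell : ICell
{
    // Assumption: the identity key is an object distinct from the cell wrapper.
    public object IdentityKey { get; } = new object();
}

public sealed class DemoFormulaCache
{
    private readonly Dictionary<object, string> _entriesByCell =
        new Dictionary<object, string>();

    public int Count => _entriesByCell.Count;

    // Entries are always stored under the identity key.
    public void Put(ICell cell, string entry) => _entriesByCell[cell.IdentityKey] = entry;

    // Buggy variant: the cell itself was never used as a key, so nothing is removed.
    public bool RemoveBuggy(ICell cell) => _entriesByCell.Remove(cell);

    // Fixed variant: remove under the same key that Put used.
    public bool RemoveFixed(ICell cell) => _entriesByCell.Remove(cell.IdentityKey);
}
```

With these stand-ins, `RemoveBuggy` returns `false` and leaves `Count` unchanged, while `RemoveFixed` returns `true` and evicts the entry. With the non-generic `Hashtable`, `Remove` returns `void`, so a miss like this gives no signal at all.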
Moderate concerns
- Missing test coverage for the FormulaCellCache.Remove fix: This is a real behavioral change (fixing a memory leak). A regression test demonstrating that entries are actually evicted after NotifyDeleteCell would guard against this silently regressing.
- Missing test coverage for the MultiplyEval overflow fix: Casting a double > ~7.9e28 to decimal throws OverflowException; the fallback to double arithmetic is correct and matches Excel. However, the existing test file (TestMultiplyEval.cs) doesn't cover this scenario. A test case like Confirm(new NumberEval(1e29), new NumberEval(1e29), 1e58) would lock this in.
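The fallback described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual MultiplyEval code: `MultiplySketch.Multiply` is a hypothetical helper, and it assumes the evaluator multiplies in decimal first and only drops to double when the conversion or the product overflows.

```csharp
using System;

public static class MultiplySketch
{
    // Hypothetical helper: multiply via decimal, falling back to double on overflow.
    public static double Multiply(double a, double b)
    {
        try
        {
            // The explicit double -> decimal conversion throws OverflowException
            // for values outside decimal's range (about ±7.9e28); the decimal
            // product can overflow as well.
            return (double)((decimal)a * (decimal)b);
        }
        catch (OverflowException)
        {
            // Plain double arithmetic handles the large magnitudes.
            return a * b;
        }
    }
}
```

Since the double product of 1e29 and 1e29 is not guaranteed to be bit-identical to the literal 1e58, a regression test would probably want to compare with a relative tolerance rather than exact equality.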
Minor notes
- The 32% benchmark improvement is plausible: the double-lookup → TryGetValue change in the innermost eval loop is likely the primary driver, and the Remove bug may have caused unbounded cache growth slowing hash operations. Profiler attribution would strengthen the claim.
- FormulaCellCache key type remains object (because IEvaluationCell.IdentityKey returns Object). Fine for this PR's scope, just noting it.
- Pre-existing Equals null-safety issues in Loc and BookSheetKey (direct cast without null/type check); out of scope but worth a follow-up.
- Removal of unused System.Runtime.Serialization.Formatters.Binary using in Ptg.cs is correct cleanup.
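For context on the double-lookup point, here is a small sketch of the two patterns on a generic dictionary. `LookupSketch` and its method names are made up for illustration, and the `int`/`string` key and value types are placeholders rather than the PR's cache types.

```csharp
using System.Collections.Generic;

public static class LookupSketch
{
    // Double lookup: ContainsKey hashes the key once, the indexer hashes it again.
    public static string GetTwoLookups(Dictionary<int, string> cache, int key)
    {
        if (cache.ContainsKey(key))
        {
            return cache[key];
        }
        return null;
    }

    // Single lookup: TryGetValue finds the entry and returns the value in one pass.
    public static string GetOneLookup(Dictionary<int, string> cache, int key)
    {
        return cache.TryGetValue(key, out string value) ? value : null;
    }
}
```

Both methods return the same results; the second simply does half the hashing on a hit, which is where a saving in a tight evaluation loop would come from.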
…rflow

- TestFormulaCellCacheRemoveActuallyEvicts: verifies entries are evicted after Remove(), guarding against the bug where Remove() keyed on cell instead of cell.IdentityKey
- TestLargeValuesOverflowDecimal: verifies MultiplyEval falls back to double arithmetic when operands exceed decimal range (~7.9e28)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI is failing above, but I think those failures will be fixed by this PR: #1746. All tests pass cleanly on my local machine.

Thanks for the thorough review! Both moderate concerns were valid and are addressed in 42364a3: 1. 2. Full suite green: 2,752 main + 1,787 OOXML = 4,539 passed across both net8.0 and net472.
Summary
- Replace Hashtable with Dictionary<K,V> and ArrayList with List<T> in the formula evaluation hot path, eliminating boxing overhead and double-lookup patterns
- Fix FormulaCellCache.Remove(), which was keying on the cell object instead of cell.IdentityKey, causing silent removal failures
- Fix Decimal overflow crash in MultiplyEval when operands exceed decimal range (~7.9e28), falling back to double arithmetic to match Excel behavior
- Pass args to BenchmarkSwitcher.Run() so --filter works from CLI

Files changed
- PlainCellCache.cs: Hashtable → Dictionary<Loc, PlainValueCellCacheEntry>
- FormulaCellCache.cs: Hashtable → Dictionary<object, FormulaCellCacheEntry>
- OperationEvaluatorFactory.cs: Hashtable → Dictionary<OperationPtg, Function>
- FormulaUsedBlankCellSet.cs: Hashtable → Dictionary<BookSheetKey, BlankCellSheetGroup>
- Ptg.cs: ArrayList → List<Ptg>
- FormulaParser.cs: ArrayList → List<ParseNode>/List<object>
- MultiplyEval.cs: catch OverflowException for large values
- Program.cs: pass args through to BenchmarkSwitcher

Benchmark results (EvaluateAll on 1.43M formulas, 17MB .xlsx)
Test plan
- LargeExcelFileBenchmark.Evaluate completes without crash

🤖 Generated with Claude Code