Cache XSSFRow.FirstCellNum/LastCellNum to avoid O(n) LINQ scans #1745
ken-swyfft wants to merge 1 commit into nissl-lab:master
GetFirstKey() and GetLastKey() used LINQ Min()/Max(), which enumerate all dictionary keys on every call. FirstCellNum and LastCellNum are called in tight loops (e.g., cell range iteration during copy/shift operations), so this O(n) cost per access was a significant bottleneck.

Cache the min/max column indices and update them in O(1) on add, with lazy invalidation on remove (only re-scans when a boundary cell is removed).

Also adds safety-net tests for sparse rows, large column indices, interleaved add/remove, and ordering persistence.

Benchmark results (10,000 repeated FirstCellNum+LastCellNum accesses):
- Sparse row (3 cells): 799 us -> 11 us (71x faster), 2813 KB -> 0 B
- Dense row (200 cells): 25,724 us -> 11 us (2,276x faster), 4375 KB -> 0 B

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ken-swyfft
left a comment
Code Review
Overall: Approve. Clean, well-targeted performance optimization. Caching logic is correct, all mutation points are covered, no API/ABI changes.
Correctness — verified complete
All four mutation sites for _cells are properly handled:
| Mutation site | Cache action |
|---|---|
| Constructor (`_cells.Add`) | `UpdateCacheOnAdd` called |
| `CreateCell` (`_cells[idx] = ...`) | `UpdateCacheOnAdd` called |
| `RemoveCell` (`_cells.Remove`) | `InvalidateCacheOnRemove` called |
| `RebuildCells` (`_cells.Clear` + `_cells.Add`) | Full invalidation |
The sentinel value of -1 is safe — column indices are always non-negative (0–16383), and _cells.Count == 0 is checked before accessing the cache. The short cast is safe because the maximum value is 16384 (max_column + 1), well within short range.
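For readers who want to see the shape of the scheme, here is a minimal sketch of the caching logic described above. It is not the PR's code: the class `CachedBoundsRow`, its string cell values, and the fields `_minCol`/`_maxCol` are hypothetical, and returning `-1` for an empty row is an assumed convention. Only the names `UpdateCacheOnAdd` and `InvalidateCacheOnRemove`, the `-1` sentinel, the empty-row check, and the `short` cast come from the discussion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row; NOT the actual XSSFRow implementation.
public class CachedBoundsRow
{
    // Sentinel meaning "not cached"; safe because column indices are never negative.
    private const int Unknown = -1;

    private readonly SortedDictionary<int, string> _cells =
        new SortedDictionary<int, string>();
    private int _minCol = Unknown;
    private int _maxCol = Unknown;

    // Assumed convention: -1 for an empty row, otherwise the first column index.
    public short FirstCellNum
    {
        get
        {
            if (_cells.Count == 0) return -1;
            if (_minCol == Unknown) _minCol = _cells.Keys.Min(); // lazy O(n) re-scan
            return (short)_minCol;
        }
    }

    // Assumed convention: -1 for an empty row, otherwise last column index + 1.
    public short LastCellNum
    {
        get
        {
            if (_cells.Count == 0) return -1;
            if (_maxCol == Unknown) _maxCol = _cells.Keys.Max(); // lazy O(n) re-scan
            return (short)(_maxCol + 1);
        }
    }

    public void CreateCell(int col, string value)
    {
        _cells[col] = value;
        UpdateCacheOnAdd(col);
    }

    public void RemoveCell(int col)
    {
        if (_cells.Remove(col)) InvalidateCacheOnRemove(col);
    }

    private void UpdateCacheOnAdd(int col)
    {
        if (_cells.Count == 1)
        {
            // The row holds exactly this cell, so both bounds are known.
            _minCol = col;
            _maxCol = col;
            return;
        }
        // An invalidated bound stays invalidated; the next read re-scans it.
        if (_minCol != Unknown && col < _minCol) _minCol = col;
        if (_maxCol != Unknown && col > _maxCol) _maxCol = col;
    }

    private void InvalidateCacheOnRemove(int col)
    {
        // Only removing a boundary cell forces a later re-scan.
        if (col == _minCol) _minCol = Unknown;
        if (col == _maxCol) _maxCol = Unknown;
    }
}
```

In this sketch, removing a middle cell leaves both cached bounds intact, while removing the first or last cell clears only the affected bound, which is recomputed on the next read.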
Performance claims — justified
SortedDictionary.Keys.Min() and .Max() are both O(n) via LINQ (LINQ has no way to know the collection is sorted). Cached access is O(1) with zero allocation. The claimed speedups are credible.
Tests
The interleaved add/remove test is particularly well-designed, covering the full lifecycle: empty → add → add → add → remove middle → remove first → add new first → remove last → remove all.
Minor observations (non-blocking)
- `GetFirstKey()` could use `First()` instead of `Min()`: since `_cells` is a `SortedDictionary`, the enumerator yields keys in ascending order, so `_cells.Keys.First()` stops after the first key (O(log n) to reach the leftmost tree node), whereas `Min()` scans all n keys. With caching this re-scan is rare, so the practical impact is minimal.
- `RebuildCells` could populate the cache eagerly: since it already iterates all cells, it could track min/max during the rebuild loop instead of invalidating. But `RebuildCells` is called infrequently, so the lazy approach is fine.
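As a rough illustration of these two non-blocking suggestions, the sketch below shows them in isolation. The helper class `RowBoundsHelpers` and its methods `FirstKey` and `Rebuild` are hypothetical names, not part of NPOI or this PR, and the sketch assumes cells are keyed by an `int` column index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers; not part of NPOI or of this PR.
public static class RowBoundsHelpers
{
    // Smallest key of a non-empty sorted dictionary. The enumerator yields keys in
    // ascending order, so First() stops after one element instead of scanning all keys.
    // Throws InvalidOperationException if the dictionary is empty.
    public static int FirstKey<TValue>(SortedDictionary<int, TValue> cells)
    {
        return cells.Keys.First();
    }

    // Refills `cells` from `source` and tracks min/max in the same pass, so the
    // bounds are known immediately after the rebuild. `source` must be a separate
    // collection from `cells` and must not contain duplicate keys.
    // Returns (-1, -1) when `source` is empty.
    public static (int Min, int Max) Rebuild<TValue>(
        SortedDictionary<int, TValue> cells,
        IEnumerable<KeyValuePair<int, TValue>> source)
    {
        cells.Clear();
        int min = -1, max = -1;
        foreach (var pair in source)
        {
            cells.Add(pair.Key, pair.Value);
            if (min == -1 || pair.Key < min) min = pair.Key;
            if (pair.Key > max) max = pair.Key;
        }
        return (min, max);
    }
}
```

Whether either change is worth making depends on how often the re-scan and rebuild paths actually run; the review treats both as optional.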
Summary
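- `XSSFRow.GetFirstKey()` and `GetLastKey()` used LINQ `Min()`/`Max()`, which enumerate all dictionary keys on every call, making each access O(n)
- `FirstCellNum` and `LastCellNum` are called in tight loops throughout the codebase (cell range iteration during copy, shift, auto-size, formula evaluation)
- Cache the min/max column indices and update them in O(1) on `CreateCell()`, with lazy invalidation on `RemoveCell()` (only re-scans when a boundary cell is removed)
- Adds safety-net tests for sparse rows, large column indices, interleaved add/remove, `Cells` property ordering, and save/reload ordering persistence

Benchmark results (10,000 repeated `FirstCellNum` + `LastCellNum` accesses)
- Sparse row (3 cells): 799 us -> 11 us (71x faster), 2813 KB -> 0 B
- Dense row (200 cells): 25,724 us -> 11 us (2,276x faster), 4375 KB -> 0 B

All 4,542 existing tests pass (OOXML: 1,792, Main: 2,750).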
Test plan
🤖 Generated with Claude Code