Add LargeStylesBenchmark with deterministic HugeStyles.xlsx generator #1730
Merged
Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add BenchmarkDotNet benchmark for large styles.xml scenarios" to "Add LargeStylesBenchmark with deterministic HugeStyles.xlsx generator" on Mar 13, 2026
…styles.xml Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
Adds a BenchmarkDotNet benchmark class targeting large `xl/styles.xml` scenarios, plus a deterministic fixture generator. Provides a baseline for measuring the cost of styles parsing and the "copy-through styles.xml when not dirty" optimisation.

New files

`ZipUtils.cs`
Reads the uncompressed byte length of a named zip entry from an xlsx without extracting it.
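
The zip-entry size lookup described above can be sketched as follows. This is a minimal illustration, not the PR's actual `ZipUtils.cs`: the class name `ZipEntrySize`, the method name, and the `-1` return for a missing entry are all assumptions made for the example. It relies only on `System.IO.Compression.ZipArchive`, whose `ZipArchiveEntry.Length` reports the uncompressed size from the archive's directory without inflating the data.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; names are not taken from the PR.
public static class ZipEntrySize
{
    // Returns the uncompressed length in bytes of the named entry,
    // or -1 when the archive has no such entry (an assumed convention).
    public static long GetUncompressedLength(Stream zipStream, string entryName)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            // Length comes from the zip directory; no entry data is decompressed.
            return entry == null ? -1 : entry.Length;
        }
    }

    // Convenience overload for a file on disk, e.g. an .xlsx path.
    public static long GetUncompressedLength(string path, string entryName)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return GetUncompressedLength(file, entryName);
        }
    }
}
```

A call such as `ZipEntrySize.GetUncompressedLength(path, "xl/styles.xml")` would then give the figure that the generator compares against its size target.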

`GenerateHugeStylesWorkbook.cs`
Step-wise fixture generator: starts at 10k styles and increments by 5k until `xl/styles.xml` uncompressed is ≥ 20 MB (max 63,500, just below NPOI's 64k hard cap).

Two constraints shaped the design:
- … `numFmt` per cell immediately crashes. Fixed by pre-building 20 fixed format IDs and cycling through them.
- … `cellXfs` at ~64k entries; at a plain ~315 bytes/style that yields only ~20 MB max, making a 30 MB target impossible. The target is set to 20 MB, which the loop reliably hits at ~45k styles (~20.1 MB).

To reach ~469 bytes/style (vs the naïve ~315), each entry is made verbose:
- `TypeOffset` (`<vertAlign>`), family, scheme, and `shadow`/`condense`/`extend` via `CT_Font` accessors
- `quotePrefix` + protection (locked/hidden)

Font creation uses `forceRegistration=true` internally so generation is O(n), not O(n²).

`LargeStylesBenchmark.cs`
Three benchmarks; `GlobalSetup` calls `EnsureExists` once:
- `XSSFWorkbookLargeStylesOpenDispose`
- `XSSFWorkbookLargeStylesForceLoad`: `CreateCellStyle()` + `CreateDataFormat()`, cost of forced parse
- `XSSFWorkbookLargeStylesOpenWriteMemoryStream`: baseline for copy-through save

`NPOI.Benchmarks.csproj`
Conditional `PreserveNewest` copy rule for `data\HugeStyles.xlsx` (skipped when the file doesn't exist, so clean builds don't fail).

Original prompt
Implement a new BenchmarkDotNet benchmark for large styles.xml scenarios.
Repository: nissl-lab/npoi (base branch: master)
Goals:
- Add a new benchmark class `LargeStylesBenchmark` under `benchmarks/NPOI.Benchmarks/` that measures:
  - … `CreateCellStyle()` and `CreateDataFormat()`)
- Add a deterministic generator that creates an .xlsx file with a large `xl/styles.xml` by creating many unique styles and applying them to cells.
  - … `xl/styles.xml` uncompressed size reaches at least 30MB (so it should land in the 20–30MB+ range; small overshoot acceptable). Use step size 5,000 styles, max 150,000.
  - … `benchmarks/NPOI.Benchmarks/data/HugeStyles.xlsx`.
  - … `LargeStylesBenchmark.GlobalSetup()` only if the file is missing or too small.
- Add a small utility for reading the uncompressed size of a zip entry inside the .xlsx (ZipArchive) and specifically `xl/styles.xml`.

Constraints:
- … `benchmarks/NPOI.Benchmarks/Program.cs` unless required; it already uses `BenchmarkSwitcher.FromAssembly(...).Run()`.
- … `NPOI.Benchmarks` namespace.
- … `using var`.

Acceptance criteria:
- `dotnet run -c Release --project benchmarks/NPOI.Benchmarks` discovers and can run `LargeStylesBenchmark`.

The following is the prior conversation context from the user's chat exploration (may be truncated):
User: 1
Assistant: Proceed with (1) strict laziness: `GetStylesSource()` returns the `StylesTable` instance without forcing parse; the first real style-table usage triggers `StylesTable.EnsureLoaded()`.

Concrete changes to implement

1) `StylesTable`: defer parsing + thread-safe `EnsureLoaded()`
- Add fields:
  - `private PackagePart _stylesPart;`
  - `private bool _isLoaded;`
  - `private readonly object _loadLock = new object();`
- Change `internal StylesTable(PackagePart part)` to store the part and not parse immediately.
- Add `internal void EnsureLoaded()` with double-checked locking, and move the existing parse logic into a private `LoadFromPart()` method that:
  - … `_stylesPart.GetInputStream()`
  - … `ConvertStreamToXml(...)`
  - … `ReadFrom(xmldoc)`
  - … `_isLoaded = true`
- Add `EnsureLoaded()` at the top of all relevant public/internal methods/properties on `StylesTable` that rely on parsed content (fonts/fills/borders/xfs/number formats/dxfs/table styles/colors, etc.). This is important because callers may obtain `StylesTable` and use it directly.

2) `XSSFWorkbook`: don’t force parse in `GetStylesSource()`
- … `GetStylesSource()` (or equivalent accessor used by tests) does not call any load method.
- … `stylesSource.EnsureLoaded()` (or go through a local helper) before using the style table:
  - `CreateCellStyle()`
  - `GetCellStyleAt(...)`
  - `CreateDataFormat()`
  - … `CreateFont`, `GetFontAt`, `FindFont`, etc.)

This meets your requirement: parse `styles.xml` only on first style-related API use.

3) Tests
Add a test proving strict laziness:
- … `.xlsx` with styles (any existing test sample that has styles)
- … `wb.GetStylesSource()` and then check (via internal visibility / reflection) that styles are not loaded yet.
- … `wb.CreateCellStyle()` or `wb.CreateDataFormat()`)

Implementation detail: to avoid changing public API, we can expose `internal bool IsLoadedForTests` in `StylesTable` and use `[assembly: InternalsVisibleTo("...TestCases...")]` if that’s already how the test project works. Otherwise we can use reflection in the test.

If you want me to actually implement this in the repo, tell me which workflow you prefer:
- … `nissl-lab/npoi` (you’d need to explicitly ask “create a PR”).

User: If styles are not changed, don't serialize or commit the styles, use existing styles.xml as output stream instead
Assistant: That’s doable, but it’s a separate (and very worthwhile) optimization: skip rewriting `xl/styles.xml` when it was never loaded/modified, and just copy the original part bytes through to the output package.

Key idea
For loaded workbooks:
- … `styles.xml` was never parsed and no style-related API was called that would mutate styles, then we can treat styles as unchanged.

This pull request was created from Copilot chat.
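
The two ideas from the quoted conversation, deferring the parse behind a double-checked `EnsureLoaded()` and copying an untouched `styles.xml` through on save, can be sketched together. This is a self-contained illustration under stated assumptions, not NPOI code: `LazyStylesPart`, `Append`, `WriteTo`, and `IsDirty` are hypothetical names, a UTF-8 string stands in for the parsed styles model, and `_isLoaded` is marked `volatile` here (the conversation lists a plain `bool`) so the unlocked first check reads a published value.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for a package part such as xl/styles.xml; not an NPOI type.
public sealed class LazyStylesPart
{
    private readonly byte[] _originalBytes;            // bytes exactly as read from the package
    private readonly object _loadLock = new object();
    private volatile bool _isLoaded;                   // assumption: volatile for the unlocked check
    private string _parsed;                            // placeholder for the parsed model

    public LazyStylesPart(byte[] originalBytes)
    {
        if (originalBytes == null) throw new ArgumentNullException(nameof(originalBytes));
        _originalBytes = originalBytes;
    }

    public bool IsLoaded { get { return _isLoaded; } }
    public bool IsDirty { get; private set; }

    // Double-checked locking: cheap unlocked test first, then re-test under the lock.
    public void EnsureLoaded()
    {
        if (_isLoaded) return;
        lock (_loadLock)
        {
            if (_isLoaded) return;
            _parsed = Encoding.UTF8.GetString(_originalBytes); // stands in for the real XML parse
            _isLoaded = true;                                  // set last, after the model is ready
        }
    }

    // Any mutating call forces a load and marks the part dirty.
    public void Append(string fragment)
    {
        EnsureLoaded();
        lock (_loadLock)
        {
            _parsed += fragment;
            IsDirty = true;
        }
    }

    // Save path: copy the original bytes through unless the content was changed.
    public void WriteTo(Stream output)
    {
        if (!IsDirty)
        {
            output.Write(_originalBytes, 0, _originalBytes.Length);
            return;
        }
        byte[] serialized = Encoding.UTF8.GetBytes(_parsed);
        output.Write(serialized, 0, serialized.Length);
    }
}
```

In this sketch a part that is opened and saved without any style call is never parsed at all, which is the case the `OpenWriteMemoryStream` benchmark is meant to baseline. Keying the copy-through on a dirty flag rather than on `IsLoaded` also lets read-only style access keep the fast path; whether the real implementation draws the line there is not stated in the source.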