
Add LargeStylesBenchmark with deterministic HugeStyles.xlsx generator #1730

Merged
tonyqus merged 3 commits into master from copilot/add-large-styles-benchmark
Mar 13, 2026


Copilot AI commented Mar 13, 2026

Adds a BenchmarkDotNet benchmark class targeting large xl/styles.xml scenarios, plus a deterministic fixture generator. Provides a baseline for measuring the cost of styles parsing and the "copy-through styles.xml when not dirty" optimisation.

New files

ZipUtils.cs

Reads the uncompressed byte length of a named zip entry from an xlsx without extracting it.
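
A minimal sketch of such a helper, assuming it is built on System.IO.Compression; the class and method names here (ZipEntrySize, GetUncompressedLength) are illustrative and not necessarily those used in ZipUtils.cs. ZipArchiveEntry.Length reports the uncompressed size recorded in the archive, so no entry data is decompressed.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper; names are illustrative, not taken from the PR.
public static class ZipEntrySize
{
    // Returns the uncompressed length stored in the zip directory for the
    // named entry, or -1 when the archive has no such entry.
    public static long GetUncompressedLength(string zipPath, string entryName)
    {
        using var archive = ZipFile.OpenRead(zipPath);
        var entry = archive.GetEntry(entryName);
        return entry?.Length ?? -1;
    }
}
```

For an xlsx file the entry name would be "xl/styles.xml", for example ZipEntrySize.GetUncompressedLength(path, "xl/styles.xml").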

GenerateHugeStylesWorkbook.cs

Step-wise fixture generator: starts at 10k styles and adds 5k at a time until the uncompressed size of xl/styles.xml is ≥ 20 MB (maximum 63,500 styles, just below NPOI's 64k hard cap).
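
The step-wise search can be sketched roughly as below. This is an illustration under assumptions, not the generator's actual code: the measure callback is a hypothetical stand-in for "build a workbook with n styles and read back the uncompressed size of xl/styles.xml", and 20 MB is taken to mean 20 × 1024 × 1024 bytes.

```csharp
using System;

// Hypothetical sketch of the step-wise loop; names are illustrative.
public static class StepSearch
{
    public const int StartStyles = 10_000;
    public const int Step = 5_000;
    public const int MaxStyles = 63_500;
    public const long TargetBytes = 20L * 1024 * 1024; // assumed meaning of "20 MB"

    // measure(n) is assumed to generate a workbook with n styles and return
    // the uncompressed size of xl/styles.xml in bytes.
    public static int FindStyleCount(Func<int, long> measure)
    {
        int count = StartStyles;
        while (true)
        {
            // Stop once the target is reached or the style cap is hit.
            if (measure(count) >= TargetBytes || count >= MaxStyles)
                return count;
            count = Math.Min(count + Step, MaxStyles);
        }
    }
}
```

With a size model of roughly 469 bytes per style, this loop stops at 45,000 styles; with roughly 315 bytes per style it runs to the 63,500 cap without reaching the target.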

Two constraints shaped the design:

  • 250-format limit: the naïve approach of one unique numFmt per cell immediately crashes. Fixed by pre-building 20 fixed format IDs and cycling through them.
  • 64k-style cap: NPOI (and the xlsx spec) caps cellXfs at ~64k entries. At a plain ~315 bytes/style that gives at most ~20 MB, so the 30 MB target from the original prompt is out of reach. The target is therefore set to 20 MB, which the loop reliably reaches at ~45k styles (~20.1 MB) once each entry is made more verbose.

To reach ~469 bytes/style (vs the naïve ~315), each entry is made verbose:

  • Font: underline, strikeout, charset, TypeOffset (<vertAlign>), family, scheme, and shadow/condense/extend via CT_Font accessors
  • xf: alignment sub-element with 7 attributes + quotePrefix + protection (locked/hidden)
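
The size figures quoted above can be checked with back-of-the-envelope arithmetic. The sketch below only multiplies the approximate per-style byte counts from this description by a style count; the helper names are illustrative, and the real file also carries fixed overhead that is ignored here.

```csharp
using System;

// Rough size estimate; per-style byte counts are the approximate figures
// quoted in the PR description, not measured values.
public static class StyleSizeEstimate
{
    public const int MaxStyles = 63_500;

    // Estimated uncompressed bytes for the given number of style entries.
    public static long EstimateBytes(int styles, int bytesPerStyle)
        => (long)styles * bytesPerStyle;

    // Converts bytes to MiB (1024 * 1024 bytes).
    public static double ToMiB(long bytes) => bytes / (1024.0 * 1024.0);
}
```

Under these assumptions, 63,500 plain entries come to about 20.0 million bytes, 45,000 verbose entries to about 21.1 million bytes (roughly 20.1 MiB), and even 63,500 verbose entries stay below 30 million bytes.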

Font creation uses forceRegistration=true internally so generation is O(n), not O(n²).

LargeStylesBenchmark.cs

Three benchmarks; GlobalSetup calls EnsureExists once:

Benchmark | Measures
XSSFWorkbookLargeStylesOpenDispose | Open + dispose, no style API; baseline for lazy load
XSSFWorkbookLargeStylesForceLoad | CreateCellStyle() + CreateDataFormat(); cost of forced parse
XSSFWorkbookLargeStylesOpenWrite | Full write to MemoryStream; baseline for copy-through save

NPOI.Benchmarks.csproj

Conditional PreserveNewest copy rule for data\HugeStyles.xlsx (skipped when file doesn't exist, so clean builds don't fail).

Original prompt

Implement a new BenchmarkDotNet benchmark for large styles.xml scenarios.

Repository: nissl-lab/npoi (base branch: master)

Goals:

  1. Add a new benchmark class LargeStylesBenchmark under benchmarks/NPOI.Benchmarks/ that measures:

    • Opening and disposing a workbook without touching style APIs.
    • Forcing styles usage (e.g., CreateCellStyle() and CreateDataFormat())
    • Writing the workbook to a stream.
  2. Add a deterministic generator that creates an .xlsx file with a large xl/styles.xml by creating many unique styles and applying them to cells.

    • Generate a workbook with initially 10,000 unique styles applied to 10,000 cells.
    • Increase style count until xl/styles.xml uncompressed size reaches at least 30MB (so it should land in the 20–30MB+ range; small overshoot acceptable). Use step size 5,000 styles, max 150,000.
    • Store the file at benchmarks/NPOI.Benchmarks/data/HugeStyles.xlsx.
    • Generator should run in LargeStylesBenchmark.GlobalSetup() only if the file is missing or too small.
  3. Add a small utility for reading the uncompressed size of a zip entry inside the .xlsx (ZipArchive) and specifically xl/styles.xml.

Constraints:

  • Do not modify benchmarks/NPOI.Benchmarks/Program.cs unless required; it already uses BenchmarkSwitcher.FromAssembly(...).Run().
  • Keep the new code in the NPOI.Benchmarks namespace.
  • Use idiomatic C# with using var.

Acceptance criteria:

  • dotnet run -c Release --project benchmarks/NPOI.Benchmarks discovers and can run LargeStylesBenchmark.
  • On first run, benchmark generates the workbook if not present; subsequent runs reuse it if size >= 30MB uncompressed styles.xml.
  • Code compiles.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: 1

Assistant: Proceed with (1) strict laziness: GetStylesSource() returns the StylesTable instance without forcing parse; the first real style-table usage triggers StylesTable.EnsureLoaded().

Concrete changes to implement

1) StylesTable: defer parsing + thread-safe EnsureLoaded()

  • Add fields:

    • private PackagePart _stylesPart;
    • private bool _isLoaded;
    • private readonly object _loadLock = new object();
  • Change internal StylesTable(PackagePart part) to store the part and not parse immediately.

  • Add internal void EnsureLoaded() with double-checked locking, and move the existing parse logic into a private LoadFromPart() method that:

    • reads _stylesPart.GetInputStream()
    • ConvertStreamToXml(...)
    • ReadFrom(xmldoc)
    • sets _isLoaded = true
  • Add EnsureLoaded() at the top of all relevant public/internal methods/properties on StylesTable that rely on parsed content (fonts/fills/borders/xfs/number formats/dxfs/table styles/colors, etc.). This is important because callers may obtain StylesTable and use it directly.
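
As an illustration of the double-checked locking pattern described above, here is a self-contained sketch. It is not NPOI code: LazyPart and its loader delegate are hypothetical stand-ins for StylesTable and the "read the part and parse it" step, and the flag is marked volatile here (an assumption beyond the plain bool listed above) so the unlocked fast-path read is safe.

```csharp
using System;

// Hypothetical stand-in for a lazily parsed package part.
public sealed class LazyPart
{
    private readonly Func<string> _loader;            // stands in for read + parse
    private readonly object _loadLock = new object();
    private volatile bool _isLoaded;                  // volatile: read outside the lock
    private string _content = "";

    public int LoadCount;                             // demo only: how often the loader ran

    public LazyPart(Func<string> loader) => _loader = loader;

    public bool IsLoaded => _isLoaded;

    public void EnsureLoaded()
    {
        if (_isLoaded) return;                        // fast path, no lock taken
        lock (_loadLock)
        {
            if (_isLoaded) return;                    // another thread loaded it first
            _content = _loader();
            LoadCount++;
            _isLoaded = true;                         // set last, after content is ready
        }
    }

    // Every accessor that depends on parsed content calls EnsureLoaded() first.
    public string Content
    {
        get { EnsureLoaded(); return _content; }
    }
}
```

Constructing the object does no work; the loader runs exactly once, on the first call that needs the content, even when several threads race to call EnsureLoaded().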

2) XSSFWorkbook: don’t force parse in GetStylesSource()

  • Ensure GetStylesSource() (or equivalent accessor used by tests) does not call any load method.
  • Style-related workbook APIs should still call stylesSource.EnsureLoaded() (or go through a local helper) before using the style table:
    • CreateCellStyle()
    • GetCellStyleAt(...)
    • CreateDataFormat()
    • font APIs (CreateFont, GetFontAt, FindFont, etc.)
    • any border/fill related workbook APIs

This meets your requirement: parse styles.xml only on first style-related API use.

3) Tests

Add a test proving strict laziness:

  • Open a real .xlsx with styles (any existing test sample that has styles)
  • Immediately after open, call wb.GetStylesSource() and then check (via internal visibility / reflection) that styles are not loaded yet.
  • Trigger a style API (e.g., wb.CreateCellStyle() or wb.CreateDataFormat())
  • Assert styles are now loaded.

Implementation detail: to avoid changing public API, we can expose internal bool IsLoadedForTests in StylesTable and use [assembly: InternalsVisibleTo("...TestCases...")] if that’s already how the test project works. Otherwise we can use reflection in the test.


If you want me to actually implement this in the repo, tell me which workflow you prefer:

  • A) you will code it and want me to provide patch guidance + list of methods to touch, or
  • B) you want a PR opened against nissl-lab/npoi (you’d need to explicitly ask “create a PR”).

User: If styles are not changed, don't serialize or commit the styles, use existing styles.xml as output stream instead

Assistant: That’s doable, but it’s a separate (and very worthwhile) optimization: skip rewriting xl/styles.xml when it was never loaded/modified, and just copy the original part bytes through to the output package.

Key idea

For loaded workbooks:

  • If styles.xml was never parsed and no style-related API was called that would mutate styles, then we can treat styles as unchanged.
  • On save, instead of calling `StylesT...

This pull request was created from Copilot chat.



Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
@tonyqus tonyqus added this to the NPOI 2.8.0 milestone Mar 13, 2026
Copilot AI changed the title from "[WIP] Add BenchmarkDotNet benchmark for large styles.xml scenarios" to "Add LargeStylesBenchmark with deterministic HugeStyles.xlsx generator" Mar 13, 2026
Copilot AI requested a review from tonyqus March 13, 2026 02:35
…styles.xml

Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
@tonyqus tonyqus marked this pull request as ready for review March 13, 2026 03:49
@tonyqus tonyqus merged commit 7619fb6 into master Mar 13, 2026
3 of 4 checks passed