Add LargeStylesBenchmark with deterministic HugeStyles.xlsx generator by Copilot · Pull Request #1730 · nissl-lab/npoi

Copilot · 2026-03-13T02:30:10Z

Adds a BenchmarkDotNet benchmark class targeting large xl/styles.xml scenarios, plus a deterministic fixture generator. Provides a baseline for measuring the cost of styles parsing and the "copy-through styles.xml when not dirty" optimisation.

New files

`ZipUtils.cs`

Reads the uncompressed byte length of a named zip entry from an xlsx without extracting it.
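
A minimal sketch of how such a helper could look, assuming it is built on `System.IO.Compression.ZipArchive` as the original prompt requests. The class and method names are hypothetical and are not taken from the PR. `ZipArchiveEntry.Length` reports the uncompressed size recorded in the archive's directory, so the entry data itself is never decompressed.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper; names are illustrative, not the ones used in the PR.
public static class ZipEntrySizeSketch
{
    // Returns the uncompressed size of the named entry in bytes,
    // or -1 when the archive has no such entry.
    public static long GetUncompressedLength(Stream zipStream, string entryName)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);
        var entry = archive.GetEntry(entryName);
        return entry?.Length ?? -1;
    }

    // Convenience wrapper for the entry this PR cares about.
    public static long GetStylesXmlLength(string xlsxPath)
    {
        using var file = File.OpenRead(xlsxPath);
        return GetUncompressedLength(file, "xl/styles.xml");
    }
}
```

Returning -1 for a missing entry is one possible convention; throwing would be equally reasonable if a missing xl/styles.xml should be treated as a corrupt fixture.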

`GenerateHugeStylesWorkbook.cs`

Step-wise fixture generator: starts at 10k styles and adds 5k per step until the uncompressed xl/styles.xml is at least 20 MB, capped at 63,500 styles (just below NPOI's 64k hard cap).

Two constraints shaped the design:

- 250-format limit: the naïve approach of one unique numFmt per cell immediately crashes. Fixed by pre-building 20 fixed format IDs and cycling through them.
- 64k-style cap: NPOI (and the xlsx spec) cap cellXfs at ~64k entries; at a plain ~315 bytes/style that yields only ~20 MB max, making a 30 MB target impossible. The target is set to 20 MB, which the loop reliably hits at ~45k styles (~20.1 MB).
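
The format-cycling workaround can be sketched without any NPOI calls. This is an illustration only: the real generator registers formats through NPOI and reuses the returned IDs, whereas here plain format strings stand in for those IDs, and every name is hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of "pre-build a small pool, then cycle through it".
public static class FormatCyclingSketch
{
    public const int FormatPoolSize = 20;

    // Builds 20 distinct number-format strings: "0.0", "0.00", "0.000", ...
    public static IReadOnlyList<string> BuildFormatPool()
    {
        var pool = new List<string>(FormatPoolSize);
        for (int i = 1; i <= FormatPoolSize; i++)
        {
            pool.Add("0." + new string('0', i));
        }
        return pool;
    }

    // Style i reuses pool entry (i mod 20), so the number of distinct
    // formats never grows with the number of styles.
    public static string FormatForStyle(IReadOnlyList<string> pool, int styleIndex)
    {
        return pool[styleIndex % pool.Count];
    }
}
```

Because the format repeats every 20 styles, the uniqueness of each style has to come from its other attributes, such as the font and alignment settings.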

To reach ~469 bytes/style (vs the naïve ~315), each entry is made verbose:

- Font: underline, strikeout, charset, TypeOffset (<vertAlign>), family, scheme, and shadow/condense/extend via CT_Font accessors
- xf: alignment sub-element with 7 attributes + quotePrefix + protection (locked/hidden)

Font creation uses forceRegistration=true internally so generation is O(n), not O(n²).
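
The step-wise search described in this section can be sketched as a small loop. This is not the PR's code: the names are hypothetical, and the `sizeOf` delegate stands in for "generate a workbook with n styles, then measure xl/styles.xml".

```csharp
using System;

// Hypothetical sketch of the generator's step-wise search.
public static class StepLoopSketch
{
    public const int InitialStyles = 10_000;
    public const int StepStyles = 5_000;
    public const int MaxStyles = 63_500; // just below the 64k cap

    // Grows the style count in fixed steps until the measured size reaches
    // targetBytes or the cap is hit, and returns the final style count.
    public static int FindStyleCount(Func<int, long> sizeOf, long targetBytes)
    {
        int count = InitialStyles;
        while (sizeOf(count) < targetBytes && count < MaxStyles)
        {
            count = Math.Min(count + StepStyles, MaxStyles);
        }
        return count;
    }
}
```

Assuming a flat 469 bytes per style and reading 20 MB as 20 × 1024 × 1024 bytes, `StepLoopSketch.FindStyleCount(n => n * 469L, 20L * 1024 * 1024)` returns 45,000, since 40,000 × 469 = 18,760,000 bytes falls short of the target and 45,000 × 469 = 21,105,000 bytes exceeds it. With a 30 MB target the same model stops at the 63,500 cap without reaching the target, which is consistent with the 30 MB goal being dropped.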

`LargeStylesBenchmark.cs`

Three benchmarks; GlobalSetup calls EnsureExists once:

| Benchmark | Measures |
| --- | --- |
| `XSSFWorkbookLargeStylesOpenDispose` | Open + dispose, no style API; baseline for lazy load |
| `XSSFWorkbookLargeStylesForceLoad` | `CreateCellStyle()` + `CreateDataFormat()`; cost of forced parse |
| `XSSFWorkbookLargeStylesOpenWrite` | Full write to `MemoryStream`; baseline for copy-through save |
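
The reuse-or-regenerate check that GlobalSetup relies on might look roughly like this. It is a sketch under assumptions: the signature is hypothetical, `measure` stands in for the ZipUtils size lookup, and `generate` stands in for the fixture generator.

```csharp
using System;
using System.IO;

// Hypothetical sketch of an EnsureExists-style check; not the PR's actual code.
public static class EnsureFixtureSketch
{
    // Regenerates the fixture only when it is missing or smaller than minBytes.
    // Returns true if the fixture was (re)generated, false if it was reused.
    public static bool EnsureExists(
        string path,
        long minBytes,
        Func<string, long> measure,
        Action<string> generate)
    {
        if (File.Exists(path) && measure(path) >= minBytes)
        {
            return false; // existing fixture is large enough: reuse it
        }

        // Make sure the target directory exists before generating.
        var dir = Path.GetDirectoryName(Path.GetFullPath(path));
        if (!string.IsNullOrEmpty(dir))
        {
            Directory.CreateDirectory(dir);
        }

        generate(path);
        return true;
    }
}
```

Passing the measurement and generation steps as delegates keeps the check testable without building a real workbook.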

`NPOI.Benchmarks.csproj`

Conditional PreserveNewest copy rule for data\HugeStyles.xlsx (skipped when file doesn't exist, so clean builds don't fail).

Original prompt

Implement a new BenchmarkDotNet benchmark for large styles.xml scenarios.

Repository: nissl-lab/npoi (base branch: master)

Goals:

- Add a new benchmark class LargeStylesBenchmark under benchmarks/NPOI.Benchmarks/ that measures:
  - Opening and disposing a workbook without touching style APIs.
  - Forcing styles usage (e.g., CreateCellStyle() and CreateDataFormat()).
  - Writing the workbook to a stream.
- Add a deterministic generator that creates an .xlsx file with a large xl/styles.xml by creating many unique styles and applying them to cells.
  - Generate a workbook with initially 10,000 unique styles applied to 10,000 cells.
  - Increase style count until xl/styles.xml uncompressed size reaches at least 30MB (so it should land in the 20–30MB+ range; small overshoot acceptable). Use step size 5,000 styles, max 150,000.
  - Store the file at benchmarks/NPOI.Benchmarks/data/HugeStyles.xlsx.
  - Generator should run in LargeStylesBenchmark.GlobalSetup() only if the file is missing or too small.
- Add a small utility for reading the uncompressed size of a zip entry inside the .xlsx (ZipArchive) and specifically xl/styles.xml.

Constraints:

- Do not modify benchmarks/NPOI.Benchmarks/Program.cs unless required; it already uses BenchmarkSwitcher.FromAssembly(...).Run().
- Keep the new code in the NPOI.Benchmarks namespace.
- Use idiomatic C# with using var.

Acceptance criteria:

- dotnet run -c Release --project benchmarks/NPOI.Benchmarks discovers and can run LargeStylesBenchmark.
- On first run, benchmark generates the workbook if not present; subsequent runs reuse it if size >= 30MB uncompressed styles.xml.
- Code compiles.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: 1

Assistant: Proceed with (1) strict laziness: GetStylesSource() returns the StylesTable instance without forcing parse; the first real style-table usage triggers StylesTable.EnsureLoaded().

Concrete changes to implement

1) `StylesTable`: defer parsing + thread-safe `EnsureLoaded()`

- Add fields:
  - private PackagePart _stylesPart;
  - private bool _isLoaded;
  - private readonly object _loadLock = new object();
- Change internal StylesTable(PackagePart part) to store the part and not parse immediately.
- Add internal void EnsureLoaded() with double-checked locking, and move the existing parse logic into a private LoadFromPart() method that:
  - reads _stylesPart.GetInputStream()
  - ConvertStreamToXml(...)
  - ReadFrom(xmldoc)
  - sets _isLoaded = true
- Add EnsureLoaded() at the top of all relevant public/internal methods/properties on StylesTable that rely on parsed content (fonts/fills/borders/xfs/number formats/dxfs/table styles/colors, etc.). This is important because callers may obtain StylesTable and use it directly.
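
The double-checked locking pattern proposed above can be sketched on a stand-in class. This is not NPOI code: `LazyPartSketch`, its delegate, and `LoadCount` are hypothetical, and the `volatile` modifier on the flag is an addition of this sketch (the field list above declares a plain `bool`).

```csharp
using System;

// Hypothetical stand-in for a lazily parsed part such as StylesTable.
public sealed class LazyPartSketch
{
    private readonly Func<string> _readPart;           // stands in for reading and parsing the part
    private readonly object _loadLock = new object();
    private volatile bool _isLoaded;                    // volatile is this sketch's choice
    private string _content;

    public LazyPartSketch(Func<string> readPart)
    {
        _readPart = readPart;
    }

    public bool IsLoaded => _isLoaded;

    // Counts how many times the expensive load actually ran.
    public int LoadCount { get; private set; }

    public void EnsureLoaded()
    {
        if (_isLoaded) return;        // fast path, no lock once loaded
        lock (_loadLock)
        {
            if (_isLoaded) return;    // another thread finished the load first
            _content = _readPart();
            LoadCount++;
            _isLoaded = true;         // set last, after the content is ready
        }
    }

    // Every accessor that depends on parsed content calls EnsureLoaded() first.
    public string Content
    {
        get
        {
            EnsureLoaded();
            return _content;
        }
    }
}
```

Constructing the object does no work; `IsLoaded` flips to true only on the first `Content` access, which mirrors the laziness check the proposed test performs on the styles table.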

2) `XSSFWorkbook`: don’t force parse in `GetStylesSource()`

- Ensure GetStylesSource() (or equivalent accessor used by tests) does not call any load method.
- Style-related workbook APIs should still call stylesSource.EnsureLoaded() (or go through a local helper) before using the style table:
  - CreateCellStyle()
  - GetCellStyleAt(...)
  - CreateDataFormat()
  - font APIs (CreateFont, GetFontAt, FindFont, etc.)
  - any border/fill related workbook APIs

This meets your requirement: parse styles.xml only on first style-related API use.

3) Tests

Add a test proving strict laziness:

- Open a real .xlsx with styles (any existing test sample that has styles).
- Immediately after open, call wb.GetStylesSource() and then check (via internal visibility / reflection) that styles are not loaded yet.
- Trigger a style API (e.g., wb.CreateCellStyle() or wb.CreateDataFormat()).
- Assert styles are now loaded.

Implementation detail: to avoid changing public API, we can expose internal bool IsLoadedForTests in StylesTable and use [assembly: InternalsVisibleTo("...TestCases...")] if that’s already how the test project works. Otherwise we can use reflection in the test.

If you want me to actually implement this in the repo, tell me which workflow you prefer:

A) you will code it and want me to provide patch guidance + list of methods to touch, or
B) you want a PR opened against nissl-lab/npoi (you’d need to explicitly ask “create a PR”).

User: If styles are not changed, don't serialize or commit the styles, use existing styles.xml as output stream instead

Assistant: That’s doable, but it’s a separate (and very worthwhile) optimization: skip rewriting xl/styles.xml when it was never loaded/modified, and just copy the original part bytes through to the output package.

Key idea

For loaded workbooks:

If styles.xml was never parsed and no style-related API was called that would mutate styles, then we can treat styles as unchanged.
On save, instead of calling `StylesT...

This pull request was created from Copilot chat.


Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>


Initial plan

b3da337

Copilot AI assigned Copilot and tonyqus Mar 13, 2026

Copilot started work on behalf of tonyqus March 13, 2026 02:30 View session

Add LargeStylesBenchmark, GenerateHugeStylesWorkbook, and ZipUtils

b11f72f

Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>

tonyqus added performance xlsx labels Mar 13, 2026

tonyqus added this to the NPOI 2.8.0 milestone Mar 13, 2026

Copilot AI changed the title ~~[WIP] Add BenchmarkDotNet benchmark for large styles.xml scenarios~~ Add LargeStylesBenchmark with deterministic HugeStyles.xlsx generator Mar 13, 2026

Copilot AI requested a review from tonyqus March 13, 2026 02:35

Copilot finished work on behalf of tonyqus March 13, 2026 02:35

Copilot started work on behalf of tonyqus March 13, 2026 02:37 View session

Fix generator: respect numFmt limit, add verbose attrs, target 20 MB …

6fd6cb7

…styles.xml Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>

Copilot finished work on behalf of tonyqus March 13, 2026 03:19

tonyqus marked this pull request as ready for review March 13, 2026 03:49

tonyqus merged commit 7619fb6 into master Mar 13, 2026
3 of 4 checks passed

tonyqus merged 3 commits into master from copilot/add-large-styles-benchmark
