Skip to content

Implement lazy loading for xl/styles.xml in XSSFWorkbook#1731

Merged
tonyqus merged 3 commits intomasterfrom
copilot/implement-lazy-loading-styles
Mar 13, 2026
Merged

Implement lazy loading for xl/styles.xml in XSSFWorkbook#1731
tonyqus merged 3 commits intomasterfrom
copilot/implement-lazy-loading-styles

Conversation

Copy link
Contributor

Copilot AI commented Mar 13, 2026

Currently, opening any .xlsx file eagerly parses StylesTable and, on every save, clears and rewrites xl/styles.xml even when styles were never touched — causing byte-level diffs in round-trips with no style mutations.

Changes

StylesTable.cs — lazy parse + dirty tracking

  • StylesTable(PackagePart part) no longer calls ReadFrom() in the constructor; XML parsing is deferred via a new EnsureLoaded() method
  • Two flags drive the behavior:
    • _isLoaded — set on first access; triggers ReadFrom() + theme propagation
    • _isTouched — set by any mutation; controls whether Commit() fires
  • PrepareForCommit() and Commit() are no-ops when _isTouched = false, so the original ZipPackagePart bytes pass through unchanged to the output ZIP
  • Default (new-workbook) constructor initializes both flags true — no behavior change for new files
  • All read methods call EnsureLoaded(); all write methods call both EnsureLoaded() and MarkTouched()
  • Theme setter no longer iterates fonts/borders when the model is not yet loaded; EnsureLoaded() applies the theme post-parse
// Before: always rewrites styles.xml on save
// After: only rewrites if styles were mutated

using var wb = new XSSFWorkbook(stream); // styles NOT parsed yet
wb.GetSheetAt(0);                        // still not parsed/touched
wb.Write(outStream);                     // styles.xml preserved byte-for-byte

wb2 = new XSSFWorkbook(stream);
wb2.CreateFont();                        // EnsureLoaded() + MarkTouched()
wb2.Write(outStream);                    // styles.xml regenerated normally

TestXSSFWorkbook.cs — three new tests

  • LazyStyles_UntouchedStylesPreservedOnSave — asserts byte-identical xl/styles.xml after a no-style-touch round-trip
  • LazyStyles_TouchedStylesModifiedOnSave — asserts xl/styles.xml differs after CreateFont(), and the result reloads correctly
  • LazyStyles_BasicOperationsWorkWithoutForcingStylesLoad — smoke test for sheet/cell access and GetStylesSource() under lazy mode
Original prompt

Goal

Implement lazy loading for xl/styles.xml in XSSFWorkbook and ensure that if styles are never touched, saving the workbook does not re-serialize / commit a regenerated styles.xml into the package part, but instead preserves the original styles.xml bytes/stream from the source file.

Repo: nissl-lab/npoi

Background / Current Behavior

  • XSSFWorkbook.OnDocumentRead() currently discovers a StylesTable via RelationParts and assigns it to stylesSource eagerly. (See ooxml/XSSF/UserModel/XSSFWorkbook.cs around OnDocumentRead() where stylesSource is set when p is StylesTable.)
  • When saving, POIXMLDocumentPart.OnSave() calls PrepareForCommit() which clears the PackagePart, then Commit() which writes XML. This causes styles.xml to be rewritten if it is part of the document tree.

Required Behavior

  1. Lazy-load styles

    • Do not parse or instantiate StylesTable on workbook open.
    • Parse/instantiate styles only when needed (e.g., when GetStylesSource() is called or any API that requires styles is invoked).
  2. Preserve original styles.xml when styles are untouched

    • If the workbook was opened from an existing package containing xl/styles.xml and the user never touches styles:
      • Saving must not clear and rewrite that part.
      • The original styles.xml bytes must remain unchanged in the output.
  3. Normal behavior when styles are touched

    • If any style-related API is invoked that mutates styles (create/modify styles, fonts, fills, borders, number formats, etc.), the workbook should materialize StylesTable and commit it on save as before.

Implementation Details (Guidance)

A) XSSFWorkbook changes

  • Add fields to XSSFWorkbook to support lazy behavior, e.g.:

    • a reference to the original PackagePart for styles.xml (if present)
    • flags like stylesLoaded, stylesTouched
  • During OnDocumentRead():

    • When encountering the styles relation/part, do not construct a StylesTable instance.
    • Instead, store the PackagePart (and relationship id if needed) for later.
  • Update GetStylesSource():

    • If stylesSource not yet loaded and a styles PackagePart exists, parse it to a StylesTable on demand.
    • If missing, create a new StylesTable (but only when styles are first requested).
  • Ensure all internal places that need styles go through GetStylesSource().

B) Prevent styles part from being committed if untouched

  • Ensure that when styles are never touched:
    • the styles part is not part of the committed POIXMLDocumentPart relation tree or
    • commit logic for that part is a no-op (and does not clear the part) so the original bytes survive.

Possible approaches:

  • Keep stylesSource as null and do not include a live StylesTable part in relations when untouched.
  • Or introduce a lightweight document part wrapper for styles that preserves original bytes unless marked dirty.

C) Dirty tracking

  • Mark stylesTouched = true when APIs are used that create/modify styles:
    • CreateCellStyle, CreateFont, number format creation, custom styles, etc.
  • Be conservative: if unsure about a method’s impact, mark as touched.

D) Tests

Add/extend tests under testcases/ooxml to cover:

  1. Open an existing xlsx with a non-trivial styles.xml, do not touch styles, write out, and assert styles.xml bytes are identical.
  2. Open the same workbook, touch styles (e.g., create a style/font), write out, and assert styles.xml is modified (or at least not identical).
  3. Ensure basic workbook operations (reading cell values, sheets, etc.) still work without forcing styles load.

Use deterministic fixtures (add a small xlsx test file if needed) and read the xl/styles.xml entry from the written package for comparison.

Notes

  • Be mindful of read-only package behavior.
  • Ensure saving remains valid for both .xlsx and .xlsm workbooks.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Implement lazy loading for styles.xml in XSSFWorkbook and if styles are not touched, don't commit the styles to packagepart, use existing styles.xml stream instead

Assistant: To do this in XSSFWorkbook, you need two related changes:

  1. Lazy-load styles.xml (don’t parse it into a StylesTable unless a caller actually needs styles)
  2. On write, if styles were never touched, don’t serialize a new styles.xml into the package; instead preserve the original styles.xml bytes/stream from the source package.

Below is a concrete design that fits the current code structure shown in the repository.


What the current code does (relevant facts from repo)

  • XSSFWorkbook.OnDocumentRead() currently walks RelationParts and assigns stylesSource when it encounters a StylesTable part:
    else if (p is StylesTable stylesTable) stylesSource = stylesTable; citeturn8commentary0
  • `XSSFWorkbook.GetStylesSour...

This pull request was created from Copilot chat.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits March 13, 2026 04:43
- StylesTable(PackagePart part) constructor is now lazy: doesn't parse XML
  immediately. Parsing is deferred until first access via EnsureLoaded().
- Added _isLoaded and _isTouched flags to track state.
- Overrode PrepareForCommit() and Commit() to be no-ops when untouched,
  preserving original xl/styles.xml bytes in the output package.
- Added EnsureLoaded() calls to all data-accessing methods.
- Added EnsureLoaded() + MarkTouched() calls to all mutation methods.
- Modified Theme setter to only iterate fonts/borders if model is loaded;
  EnsureLoaded() applies the theme after parsing.
- Added 3 tests verifying: untouched preservation, touched modification,
  and basic workbook operations with lazy styles."

Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
…tion order, improve comments

Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement lazy loading for xl/styles.xml in XSSFWorkbook Implement lazy loading for xl/styles.xml in XSSFWorkbook Mar 13, 2026
Copilot AI requested a review from tonyqus March 13, 2026 04:46
@tonyqus
Copy link
Member

tonyqus commented Mar 13, 2026

BenchmarkDotNet v0.13.12, Windows 11 (10.0.26200.7922)
12th Gen Intel Core i5-12400, 1 CPU, 12 logical and 6 physical cores
.NET SDK 9.0.311
[Host] : .NET 8.0.24 (8.0.2426.7010), X64 RyuJIT AVX2
ShortRun : .NET 8.0.24 (8.0.2426.7010), X64 RyuJIT AVX2

Job=ShortRun IterationCount=3 LaunchCount=1
WarmupCount=3

Before This PR

Method Mean Error StdDev Gen0 Gen1 Gen2 Allocated
XSSFWorkbookLargeStylesOpenDispose 1.200 s 0.1235 s 0.0068 s 40000.0000 37000.0000 5000.0000 327.68 MB
XSSFWorkbookLargeStylesForceLoad 1.196 s 0.5674 s 0.0311 s 40000.0000 37000.0000 5000.0000 328.37 MB
XSSFWorkbookLargeStylesOpenWrite 1.671 s 0.4115 s 0.0226 s 47000.0000 40000.0000 5000.0000 497.25 MB

After This PR

Method Mean Error StdDev Gen0 Gen1 Gen2 Allocated
XSSFWorkbookLargeStylesOpenDispose 155.7 ms 81.65 ms 4.48 ms 8000.0000 6500.0000 2000.0000 62.66 MB
XSSFWorkbookLargeStylesForceLoad 1,092.9 ms 454.62 ms 24.92 ms 39000.0000 37000.0000 5000.0000 328.38 MB
XSSFWorkbookLargeStylesOpenWrite 625.2 ms 168.57 ms 9.24 ms 14000.0000 9000.0000 3000.0000 116.91 MB

@tonyqus tonyqus marked this pull request as ready for review March 13, 2026 06:58
@tonyqus tonyqus merged commit e84add7 into master Mar 13, 2026
3 of 4 checks passed
@tonyqus tonyqus added this to the NPOI 2.8.0 milestone Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants