Fix for long document load times introduced in v0.35.0 #486

Merged
J-F-Liu merged 5 commits into J-F-Liu:main from mcantrell:fix/doc-load-performance
Apr 2, 2026

@mcantrell
Contributor

Summary

Removes nom_locate dependency to fix a performance regression introduced in v0.35.0. See #412 for details.

The parser input type was changed from &[u8] to LocatedSpan<&[u8], &str> in 0.35.0 to carry debug labels through the parser. LocatedSpan tracks line/column position by scanning for newlines (via memchr::count_raw) on every slice operation. For a large PDF, this means millions of redundant newline scans during parsing.
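To make the cost concrete, here is a minimal model of a position-tracking span. `TrackedSpan` and `take_from_plain` are hypothetical names for illustration only; this is not nom_locate's actual implementation, just a sketch of the behaviour described above, assuming each slice rescans the consumed bytes for newlines.

```rust
/// Hypothetical stand-in for a position-tracking span.
/// Illustrative only; not nom_locate's real code.
#[derive(Clone, Copy, Debug)]
pub struct TrackedSpan<'a> {
    pub fragment: &'a [u8],
    pub line: usize,
}

impl<'a> TrackedSpan<'a> {
    pub fn new(fragment: &'a [u8]) -> Self {
        TrackedSpan { fragment, line: 1 }
    }

    /// Drop the first `n` bytes. To keep `line` current, the skipped
    /// bytes are scanned for newlines, which costs O(n) per slice.
    pub fn take_from(self, n: usize) -> Self {
        let skipped = &self.fragment[..n];
        let newlines = skipped.iter().filter(|&&b| b == b'\n').count();
        TrackedSpan {
            fragment: &self.fragment[n..],
            line: self.line + newlines,
        }
    }
}

/// The same operation on a plain byte slice only adjusts the slice
/// bounds; no bytes are inspected.
pub fn take_from_plain(input: &[u8], n: usize) -> &[u8] {
    &input[n..]
}
```

When the parser slices its input many times, the scanning variant touches the same bytes repeatedly, while the plain-slice variant does constant work per slice.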

Note

Neither the line/column tracking nor the debug labels were ever read. The only LocatedSpan methods actually used (.len(), .take_from(), .fragment()) all have direct equivalents on &[u8].

Fix

  • Removed nom_locate from Cargo.toml
  • Changed ParserInput<'a> from LocatedSpan<&[u8], &str> back to &[u8]
  • Replaced all ParserInput::new_extra(bytes, "label") construction sites with plain byte slices
  • Made minor type adjustments to the trim_spaces signature, a verify closure, and cmap_parser test assertions
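The shape of the change can be sketched as follows. The alias mirrors the `ParserInput<'a>` change listed above; `skip_spaces` is a hypothetical helper written for this example, not the crate's `trim_spaces`.

```rust
/// After the fix: the parser input is a plain byte slice again.
pub type ParserInput<'a> = &'a [u8];

/// Hypothetical helper showing the slice equivalents of the three
/// LocatedSpan methods that were in use:
///   span.len()        -> input.len()
///   span.take_from(n) -> &input[n..]
///   span.fragment()   -> input (the slice itself)
pub fn skip_spaces(input: ParserInput<'_>) -> ParserInput<'_> {
    // Count leading ASCII whitespace bytes, then slice past them.
    let n = input
        .iter()
        .take_while(|b| b.is_ascii_whitespace())
        .count();
    &input[n..]
}
```

Construction sites that previously wrapped the bytes with a label now pass the byte slice directly, so no wrapper value is created at all.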

Result

Load time for a 100-page PDF dropped from 9-10 seconds to ~10 ms.

Test:
tests/document_load_performance.rs with tests/regression/test.pdf
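A load-time regression test of this kind could look roughly like the sketch below. This is not the contents of tests/document_load_performance.rs; `time_it` is a hypothetical helper, and the time budget a real test would assert against is an assumption left to the caller.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` once and return its result together
/// with the wall-clock time it took.
pub fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

// In an integration test, `f` would load the regression PDF, and the
// test would assert that the elapsed time stays under a generous
// budget so that slower CI machines do not cause spurious failures.
```

A generous budget still catches a regression of the size reported here, since the difference is between milliseconds and several seconds.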

Warning

I've included test.pdf for testing, but you may not actually want it. I wanted to point it out because it's quite large. I can remove it from the PR if you'd like.

@J-F-Liu J-F-Liu merged commit 7a05512 into J-F-Liu:main Apr 2, 2026
10 checks passed