fix(parser): accept trailing space in xref and startxref lines #485

Merged
J-F-Liu merged 1 commit into J-F-Liu:main from abimaelmartell:firecrawl/fix-xref-trailing-space on Mar 29, 2026

@abimaelmartell
Contributor

Summary

  • Accept optional trailing space before EOL when parsing xref and startxref keywords
  • Some PDF generators (seen in, e.g., San Bernardino County public health reports) emit lines like "xref \n" and "startxref \n", with a trailing space before the newline
  • The xref subsection header already handled this via opt(tag(" ")) — this applies the same treatment to the two keyword lines
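
For readers unfamiliar with the pattern, here is a minimal std-only sketch of "keyword, optional single space, then EOL". It is not the parser code from this PR, which uses opt(tag(" ")); the names `eol` and `keyword_line` are hypothetical and only illustrate the accepted inputs.

```rust
/// Consume one end-of-line marker: "\r\n", "\n", or "\r".
/// Returns the remaining input, or None if no EOL is present.
fn eol(input: &[u8]) -> Option<&[u8]> {
    if let Some(rest) = input.strip_prefix(b"\r\n") {
        return Some(rest);
    }
    input
        .strip_prefix(b"\n")
        .or_else(|| input.strip_prefix(b"\r"))
}

/// Match `keyword`, then at most one trailing space, then an EOL.
/// Hypothetical helper: it mirrors the idea of an optional space
/// before the line ending, not the library's actual implementation.
fn keyword_line<'a>(input: &'a [u8], keyword: &[u8]) -> Option<&'a [u8]> {
    let rest = input.strip_prefix(keyword)?;
    // Optional single trailing space: skip it if present, otherwise keep going.
    let rest = rest.strip_prefix(b" ").unwrap_or(rest);
    eol(rest)
}
```

With this sketch, both `keyword_line(b"xref\n0 1\n", b"xref")` and `keyword_line(b"xref \n0 1\n", b"xref")` return the bytes after the line ending, while two trailing spaces or any other stray byte before the EOL is still rejected.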

Encountered this in production and added some tests.

Thanks!

Some PDF generators emit trailing spaces on structural lines like
"xref \n" and "startxref \n". The xref subsection header already
handled this with opt(tag(" ")), but the xref keyword and startxref
keyword parsers did not, causing Xref(Start) errors on valid PDFs.
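
To make the before/after behavior concrete, here is a rough std-only sketch of a startxref reader with the leniency behind a flag. The function `parse_startxref` and its `allow_trailing_space` parameter are hypothetical, and the failure is modeled as `None` rather than the library's real error type.

```rust
/// Parse "startxref", an EOL, and the decimal byte offset that follows.
/// `allow_trailing_space` is a hypothetical switch: `false` models the
/// strict behavior described above, `true` models the relaxed behavior.
fn parse_startxref(input: &str, allow_trailing_space: bool) -> Option<u64> {
    let mut rest = input.strip_prefix("startxref")?;
    if allow_trailing_space {
        // Skip at most one space between the keyword and the line ending.
        rest = rest.strip_prefix(' ').unwrap_or(rest);
    }
    // Accept "\r\n", "\n", or "\r" as the line ending.
    let rest = rest
        .strip_prefix("\r\n")
        .or_else(|| rest.strip_prefix('\n'))
        .or_else(|| rest.strip_prefix('\r'))?;
    // Read the run of ASCII digits that forms the offset.
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}
```

In this sketch, `parse_startxref("startxref \n1234\n%%EOF", false)` fails while the same input with `true` yields the offset 1234; input without the trailing space parses either way.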
@J-F-Liu merged commit b4a8c25 into J-F-Liu:main on Mar 29, 2026
10 checks passed