fix: accept binary bytes on the PDF header line in non-strict mode #481
Merged
J-F-Liu merged 1 commit into J-F-Liu:main on Mar 22, 2026
This was referenced Mar 19, 2026
1f104cd to d56765e
In lenient mode (the default), capture only the version digits from the header line, skipping any trailing binary marker bytes that some generators (e.g. ImageMill) place before the newline. In strict mode, reject headers that have trailing bytes after the version string.
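To make the lenient/strict split concrete, here is a minimal sketch of the behaviour the commit message describes. It is not lopdf's actual code: `parse_header_version` and its `strict` flag are hypothetical names, and the trailing bytes in the example (`\xE2\xE3\xCF\xD3`) are illustrative, not the exact bytes ImageMill writes.

```rust
/// Hypothetical helper (not lopdf's API): parse the `%PDF-x.y` header line.
/// Lenient mode keeps only the leading version characters (digits and '.')
/// and ignores any other bytes on the line, e.g. binary marker bytes.
/// Strict mode rejects the header if anything besides the version precedes the EOL.
fn parse_header_version(input: &[u8], strict: bool) -> Option<String> {
    if !input.starts_with(b"%PDF-") {
        return None;
    }
    let rest = &input[5..];
    // The header line ends at the first CR or LF, or at end of input.
    let line_end = rest
        .iter()
        .position(|&b| b == b'\r' || b == b'\n')
        .unwrap_or(rest.len());
    let line = &rest[..line_end];
    // Count the version-like prefix: ASCII digits and '.'.
    let version_len = line
        .iter()
        .take_while(|&&b| b.is_ascii_digit() || b == b'.')
        .count();
    if version_len == 0 {
        return None;
    }
    // Strict mode: any extra byte after the version string is an error.
    if strict && version_len != line.len() {
        return None;
    }
    // Only ASCII bytes were captured, so UTF-8 decoding cannot fail here.
    std::str::from_utf8(&line[..version_len]).ok().map(str::to_string)
}

fn main() {
    // Illustrative header with trailing non-ASCII bytes before the newline.
    let header: &[u8] = b"%PDF-1.4 \xE2\xE3\xCF\xD3\n";
    println!("{:?}", parse_header_version(header, false)); // Some("1.4")
    println!("{:?}", parse_header_version(header, true)); // None
}
```

Keeping the scan on raw bytes (rather than converting the line to a string first) is what lets lenient mode tolerate arbitrary binary bytes after the version.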
d56765e to a52c7f9
Contributor (author): @J-F-Liu this should be ready for review.
Closes #480
Capture only version-like characters (digits and '.') from the header, then skip any remaining bytes on the line. This matches the approach used by pdf.js (which reads until whitespace, up to 7 characters at most) and qpdf (which uses a regex that matches only the version digits).
Also added unit tests for parsing the various PDF headers I encountered in the dataset I was working with.
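The "capture version-like characters, then skip the rest of the line" approach can be sketched as a small byte scanner that also reports where the next line begins, so parsing can continue regardless of what junk sat on the header line. This is a hypothetical illustration, not the merged lopdf code: `read_header` is an assumed name, and treating CR, LF, or CRLF as the end-of-line marker is an assumption of the sketch.

```rust
/// Hypothetical sketch (not lopdf's actual code): capture only version-like
/// bytes (ASCII digits and '.') after `%PDF-`, then skip whatever else is on
/// the header line, including binary bytes.
/// Returns the version and the byte offset at which the next line starts.
fn read_header(input: &[u8]) -> Option<(&str, usize)> {
    const PREFIX: &[u8] = b"%PDF-";
    if !input.starts_with(PREFIX) {
        return None;
    }
    let start = PREFIX.len();
    // Capture the run of digits and dots that forms the version.
    let version_len = input[start..]
        .iter()
        .take_while(|&&b| b.is_ascii_digit() || b == b'.')
        .count();
    if version_len == 0 {
        return None;
    }
    // The captured bytes are ASCII, so this always succeeds.
    let version = std::str::from_utf8(&input[start..start + version_len]).ok()?;
    // Skip the rest of the line, whatever bytes it contains.
    let mut pos = start + version_len;
    while pos < input.len() && input[pos] != b'\r' && input[pos] != b'\n' {
        pos += 1;
    }
    // Consume the end-of-line marker: CRLF, CR, or LF.
    if input.get(pos) == Some(&b'\r') {
        pos += 1;
    }
    if input.get(pos) == Some(&b'\n') {
        pos += 1;
    }
    Some((version, pos))
}

fn main() {
    // Illustrative input: binary bytes after the version, then the first object.
    let data: &[u8] = b"%PDF-1.4 \xE2\xE3\xCF\xD3\r\n1 0 obj";
    if let Some((version, next)) = read_header(data) {
        // Prints: version 1.4, next line starts at byte 15
        println!("version {version}, next line starts at byte {next}");
    }
}
```

Unlike a fixed-width read, this scan does not care how many stray bytes follow the version; it only relies on the line terminator to find where the body of the file starts.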