Accept binary bytes on the PDF header line

I have a pdf  [vittel_rdc.pdf](https://github.com/user-attachments/files/26108134/vittel_rdc.pdf) that fails to load in lopdf with
```
Error: Parse(invalid file header)
```
but otherwise opens fine in other pdf viewers.

After some investigations it looks like it has a header with something like
```
%PDF-1.3 \xb0\x9f\x92\x9c\x9f\xd4\xe0\xce\xd0\xd0\xd0\r
```
and was produced by ImageMill Imaging Library v3.000.

If I understand correctly, the pdf spec (ISO 32000, §7.5.2) would requires a newline before the binary comment line and that's what lopdf implements. However looking at what other pdf libraries implement they are a bit less strict,

- **pypdf**: Only validates the 5-byte `%PDF-` prefix, doesn't parse the rest of the line
- **pdf.js** (Mozilla): Reads version chars until whitespace (byte ≤ 0x20) or 7 chars max, ignores the rest
- **qpdf**: Uses regex `[0-9]+\.[0-9]+` to extract version digits, ignores trailing bytes

so I would propose to be a bit less strict about this header parsing, to not error on such files. There should be little downside.

Made a PR in https://github.com/J-F-Liu/lopdf/pull/481


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accept binary bytes on the PDF header line #480

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Accept binary bytes on the PDF header line #480

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions