Better support for non-strict mode

I'm opening this issue to work on better support for non-strict mode in this package. This follows up on https://github.com/J-F-Liu/lopdf/issues/41, which requested a non-strict mode; that mode was added in afe79a46921426f11cee0df7c063c8016232dc5e.

I work on a dataset of 300k PDFs (public tender documents from various sources), and here are the parsing errors I see, ordered by number of occurrences:

- Invalid PDF structure (756 documents): possibly related to https://github.com/J-F-Liu/lopdf/issues/433 and https://github.com/J-F-Liu/lopdf/issues/41
- Invalid file header (296 documents): will be addressed by https://github.com/J-F-Liu/lopdf/pull/481
- Invalid file trailer (215 documents): possibly related to https://github.com/J-F-Liu/lopdf/issues/318
- Invalid content stream (125 documents): possibly related to https://github.com/J-F-Liu/lopdf/issues/78
- Invalid cross reference table (3 documents): possibly related to https://github.com/J-F-Liu/lopdf/issues/463
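
For anyone who wants to produce the same kind of breakdown on their own files, here is a minimal sketch of the tallying step only. It is not the script used for the numbers above: `tally_errors` is a hypothetical helper, and it assumes you have already collected one error message string per failed document (for example by formatting whatever error the load call returned). It uses only the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical helper: count how often each error message occurs and
/// return the counts, most frequent first. Ties are broken alphabetically
/// so the output order is stable.
pub fn tally_errors<'a, I>(messages: I) -> Vec<(String, usize)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts: HashMap<String, usize> = HashMap::new();
    for message in messages {
        // Trim and lowercase so that "Invalid file header" and
        // "invalid file header" land in the same bucket.
        *counts.entry(message.trim().to_lowercase()).or_insert(0) += 1;
    }

    let mut sorted: Vec<(String, usize)> = counts.into_iter().collect();
    // Sort by descending count, then by message text.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}
```

Grouping on the message text is a deliberate simplification; if the error type carries extra detail (such as offsets), grouping on the error variant instead would probably give cleaner buckets.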

I will try to open a PR, at least for the cases where a small fix in non-strict mode is enough. @J-F-Liu

FYI @abimaelmartell

