Releases · PrinsFrank/pdfparser · GitHub

10 Aug 20:42

PrinsFrank

v2.3.1 Minor parsing bugfixes, gz decode fallback

What's Changed

Add fallback gz decode mechanism when gz headers are missing by @PrinsFrank in #200
Properly retrieve content from uncompressed objects by @PrinsFrank in #201
Accept whitespace around array values by @PrinsFrank in #202
Support widths array spread over multiple lines by @PrinsFrank in #204
Fix exception message length on decode issue by @PrinsFrank in #205
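
The gz decode fallback in #200 can be pictured with a minimal sketch. It is written in Python rather than the library's PHP, and the function name `inflate_with_fallback` is hypothetical; it shows the general technique of retrying as raw DEFLATE when the zlib header is absent, not the library's actual code.

```python
import zlib


def inflate_with_fallback(data: bytes) -> bytes:
    """Decompress a Flate-encoded payload, tolerating a missing zlib header.

    Hypothetical helper for illustration only.
    """
    try:
        # Normal case: zlib-wrapped stream (2-byte header, Adler-32 trailer).
        return zlib.decompress(data)
    except zlib.error:
        # Fallback: negative wbits tells zlib to expect raw DEFLATE data
        # with no header and no checksum.
        return zlib.decompress(data, -zlib.MAX_WBITS)


if __name__ == "__main__":
    payload = b"BT /F1 12 Tf (Hello) Tj ET"
    wrapped = zlib.compress(payload)
    headerless = wrapped[2:-4]  # simulate a stream whose header and checksum are missing
    assert inflate_with_fallback(wrapped) == payload
    assert inflate_with_fallback(headerless) == payload
```

The trade-off of a fallback like this is that the raw path has no checksum, so corrupted data is less likely to be detected.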

Full Changelog: v2.3.0...v2.3.1

Contributors

PrinsFrank

09 Aug 14:23

PrinsFrank

v2.3.0 More extensive image support: JBIG2, TIFF, CMYK, several other formats and bugfixes

What's Changed

DecodeParams can be array, Pattern can be string by @PrinsFrank in #159
Add support for array filters in images by @PrinsFrank in #160
Accept space after byte offset last crossReference source by @PrinsFrank in #161
Accept space after start crossReference source by @PrinsFrank in #162
Accept reference value for MediaBox and CropBox by @PrinsFrank in #163
Scale and font size can both be negative by @PrinsFrank in #165
Increase logging for incorrect matrix transformation by @PrinsFrank in #166
Add support for padding and extra whitespace in reference array values by @PrinsFrank in #167
Add support for TIFF images (CCITTFaxDecode) by @PrinsFrank in #169
DecodeParams cannot be plain array by @PrinsFrank in #170
Accept subdictionary for decodeParams by @PrinsFrank in #171
Dictionary arrays can contain null by @PrinsFrank in #173
Properly support array filter types when retrieving image content by @PrinsFrank in #174
Properly handle newlines in CIDFonts widths by @PrinsFrank in #175
Add support for JBIG2 images by @PrinsFrank in #176
Add FillSignData as valid TypeNameValue by @PrinsFrank in #182
Accept scale values that are floats as well as ints by @PrinsFrank in #184
Recover from invalid byte offset last cross reference section by looking for stream/table markers at the end of the document instead by @PrinsFrank in #186
Add support for PNG predictor algorithm None by @PrinsFrank in #187
Add support for CMYK rasterized images by @PrinsFrank in #189
Handle spaces between components of matrix transformation in graphics state operator by @PrinsFrank in #192
CONTENTS can be a ReferenceValueArray, not a simple ArrayValue by @PrinsFrank in #193
Don't parse characters in resource names as operators by @PrinsFrank in #195
Array values can be multiple resource names not separated by space by @PrinsFrank in #196
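
As an illustration of the recovery idea in #186, here is a rough sketch in Python (not the library's PHP). It assumes a classic cross-reference table and a hypothetical helper name, `locate_last_xref_table`; cross-reference streams would need a different marker and are not covered. When the byte offset after `startxref` is wrong, the parser can instead scan backwards for the last standalone `xref` keyword.

```python
from typing import Optional


def locate_last_xref_table(pdf: bytes) -> Optional[int]:
    """Return the offset of the last standalone 'xref' keyword, or None.

    Hypothetical helper: skips matches that are the tail of 'startxref'.
    """
    pos = len(pdf)
    while True:
        # Search backwards, only considering matches that end before `pos`.
        pos = pdf.rfind(b"xref", 0, pos)
        if pos == -1:
            return None
        if pdf[max(0, pos - 5):pos] != b"start":
            return pos


if __name__ == "__main__":
    doc = (
        b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
        b"xref\n0 1\n0000000000 65535 f \n"
        b"trailer\n<<>>\nstartxref\n9999\n%%EOF"
    )
    # The recorded offset (9999) is invalid, so fall back to the marker search.
    assert locate_last_xref_table(doc) == doc.index(b"xref\n0 1")
```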

Full Changelog: v2.2.1...v2.3.0

Contributors

PrinsFrank

21 Jul 18:29

PrinsFrank

v2.2.1 Several small parsing related bugfixes, add JPEG2000 support

What's Changed

Simplify working with underlying positioned text elements by moving retrieval of fonts to PositionedTextElement and presorting positionedTextElements by @PrinsFrank in #136
Properly handle newline between dictionary key and value by @PrinsFrank in #141
Use the length from a crossReferenceStream when it is available by @PrinsFrank in #143
Don't try to decode values with JPX_DECODE filter by @PrinsFrank in #145

Full Changelog: v2.2.0...v2.2.1

Contributors

PrinsFrank

20 Jul 14:09

PrinsFrank

v2.2.0 Add support for extraction of Rasterized Images

What's Changed

Delete stale workflow to prevent issues from being marked as stale by @PrinsFrank in #130
Implement rasterized image extraction by @PrinsFrank in #132
Add support for other colorspaces that are not DeviceColorSpaces but are stored as lookup tables (LUTs) in objects by @PrinsFrank in #134

Full Changelog: v2.1.1...v2.2.0

Contributors

PrinsFrank

13 Jun 18:20

PrinsFrank

v2.1.1 Fixes bug when parsing PDFs where trailer marker is not followed by newline

What's Changed

Don't assume newline after trailer marker by @PrinsFrank in #128

Full Changelog: v2.1.0...v2.1.1

Contributors

PrinsFrank

28 May 17:36

PrinsFrank

v2.1.0 Official PDF 2.0 support, classes now non-final to allow extension and mocking

What's Changed

Make classes non-final to open them for extension and mocking by @PrinsFrank in #124
Add official support for PDF 2.0 by @PrinsFrank in #125

Full Changelog: v2.0.0...v2.1.0

Contributors

PrinsFrank

19 May 18:22

PrinsFrank

v2.0.0 Multiline/positional text extraction, image extraction, many bugfixes

What's Changed

More strictly parse dictionary arrays by @PrinsFrank in #53
Add new PDF2.0 tabs nameValues by @PrinsFrank in #54
Fix issue where nested arrays in dictionaries were closed too early by @PrinsFrank in #56
Fix comment state in dictionary parsing by @PrinsFrank in #58
Fix nesting issues in text operator parsing by @PrinsFrank in #60
Correctly parse dictionary array values where items are separated over several lines by @PrinsFrank in #62
Add missing public key security handlers to FilterNameValue by @PrinsFrank in #65
Add support for multiple codespaceranges by @PrinsFrank in #67
Remove extra decoding in uncompressed object by @PrinsFrank in #68
Allow dashes in font names by @PrinsFrank in #70
Fix issues where font operator contains font names by @PrinsFrank in #72
Rename TextObjects and collection to contentStream to better reflect expected content by @PrinsFrank in #73
Tc operator sets char space not char size by @PrinsFrank in #74
Keep track of content stream commands outside text objects by @PrinsFrank in #75
Remove dependency to internal phpunit method and ignore non-testing tests by @PrinsFrank in #76
Organize content stream classes by @PrinsFrank in #77
Organize text operators by @PrinsFrank in #78
Parse textObjects to intermediate PositionedTextElement by @PrinsFrank in #79
Implement remaining content stream operators by @PrinsFrank in #81
Fix false positives on content stream commands by @PrinsFrank in #82
Implement a transformation state stack to fix text lines appearing out of order by @PrinsFrank in #83
Fix a typo in CONTRIBUTING by @szepeviktor in #84
Fix content stream unit test after matrix multiplication fix resulting in correct x and y offsets by @PrinsFrank in #85
Add test for compressed byte offsets by @PrinsFrank in #87
Parse CIDFontWidths in dictionary values by @PrinsFrank in #89
Implement space insertion based on text width by @PrinsFrank in #90
Update samples dependency and increase space insertion threshold by @PrinsFrank in #91
Don't try to parse EMCs and other content stream content as dictionaries by @PrinsFrank in #92
Don't continue matching operators when in dictionary key or escaped string by @PrinsFrank in #94
Implement encryption detection by @PrinsFrank in #97
Use object length instead of searching for end marker when set in dictionary by @PrinsFrank in #95
Add missing tests for inMemoryStream by @PrinsFrank in #98
Implement missing features for literal string escape sequences by @PrinsFrank in #104
Implement toUnicodeCMap unicode mappings for string literals by @PrinsFrank in #106
Fix issue where string literals ended up as part of multibyte character groups by @PrinsFrank in #110
Fix issue when text state is set outside of text object by @PrinsFrank in #112
When in a string literal, don't keep track of other nesting levels as that is not possible, fixes several array delimiter issues in string literals by @PrinsFrank in #113
Add missing resource TypeNameValue by @PrinsFrank in #114
Fix: mb_convert_encoding can output false by @PrinsFrank in #115
Implement image extraction by @PrinsFrank in #118

New Contributors

@szepeviktor made their first contribution in #84

Full Changelog: v1.1.0...v2.0.0

Contributors

szepeviktor and PrinsFrank

16 May 19:35

PrinsFrank

v2.0.0 Alpha 5 Pre-release

What's Changed

Implement image extraction by @PrinsFrank in #118

Full Changelog: v2.0.0-alpha.4...v2.0.0-alpha.5

Contributors

PrinsFrank

15 May 19:25

PrinsFrank

v2.0.0 Alpha 4 Pre-release

What's Changed

Fix issue where string literals ended up as part of multibyte character groups by @PrinsFrank in #110
Fix issue when text state is set outside of text object by @PrinsFrank in #112
When in a string literal, don't keep track of other nesting levels as that is not possible, fixes several array delimiter issues in string literals by @PrinsFrank in #113
Add missing resource TypeNameValue by @PrinsFrank in #114
Fix: mb_convert_encoding can output false by @PrinsFrank in #115

Full Changelog: v2.0.0-alpha.3...v2.0.0-alpha.4

Contributors

PrinsFrank

07 May 18:59

PrinsFrank

v2.0.0 Alpha 3 Pre-release

What's Changed

Implement toUnicodeCMap unicode mappings for string literals by @PrinsFrank in #106

Full Changelog: v2.0.0-alpha.2...v2.0.0-alpha.3

Contributors

PrinsFrank
