Releases: PrinsFrank/pdfparser
v2.3.1 Minor parsing bugfixes, gz decode fallback
What's Changed
- Add fallback gz decode mechanism when gz headers are missing by @PrinsFrank in #200
- Properly retrieve content from uncompressed objects by @PrinsFrank in #201
- Accept whitespace around array values by @PrinsFrank in #202
- Support widths array spread over multiple lines by @PrinsFrank in #204
- Fix exception message length on decode issue by @PrinsFrank in #205
Full Changelog: v2.3.0...v2.3.1
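To illustrate the kind of fallback described in #200, here is a minimal sketch in Python rather than the library's own PHP. It assumes the fallback means retrying as a raw deflate stream when the zlib header is missing; the function name `flate_decode` is hypothetical and this is not the library's actual implementation.

```python
import zlib


def flate_decode(data: bytes) -> bytes:
    """Decode a Flate stream, tolerating a missing zlib header.

    Hypothetical helper for illustration only.
    """
    try:
        # Normal case: the stream starts with a 2-byte zlib header.
        return zlib.decompress(data)
    except zlib.error:
        # Assumed fallback: treat the bytes as a raw deflate stream.
        # A negative wbits value tells zlib to expect no header or checksum.
        return zlib.decompress(data, wbits=-zlib.MAX_WBITS)
```

The second attempt still raises `zlib.error` when the data is not valid deflate at all, so genuinely corrupt streams are not silently accepted.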
v2.3.0 More extensive image support: JBIG2, TIFF, CMYK, several other formats and bugfixes
What's Changed
- DecodeParams can be array, Pattern can be string by @PrinsFrank in #159
- Add support for array filters in images by @PrinsFrank in #160
- Accept space after byte offset last crossReference source by @PrinsFrank in #161
- Accept space after start crossReference source by @PrinsFrank in #162
- Accept reference value for MediaBox and CropBox by @PrinsFrank in #163
- Scale and font size can both be negative by @PrinsFrank in #165
- Increase logging for incorrect matrix transformation by @PrinsFrank in #166
- Add support for padding and extra whitespace in reference array values by @PrinsFrank in #167
- Add support for TIFF images (CCITTFaxDecode) by @PrinsFrank in #169
- DecodeParams cannot be plain array by @PrinsFrank in #170
- Accept subdictionary for decodeParams by @PrinsFrank in #171
- Dictionary arrays can contain null by @PrinsFrank in #173
- Properly support array filter types when retrieving image content by @PrinsFrank in #174
- Properly handle newlines in CIDFonts widths by @PrinsFrank in #175
- Add support for JBIG2 images by @PrinsFrank in #176
- Add FillSignData as valid TypeNameValue by @PrinsFrank in #182
- Accept float scale values in addition to ints by @PrinsFrank in #184
- Recover from invalid byte offset last cross reference section by looking for stream/table markers at the end of the document instead by @PrinsFrank in #186
- Add support for PNG predictor algorithm None by @PrinsFrank in #187
- Add support for CMYK rasterized images by @PrinsFrank in #189
- Handle spaces between components of matrix transformation in graphics state operator by @PrinsFrank in #192
- CONTENTS can be a ReferenceValueArray, not a simple ArrayValue by @PrinsFrank in #193
- Don't parse characters in resource names as operators by @PrinsFrank in #195
- Array values can be multiple resource names not separated by space by @PrinsFrank in #196
Full Changelog: v2.2.1...v2.3.0
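For context on the PNG predictor entry (#187): with PNG predictors, each row of decoded stream data starts with one filter-type byte, and type 0 (None) means the row bytes that follow are stored unchanged. The sketch below is in Python rather than PHP and only handles that one filter type. The function name is hypothetical; the parameters mirror the `Columns`, `Colors` and `BitsPerComponent` entries of a PDF decode parameters dictionary. It is an illustration, not the library's code.

```python
def undo_png_predictor_none(
    data: bytes,
    columns: int,
    colors: int = 1,
    bits_per_component: int = 8,
) -> bytes:
    """Strip the per-row filter-type byte from data that only uses filter None.

    Hypothetical helper for illustration only.
    """
    # Bytes of sample data per row, rounded up to a whole byte.
    row_length = (columns * colors * bits_per_component + 7) // 8
    stride = row_length + 1  # one extra leading byte holds the filter type

    out = bytearray()
    for start in range(0, len(data), stride):
        row = data[start:start + stride]
        if row[0] != 0:
            # Sub, Up, Average and Paeth are out of scope for this sketch.
            raise ValueError(f"unsupported PNG filter type {row[0]}")
        out += row[1:]  # filter None: the row is already the final data
    return bytes(out)
```

For example, two rows of three one-byte samples, `b"\x00abc\x00def"`, decode to `b"abcdef"` with `columns=3`.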
v2.2.1 Several small parsing related bugfixes, add JPEG2000 support
What's Changed
- Simplify working with underlying positioned text elements by moving retrieval of fonts to PositionedTextElement and presorting positionedTextElements by @PrinsFrank in #136
- Properly handle newline between dictionary key and value by @PrinsFrank in #141
- Use the length from a crossReferenceStream when it is available by @PrinsFrank in #143
- Don't try to decode values with JPX_DECODE filter by @PrinsFrank in #145
Full Changelog: v2.2.0...v2.2.1
v2.2.0 Add support for extraction of Rasterized Images
What's Changed
- Delete stale workflow to prevent issues from being marked as stale by @PrinsFrank in #130
- Implement rasterized image extraction by @PrinsFrank in #132
- Add support for other colorspaces that are not DeviceColorSpaces but are stored as luts in objects by @PrinsFrank in #134
Full Changelog: v2.1.1...v2.2.0
v2.1.1 Fixes bug when parsing PDFs where trailer marker is not followed by newline
What's Changed
- Don't assume newline after trailer marker by @PrinsFrank in #128
Full Changelog: v2.1.0...v2.1.1
v2.1.0 Official PDF2.0 support, classes now non final to allow extension and mocking
What's Changed
- Make classes non-final to open them for extension and mocking by @PrinsFrank in #124
- Add official support for pdf 2.0 by @PrinsFrank in #125
Full Changelog: v2.0.0...v2.1.0
v2.0.0 Multiline/positional text extraction, image extraction, many bugfixes
What's Changed
- More strictly parse dictionary arrays by @PrinsFrank in #53
- Add new PDF2.0 tabs nameValues by @PrinsFrank in #54
- Fix issue where nested arrays in dictionaries were closed too early by @PrinsFrank in #56
- Fix comment state in dictionary parsing by @PrinsFrank in #58
- Fix nesting issues in text operator parsing by @PrinsFrank in #60
- Correctly parse dictionary array values where items are separated over several lines by @PrinsFrank in #62
- Add missing public key security handlers to FilterNameValue by @PrinsFrank in #65
- Add support for multiple codespaceranges by @PrinsFrank in #67
- Remove extra decoding in uncompressed object by @PrinsFrank in #68
- Allow dashes in font names by @PrinsFrank in #70
- Fix issues where font operator contains font names by @PrinsFrank in #72
- Rename TextObjects and collection to contentStream to better reflect expected content by @PrinsFrank in #73
- Tc operator sets char space not char size by @PrinsFrank in #74
- Keep track of content stream commands outside text objects by @PrinsFrank in #75
- Remove dependency to internal phpunit method and ignore non-testing tests by @PrinsFrank in #76
- Organize content stream classes by @PrinsFrank in #77
- Organize text operators by @PrinsFrank in #78
- Parse textObjects to intermediate PositionedTextElement by @PrinsFrank in #79
- Implement remaining content stream operators by @PrinsFrank in #81
- Fix false positives on content stream commands by @PrinsFrank in #82
- Implement a transformation state stack to fix text lines appearing out of order by @PrinsFrank in #83
- Fix a typo in CONTRIBUTING by @szepeviktor in #84
- Fix content stream unit test after matrix multiplication fix resulting in correct x and y offsets by @PrinsFrank in #85
- Add test for compressed byte offsets by @PrinsFrank in #87
- Parse CIDFontWidths in dictionary values by @PrinsFrank in #89
- Implement space insertion based on text width by @PrinsFrank in #90
- Update samples dependency and increase space insertion threshold by @PrinsFrank in #91
- Don't try to parse EMCs and other content stream content as dictionaries by @PrinsFrank in #92
- Don't continue matching operators when in dictionary key or escaped string by @PrinsFrank in #94
- Implement encryption detection by @PrinsFrank in #97
- Use object length instead of searching for end marker when set in dictionary by @PrinsFrank in #95
- Add missing tests for inMemoryStream by @PrinsFrank in #98
- Implement missing features for literal string escape sequences by @PrinsFrank in #104
- Implement toUnicodeCMap unicode mappings for string literals by @PrinsFrank in #106
- Fix issue where string literals ended up as part of multibyte character groups by @PrinsFrank in #110
- Fix issue when text state is set outside of text object by @PrinsFrank in #112
- When in a string literal, don't keep track of other nesting levels as that is not possible, fixes several array delimiter issues in string literals by @PrinsFrank in #113
- Add missing resource TypeNameValue by @PrinsFrank in #114
- Fix: mb_convert_encoding can output false by @PrinsFrank in #115
- Implement image extraction by @PrinsFrank in #118
New Contributors
- @szepeviktor made their first contribution in #84
Full Changelog: v1.1.0...v2.0.0
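As background for the literal string escape sequences mentioned in #104, the following sketch shows the escapes that the PDF specification defines for literal strings: the single-character escapes, one to three octal digits, and a backslash before a line ending as a line continuation. It is written in Python rather than PHP, the function name `unescape_literal_string` is hypothetical, and it does not claim to match how the library implements this.

```python
# Single-character escapes defined for PDF literal strings.
_SIMPLE_ESCAPES = {
    ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09,
    ord("b"): 0x08, ord("f"): 0x0C,
    ord("("): 0x28, ord(")"): 0x29, ord("\\"): 0x5C,
}


def unescape_literal_string(raw: bytes) -> bytes:
    """Resolve escape sequences in the content between ( and ).

    Hypothetical helper for illustration only.
    """
    out = bytearray()
    i, n = 0, len(raw)
    while i < n:
        byte = raw[i]
        if byte != 0x5C:  # not a backslash: copy through
            out.append(byte)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= n:
            break  # trailing backslash is ignored
        nxt = raw[i]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 1
        elif 0x30 <= nxt <= 0x37:
            # Up to three octal digits; high-order overflow is dropped.
            value, j = 0, i
            while j < n and j < i + 3 and 0x30 <= raw[j] <= 0x37:
                value = value * 8 + (raw[j] - 0x30)
                j += 1
            out.append(value & 0xFF)
            i = j
        elif nxt in (0x0A, 0x0D):
            # Backslash before a line ending: line continuation, emit nothing.
            i += 1
            if nxt == 0x0D and i < n and raw[i] == 0x0A:
                i += 1
        else:
            # Unknown escape: the backslash itself is ignored.
            out.append(nxt)
            i += 1
    return bytes(out)
```

Working on bytes rather than text keeps the result independent of any font encoding, which has to be applied in a later step.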
v2.0.0 Alpha 5
What's Changed
- Implement image extraction by @PrinsFrank in #118
Full Changelog: v2.0.0-alpha.4...v2.0.0-alpha.5
v2.0.0 Alpha 4
What's Changed
- Fix issue where string literals ended up as part of multibyte character groups by @PrinsFrank in #110
- Fix issue when text state is set outside of text object by @PrinsFrank in #112
- When in a string literal, don't keep track of other nesting levels as that is not possible, fixes several array delimiter issues in string literals by @PrinsFrank in #113
- Add missing resource TypeNameValue by @PrinsFrank in #114
- Fix: mb_convert_encoding can output false by @PrinsFrank in #115
Full Changelog: v2.0.0-alpha.3...v2.0.0-alpha.4
v2.0.0 Alpha 3
What's Changed
- Implement toUnicodeCMap unicode mappings for string literals by @PrinsFrank in #106
Full Changelog: v2.0.0-alpha.2...v2.0.0-alpha.3