v2.0.0 Multiline/positional text extraction, image extraction, many bugfixes

@PrinsFrank released this 19 May 18:22
· 100 commits to main since this release
921efdd

What's Changed

  • More strictly parse dictionary arrays by @PrinsFrank in #53
  • Add new PDF 2.0 tabs nameValues by @PrinsFrank in #54
  • Fix issue where nested arrays in dictionaries were closed too early by @PrinsFrank in #56
  • Fix comment state in dictionary parsing by @PrinsFrank in #58
  • Fix nesting issues in text operator parsing by @PrinsFrank in #60
  • Correctly parse dictionary array values where items are separated over several lines by @PrinsFrank in #62
  • Add missing public key security handlers to FilterNameValue by @PrinsFrank in #65
  • Add support for multiple codespaceranges by @PrinsFrank in #67
  • Remove extra decoding in uncompressed object by @PrinsFrank in #68
  • Allow dashes in font names by @PrinsFrank in #70
  • Fix issues where font operator contains font names by @PrinsFrank in #72
  • Rename TextObjects and collection to contentStream to better reflect expected content by @PrinsFrank in #73
  • Tc operator sets char space not char size by @PrinsFrank in #74
  • Keep track of content stream commands outside text objects by @PrinsFrank in #75
  • Remove dependency to internal phpunit method and ignore non-testing tests by @PrinsFrank in #76
  • Organize content stream classes by @PrinsFrank in #77
  • Organize text operators by @PrinsFrank in #78
  • Parse textObjects to intermediate PositionedTextElement by @PrinsFrank in #79
  • Implement remaining content stream operators by @PrinsFrank in #81
  • Fix false positives on content stream commands by @PrinsFrank in #82
  • Implement a transformation state stack to fix text lines appearing out of order by @PrinsFrank in #83
  • Fix a typo in CONTRIBUTING by @szepeviktor in #84
  • Fix content stream unit test after matrix multiplication fix resulting in correct x and y offsets by @PrinsFrank in #85
  • Add test for compressed byte offsets by @PrinsFrank in #87
  • Parse CIDFontWidths in dictionary values by @PrinsFrank in #89
  • Implement space insertion based on text width by @PrinsFrank in #90
  • Update samples dependency and increase space insertion threshold by @PrinsFrank in #91
  • Don't try to parse EMCs and other content stream content as dictionaries by @PrinsFrank in #92
  • Don't continue matching operators when in dictionary key or escaped string by @PrinsFrank in #94
  • Implement encryption detection by @PrinsFrank in #97
  • Use object length instead of searching for end marker when set in dictionary by @PrinsFrank in #95
  • Add missing tests for inMemoryStream by @PrinsFrank in #98
  • Implement missing features for literal string escape sequences by @PrinsFrank in #104
  • Implement toUnicodeCMap unicode mappings for string literals by @PrinsFrank in #106
  • Fix issue where string literals ended up as part of multibyte character groups by @PrinsFrank in #110
  • Fix issue when text state is set outside of text object by @PrinsFrank in #112
  • Don't track other nesting levels inside a string literal, where nesting is not possible; fixes several array delimiter issues in string literals by @PrinsFrank in #113
  • Add missing resource TypeNameValue by @PrinsFrank in #114
  • Fix: mb_convert_encoding can output false by @PrinsFrank in #115
  • Implement image extraction by @PrinsFrank in #118
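
For readers unfamiliar with the literal string escape sequences mentioned in #104, the sketch below shows roughly what decoding them involves according to the PDF specification. It is written in Python for illustration only and is not this library's implementation. The function name `decode_pdf_literal` is hypothetical, and the input is assumed to be the raw bytes between the outer parentheses. Normalizing unescaped line endings is left out.

```python
def decode_pdf_literal(body: bytes) -> bytes:
    """Decode escape sequences in the body of a PDF literal string.

    Hypothetical helper: `body` is the raw content between the outer
    parentheses, without those parentheses.
    """
    # Single-character escapes defined by the PDF specification.
    simple = {
        ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
        ord("b"): b"\b", ord("f"): b"\f",
        ord("("): b"(", ord(")"): b")", ord("\\"): b"\\",
    }
    out = bytearray()
    i = 0
    while i < len(body):
        byte = body[i]
        if byte != 0x5C:  # not a backslash: copy the byte unchanged
            out.append(byte)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break  # lone trailing backslash: nothing left to escape
        nxt = body[i]
        if nxt in simple:
            out += simple[nxt]
            i += 1
        elif 0x30 <= nxt <= 0x37:
            # Octal escape: one to three digits in the range 0-7.
            j = i
            while j < len(body) and j < i + 3 and 0x30 <= body[j] <= 0x37:
                j += 1
            # High-order overflow is ignored, so keep the low 8 bits.
            out.append(int(body[i:j].decode("ascii"), 8) & 0xFF)
            i = j
        elif nxt in (0x0D, 0x0A):
            # Backslash followed by an end-of-line marker: line continuation,
            # so neither the backslash nor the line ending is kept.
            i += 1
            if nxt == 0x0D and i < len(body) and body[i] == 0x0A:
                i += 1
        else:
            # Unknown escape: the backslash is dropped, the character is kept.
            out.append(nxt)
            i += 1
    return bytes(out)
```

Working on bytes rather than text keeps the decoding independent of the font encoding, which is applied in a later step (for example through a ToUnicode CMap, as in #106).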

New Contributors

Full Changelog: v1.1.0...v2.0.0