test: add 422+ unit tests across 19 modules #261

Open
jacob-cotten wants to merge 1 commit into developer0hye:main from jacob-cotten:contrib/tests-gift

@jacob-cotten

Summary

Thank you for building pdfplumber-rs — it's an impressive pure-Rust port and we've been using it extensively. We wanted to give back by contributing comprehensive test coverage.

This PR adds 422+ unit tests across 19 modules in pdfplumber-core and pdfplumber-parse, covering edge cases, boundary conditions, and correctness invariants:

  • encoding.rs — glyph name resolution, StandardEncoding boundaries, FontEncoding, EncodingResolver 3-tier logic
  • edges.rs — edge generation, orientation, degenerate paths
  • search.rs — regex patterns, unicode, bbox union, anchored patterns
  • dedupe.rs — tolerance boundaries, font/size blocking, output ordering
  • bidi.rs — RTL/LTR detection, neutral chars, field preservation
  • shapes.rs — orientation classification, flip_y, rect construction, CTM transforms
  • annotation.rs — subtype roundtrip, equality semantics, bbox preservation
  • struct_tree.rs — deep nesting, MCID extraction, child ordering
  • page_regions.rs — unicode masking, thresholds, custom margins
  • path.rs — builder lifecycle, rectangle segments, clone independence
  • html.rs — heading levels, HTML escaping, bold/italic detection, list rendering
  • color_space.rs — all color spaces, ICC delegation, indexed bounds
  • standard_fonts.rs — all 14 standard fonts, monospace invariant, bbox sanity
  • words.rs — word split boundaries, vertical text ordering, tolerance semantics, CJK
  • text.rs, layout.rs, table.rs, cmap.rs — additional coverage

Bug fix included

This PR also changes the word-split tolerance check from > to >= to match Python pdfplumber's semantics, and corrects 3 pre-existing test expectations for vertical text ordering (non-upright chars sort by x0 descending, and all vertical chars get the Ttb direction).
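For reviewers, here is a minimal sketch of what the `>` to `>=` change affects. It is not the crate's code: `splits_strict` and `splits_inclusive` are hypothetical helpers, and the sketch assumes the check compares a gap between adjacent chars against the tolerance to decide whether a new word starts. Under that assumption, the two comparisons differ only when the gap equals the tolerance exactly.

```rust
// Hypothetical helpers for illustration only; these names are not part of
// pdfplumber-core. Assumption: a "true" result means "start a new word".

/// Strict comparison: a gap exactly equal to the tolerance does NOT split.
fn splits_strict(gap: f64, tolerance: f64) -> bool {
    gap > tolerance
}

/// Inclusive comparison: a gap exactly equal to the tolerance DOES split.
fn splits_inclusive(gap: f64, tolerance: f64) -> bool {
    gap >= tolerance
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn comparisons_differ_only_at_the_boundary() {
        let tolerance = 3.0;

        // At the exact boundary the two checks disagree.
        assert!(!splits_strict(3.0, tolerance));
        assert!(splits_inclusive(3.0, tolerance));

        // Just below and just above the boundary they agree.
        for gap in [2.5, 3.5] {
            assert_eq!(
                splits_strict(gap, tolerance),
                splits_inclusive(gap, tolerance)
            );
        }
    }
}
```

A boundary test of this shape pins down the behavior at gap == tolerance, which is the only input where the old and new comparisons can produce different word splits.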

Test plan

  • All 1,804 tests pass (cargo test -p pdfplumber-core -p pdfplumber-parse --lib)
  • Zero ignored tests
  • No functional changes beyond the tolerance boundary fix

Thank you again for your work on this project. 🙏

🤖 Generated with Claude Code

Adds comprehensive test coverage for pdfplumber-core and pdfplumber-parse:

- encoding.rs: 48 tests (glyph name resolution, StandardEncoding, FontEncoding, EncodingResolver)
- edges.rs: 33 tests (edge generation, orientation, degenerate paths)
- search.rs: 12 tests (regex, unicode, bbox union, anchored patterns)
- dedupe.rs: 14 tests (tolerance boundaries, font/size blocking, output ordering)
- bidi.rs: 14 tests (RTL/LTR detection, neutral chars, field preservation)
- shapes.rs: 18 tests (orientation, flip_y, rect construction, CTM transforms)
- annotation.rs: 11 tests (subtype roundtrip, equality, bbox preservation)
- struct_tree.rs: 9 tests (deep nesting, MCID extraction, child ordering)
- page_regions.rs: 10 tests (unicode masking, thresholds, custom margins)
- path.rs: 12 tests (builder reset, rectangle segments, clone independence)
- html.rs: 35 tests (headings, escaping, bold/italic detection, lists, median)
- color_space.rs: 23 tests (all color spaces, ICC delegation, indexed bounds)
- standard_fonts.rs: 14 tests (all 14 fonts, monospace invariant, bbox sanity)
- words.rs: 90+ tests (word split boundaries, vertical text, tolerance, CJK)
- text.rs: tests for CTM transforms and CJK detection
- layout.rs, table.rs, cmap.rs: additional coverage

Also fixes word-split tolerance check to use >= (matching Python pdfplumber
semantics) and corrects 3 test expectations for vertical text ordering
(non-upright chars sort x0 descending, all vertical chars get Ttb direction).

Signed-off-by: Jacob Cotten <jacob@stratesystems.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>