Summary
The hello_structure.pdf fixture still shows a severe extraction gap, and its cross-validation test remains ignored.
Evidence
Run:
cargo test -p pdfplumber --test cross_validation -- --include-ignored --nocapture
Current result:
hello_structure.pdf: chars 38.9%, words 44.4%
The ignore reason in the test file already points to a gap in tagged PDF and TrueType handling.
Scope
- Investigate tagged PDF extraction path for this fixture
- Close TrueType/encoding mapping gap causing low char/word recovery
Acceptance Criteria
- Fixture reaches >=95% chars and >=95% words
- Convert the fixture's test from cross_validate_ignored! to the asserting cross_validate! macro
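
The acceptance thresholds could be sketched as below. This is a minimal illustration, not the crate's actual metric: match_rate and meets_threshold are hypothetical helpers, and the multiset-overlap scoring is an assumption about how a recovery percentage might be computed. Only the 95% bar and the 38.9% / 44.4% figures come from this issue.

```rust
use std::collections::HashMap;

/// Hypothetical helper: percentage of `expected` items that also appear in
/// `actual`, counted as a multiset (order is ignored). This is an assumed
/// scoring rule for illustration only.
fn match_rate(expected: &[&str], actual: &[&str]) -> f64 {
    if expected.is_empty() {
        return 100.0;
    }
    // Count how many times each item occurs in the actual output.
    let mut remaining: HashMap<&str, usize> = HashMap::new();
    for item in actual {
        *remaining.entry(*item).or_insert(0) += 1;
    }
    // Consume one occurrence per matched expected item.
    let mut matched = 0usize;
    for item in expected {
        if let Some(count) = remaining.get_mut(*item) {
            if *count > 0 {
                *count -= 1;
                matched += 1;
            }
        }
    }
    100.0 * matched as f64 / expected.len() as f64
}

/// Hypothetical gate mirroring the acceptance criteria: both rates must be
/// at least 95%.
fn meets_threshold(chars_pct: f64, words_pct: f64) -> bool {
    const MIN_PCT: f64 = 95.0;
    chars_pct >= MIN_PCT && words_pct >= MIN_PCT
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn current_result_fails_the_gate() {
        // Figures reported above for hello_structure.pdf.
        assert!(!meets_threshold(38.9, 44.4));
    }

    #[test]
    fn half_of_expected_words_recovered() {
        let rate = match_rate(&["a", "b", "c", "d"], &["a", "b"]);
        assert_eq!(rate, 50.0);
    }
}
```

With a gate like this, the fixture's current numbers fail on both counts, which is why the test stays ignored until the extraction path is fixed.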