Skip to content

Conversation

@veerababu1729
Copy link

@veerababu1729 veerababu1729 commented Oct 23, 2025

Problem

Hebrew text "אישור אגודה לחתימת" renders differently across PDF pages:

  • ✅ Correct on pages with more Hebrew
  • ❌ Reversed on pages with mostly English

Solution

Enhanced bidi.js algorithm to detect pure Hebrew content and handle it consistently.

Changes

  • src/core/bidi.js: Improved base direction detection (lines 167-209)
  • test/unit/bidi_spec.js: Added Hebrew consistency tests (lines 72-106)

Results

Fixes #20336

  • ✅ Hebrew text now consistent across all contexts
  • ✅ All existing tests still pass
  • ✅ Fixes search/copy-paste issues with Hebrew PDFs

Test Evidence

// Before: Inconsistent
"Document אישור אגודה לחתימת file"  RTL 
"Long English text אישור אגודה לחתימת more"  LTR (Hebrew reversed) 

// After: Consistent  
"Document אישור אגודה לחתימת file"  RTL 
"Long English text אישור אגודה לחתימת more"  LTR (Hebrew correct) 


**Key Points:**
- 🎯 **Clear problem**: Hebrew text inconsistency
- 🔧 **Simple solution**: Better detection algorithm  
- 📊 **Proof**: Before/after examples
-  **Safe**: No breaking changes
- 🧪 **Tested**: New tests + existing tests pass

@veerababu1729
Copy link
Author

Please let me know anything

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Text extraction / text layer rendering for hebrew content

2 participants