Description
Attach (recommended) or Link to PDF file
Web browser and its version
Chrome 139, Firefox 131
Operating system and its version
Windows 11, macOS 15, Ubuntu 22.04
PDF.js version
v5.3.31.a1
Is the bug present in the latest PDF.js version?
Yes
Is a browser extension
No
Steps to reproduce the problem
- Open the attached PDF (it contains Hebrew content).
- Inspect the text layer using the browser developer tools (.textLayer spans).
- On page 1, note that the phrase visually rendered as אישור אגודה לחתימת appears in the text layer, and therefore to search, in reversed word order (e.g., לחתימת אישור אגודה).
- Search the PDF viewer for חוזה חכירה: on page 1 the result is "No results found".
- Searching for חכירה חוזה instead yields a result on page 2, even though the text rendered on page 2 is חוזה חכירה.
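To make the page 1 symptom concrete, here is a minimal TypeScript sketch that compares the stored order of text items with their right-to-left visual order. It assumes items shaped loosely like the ones returned by pdf.js `getTextContent()` (a `str` plus a 6-entry `transform` matrix whose index 4 is taken as the x origin) and that all items sit on one horizontal line. The helpers `visualRtlLine`, `storedOrderLine` and `isOrderMismatch` and all coordinates are hypothetical. They are not part of PDF.js and are not taken from the attached PDF.

```typescript
// Minimal stand-in for a pdf.js text content item (assumption): `str` is the
// item's text and `transform` is a 6-entry matrix whose fifth entry (index 4)
// is treated here as the item's x origin on the page.
interface TextItemLike {
  str: string;
  transform: number[];
}

// Hypothetical helper: joins the items in RTL visual reading order, i.e. from
// the rightmost item (largest x) to the leftmost one, regardless of the order
// they were stored in. Assumes every item sits on the same horizontal line.
function visualRtlLine(items: TextItemLike[]): string {
  return [...items]
    .sort((a, b) => b.transform[4] - a.transform[4])
    .map((item) => item.str)
    .join(" ");
}

// Hypothetical helper: joins the items in stored order, roughly the order a
// text layer built from them (and a search over it) would see.
function storedOrderLine(items: TextItemLike[]): string {
  return items.map((item) => item.str).join(" ");
}

// True when the stored order disagrees with the visual RTL order.
function isOrderMismatch(items: TextItemLike[]): boolean {
  return visualRtlLine(items) !== storedOrderLine(items);
}

// Illustrative items with made-up coordinates, mimicking the page 1 phrase.
const page1Items: TextItemLike[] = [
  { str: "לחתימת", transform: [1, 0, 0, 1, 100, 700] },
  { str: "אישור", transform: [1, 0, 0, 1, 300, 700] },
  { str: "אגודה", transform: [1, 0, 0, 1, 200, 700] },
];

console.log(storedOrderLine(page1Items)); // לחתימת אישור אגודה
console.log(visualRtlLine(page1Items)); // אישור אגודה לחתימת
console.log(isOrderMismatch(page1Items)); // true
```

Sorting by x alone is a deliberate simplification: it ignores line breaks, rotation and mixed-direction runs, so it is only meant as a quick check on a single line of Hebrew text, not as a model of how PDF.js orders text.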
What is the expected behavior?
The text layer should consistently preserve the correct order of Hebrew text across all pages.
Search should work reliably on all pages for Hebrew text.
What went wrong?
The text layer for Hebrew content is inconsistent with the visual rendering. While the text displays correctly on the canvas, the text in the text layer is sometimes reversed or otherwise altered. As a result, search, copy-paste, and text extraction fail on certain pages.
Link to a viewer
No response
Additional context
No response