Open
Description
🔎 Search before asking
- I have searched the PaddleOCR Docs and found no similar bug report.
- I have searched the PaddleOCR Issues and found no similar bug report.
- I have searched the PaddleOCR Discussions and found no similar bug report.
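🐛 Bug (Problem description)
Romanian is not fully supported: diacritics (ăâîșț) are not handled properly. In the sample output below, ă, ș and ț are dropped or replaced (for example, ă comes out as à), while î and â survive. This makes the resulting text unusable, since many words are missing characters.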
🏃‍♂️ Environment (Runtime environment)
OS: Ubuntu 24.04.1 LTS
Environment: conda
Python: Python 3.11.5
Install: pip
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
pip install paddleocr
RAM: 128GB
CPU: AMD Ryzen Threadripper PRO 5965WX 24-Cores
CUDA: None
🌰 Minimal Reproducible Example (minimal demo that reproduces the problem)
For the image below, run PaddleOCR using:
paddleocr ocr -i monitorul_oficial_sample.png --lang ro --use_doc_orientation_classify False --use_doc_unwarping False --use_textline_orientation False --save_path ./output

The resulting text has many words that are extracted incorrectly because characters are missing (judector vs judecător). A snippet of the resulting JSON:
"— judector",
"neconstituionalitate a dispoziiilor art. 327 lit. b) i ale",
"Gheorghe Stan",
"— judector",
"art. 328 alin. (3) din Codul de procedur penal, excepie",
"Livia Doina Stanciu",
"— judector",
"ridicat de Gheorghe Cureleac într-o cauz penalà în care",
"Elena-Simina Tnsescu",
"— judector",
"autorul excepiei a fost trimis în judecatà pentru svârirea unor",
"Varga Attila",
"— judector",
"infractiuni.",
"5. În motivarea excepiei de neconstituionalitate, autorul",
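To make the pattern in the snippet above easier to see, here is a minimal sketch that counts which Romanian diacritics actually appear in some of the recognized lines. It uses only the Python standard library; the helper name `diacritic_report` is hypothetical and not part of PaddleOCR, and the input lines are copied from the JSON output shown in this report.

```python
import unicodedata
from collections import Counter

# Romanian diacritics (comma-below forms), lower and upper case.
ROMANIAN_DIACRITICS = unicodedata.normalize("NFC", "ăâîșțĂÂÎȘȚ")

# Lines copied from the JSON snippet in this report.
ocr_lines = [
    "neconstituionalitate a dispoziiilor art. 327 lit. b) i ale",
    "art. 328 alin. (3) din Codul de procedur penal, excepie",
    "ridicat de Gheorghe Cureleac într-o cauz penalà în care",
    "autorul excepiei a fost trimis în judecatà pentru svârirea unor",
    "5. În motivarea excepiei de neconstituionalitate, autorul",
]


def diacritic_report(lines):
    """Hypothetical helper: count Romanian diacritics and collect
    any other non-ASCII letters found in the given lines."""
    text = unicodedata.normalize("NFC", " ".join(lines))
    counts = Counter(ch for ch in text if ch in ROMANIAN_DIACRITICS)
    unexpected = {
        ch
        for ch in text
        if ord(ch) > 127 and ch.isalpha() and ch not in ROMANIAN_DIACRITICS
    }
    return counts, unexpected


counts, unexpected = diacritic_report(ocr_lines)
for ch in ROMANIAN_DIACRITICS:
    print(ch, counts.get(ch, 0))
print("unexpected letters:", sorted(unexpected))
```

On these lines, î, Î and â are present, ă, ș and ț never appear, and à shows up as an unexpected letter, which suggests the problem may be limited to a subset of the diacritics rather than all non-ASCII characters.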