Closed
Description
🔎 Search before asking
- I have searched the PaddleOCR Docs and found no similar bug report.
- I have searched the PaddleOCR Issues and found no similar bug report.
- I have searched the PaddleOCR Discussions and found no similar bug report.
PR fix
This issue can be closed once PR #15204 is merged.
🐛 Bug (Problem description)
PaddleOCR recognizes lowercase Vietnamese characters but not accented uppercase ones, because ppocr/utils/dict/vi_dict.txt does not contain accented uppercase characters such as Á, À, Ả, etc.
The PR at ccb2ecb contains 196 characters and may improve PaddleOCR's ability to recognize more Vietnamese words.
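To see which uppercase characters a dictionary lacks, a rough check like the one below can help. It is only a sketch, not part of PaddleOCR: the helper names `load_dict` and `missing_uppercase` are hypothetical, and it assumes the dictionary file stores one character per line in UTF-8.

```python
from typing import Iterable, List


def load_dict(path: str) -> List[str]:
    """Read a dictionary file, assuming one character per line (UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.rstrip("\n")]


def missing_uppercase(dict_chars: Iterable[str]) -> List[str]:
    """Return uppercase forms of dictionary letters that the dictionary lacks."""
    present = set(dict_chars)
    missing = set()
    for ch in present:
        upper = ch.upper()
        # Skip characters without a distinct single-character uppercase form.
        if len(upper) == 1 and upper != ch and upper not in present:
            missing.add(upper)
    return sorted(missing)


if __name__ == "__main__":
    # Small inline sample standing in for the real file (hypothetical data).
    sample = ["a", "á", "ạ", "ắ", "đ", "A", "Đ"]
    print(missing_uppercase(sample))
```

On a real checkout, `missing_uppercase(load_dict("ppocr/utils/dict/vi_dict.txt"))` would list the uppercase letters whose lowercase form is in the file but which are themselves absent.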
Example 1) PaddleOCR recognizes SOẠT as SOAT (the dot below the A is missing), which is wrong because diacritics in Vietnamese change the meaning of a word.
Example 2) PaddleOCR recognizes BẮT as BAT (the breve and acute accent on the A are missing).
For lowercase words such as điều, PaddleOCR performs much better.
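Both misreadings above match what you get by stripping the diacritics from the expected text. The sketch below uses only Python's standard `unicodedata` module to show that relationship. It is an illustration of the symptom, not PaddleOCR's actual decoding logic, and the helper names are hypothetical.

```python
import unicodedata
from typing import List


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def combining_marks(ch: str) -> List[str]:
    """Name the combining marks carried by a single precomposed character."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]


if __name__ == "__main__":
    for word in ("SOẠT", "BẮT"):
        # Compare the expected text with its diacritic-free form.
        print(word, "->", strip_diacritics(word))
```

A comparison like `strip_diacritics(expected) == predicted` can be used to flag predictions where only the accents were lost, as opposed to other recognition errors.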
🏃♂️ Environment (Runtime environment)
N/A
🌰 Minimal Reproducible Example (Minimal demo that reproduces the problem)
N/A