Replies: 2 comments
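Currently only commonly used Chinese characters are supported. For rare characters, we recommend fine-tuning the model on relevant data.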
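As a starting point for the fine-tuning suggested above, here is a minimal sketch of preparing a label file for recognition training. It assumes the tab-separated `image_path<TAB>text` layout, one sample per line, described in PaddleOCR's recognition training documentation. The function name `write_rec_label_file` and the file names in the usage note are hypothetical and not part of PaddleOCR.

```python
from pathlib import Path


def write_rec_label_file(samples, label_path):
    """Write (image_path, text) pairs as tab-separated lines, one sample per line.

    Assumption: the trainer expects `image_path<TAB>text` in a UTF-8 text file.
    """
    lines = []
    for image_path, text in samples:
        # A tab or newline inside a field would corrupt the line-based format.
        if any(c in image_path + text for c in "\t\n"):
            raise ValueError(f"tab/newline not allowed: {image_path!r}, {text!r}")
        lines.append(f"{image_path}\t{text}")
    Path(label_path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, `write_rec_label_file([("crops/0001.jpg", "大雨滂沱")], "rec_gt_train.txt")` would write a single line pairing a cropped text image with its transcription. The crops should contain the rare characters the stock model misses.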
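PaddleOCR may fail to recognize rare characters (such as 「滂」). The main possible causes are:
🧩 Suggested solutions:
🔗 Related issue: That issue also mentions rare characters such as 「嫖」 and 「娼」 that cannot be recognized even though they are in the character set. Possible causes include model limitations and insufficient training data.
📌 Summary:
Response generated by 🤖 feifei-bot | chatgpt-4o-latest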
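Since the reply distinguishes characters that are in the character set from those that are not, a quick check of the recognition dictionary can tell the two cases apart. The sketch below assumes the dictionary is a UTF-8 text file with one character per line, as in the `ppocr/utils/ppocr_keys_v1.txt` file in the PaddleOCR repository. The helper name `missing_from_dict` is hypothetical.

```python
def missing_from_dict(chars, dict_path):
    """Return the characters in `chars` that have no entry in the dictionary file.

    Assumption: one character per line, UTF-8 encoded.
    """
    with open(dict_path, encoding="utf-8") as f:
        # Strip only the line terminator so an entry such as a space is preserved.
        entries = {line.rstrip("\n") for line in f}
    return [c for c in chars if c not in entries]
```

If a character is reported as missing, the model cannot output it at all, and fine-tuning with an extended dictionary would be needed. If it is present but still not recognized, limited training data for that character is the more likely cause.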
🔎 Search before asking
🐛 Bug (problem description)
When using paddleocr to recognize text in an image, rare characters such as 滂 are not recognized.
🏃‍♂️ Environment
paddlepaddle==2.5.2
paddleocr==2.7.3
🌰 Minimal Reproducible Example
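from paddleocr import PaddleOCR


def get_ocr_res(image):
    # Local paths to the PP-OCRv4 detection and recognition models and the angle classifier
    det_model_dir = '/tmp/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer'
    rec_model_dir = '/tmp/.paddleocr/whl/rec/ch/ch_PP-OCRv4_rec_infer'
    cls_model_dir = '/tmp/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer'
    ocr = PaddleOCR(det_model_dir=det_model_dir, rec_model_dir=rec_model_dir,
                    cls_model_dir=cls_model_dir, use_angle_cls=True, use_gpu=False)
    results = ocr.ocr(image)
    trans_res = []
    # results holds one entry per input image; an entry is None when nothing is detected
    for res in results:
        if res is not None:
            for line in res:
                # line is [box, (text, score)]; keep (text, box)
                trans_res.append((line[1][0], line[0]))
    return trans_res


# image is the input picture containing the rare character (not shown in the post)
get_ocr_res(image)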
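To pin down exactly which characters are dropped, the output of `get_ocr_res` can be compared against the text known to be in the image. This is a minimal sketch that assumes the `(text, box)` tuples returned by the function above. The helper `find_missing_chars` and the sample values are hypothetical.

```python
def find_missing_chars(expected_text, ocr_lines):
    """Return the distinct characters of expected_text absent from all recognized lines."""
    recognized = set("".join(text for text, _box in ocr_lines))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [c for c in dict.fromkeys(expected_text)
            if not c.isspace() and c not in recognized]


# Illustrative input shaped like the return value of get_ocr_res
sample_lines = [("大雨沱", [[0, 0], [90, 0], [90, 30], [0, 30]])]
print(find_missing_chars("大雨滂沱", sample_lines))
```

Running this over a handful of images gives a concrete list of characters to report in the issue or to target when collecting fine-tuning data.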