商榷 misrecognized as 商榨 #15040
-
🔎 Search before asking
🐛 Bug (description)
The problem can also be reproduced in this environment:
🏃♂️ Environment
🌰 Minimal Reproducible Example
from paddleocr import PaddleOCR, draw_ocr
# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the `lang` parameter to `ch`, `en`, `french`, `german`, `korean` or `japan`
# to switch the language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)
# draw result
from PIL import Image
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
Replies: 4 comments 1 reply
-
Which version are you using? Is it PaddleOCR v4?
-
Yes. Version info: paddleocr==2.10.0. I tested with the official default example. This is the debug output at startup:
-
Then the model has not seen enough relevant data. You will need to fine-tune it on some data that covers these characters.
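As a rough illustration of what "relevant data" could look like, the sketch below collects samples whose text contains either of the two confusable characters and formats them as label lines. It assumes the tab-separated `image_path<TAB>text` label format described in PaddleOCR's recognition training documentation; the image paths, the sample texts, and the helper names `build_label_lines` and `write_label_file` are hypothetical and not part of PaddleOCR.

```python
from pathlib import Path

# Characters the fine-tuning set should cover: the correct one and its look-alike,
# so the model sees both and learns to tell them apart.
TARGET_CHARS = "榷榨"


def build_label_lines(samples, target_chars=TARGET_CHARS):
    """Keep (image_path, text) pairs whose text contains a target character
    and format each as 'image_path<TAB>text' (assumed label format)."""
    return [
        f"{image_path}\t{text}"
        for image_path, text in samples
        if any(ch in text for ch in target_chars)
    ]


def write_label_file(lines, out_path):
    """Write the label lines to a UTF-8 text file, one sample per line."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical samples: cropped text-line images and their ground-truth text.
samples = [
    ("train/word_001.jpg", "值得商榷"),
    ("train/word_002.jpg", "压榨成本"),
    ("train/word_003.jpg", "今天天气不错"),  # no target character, filtered out
]
label_lines = build_label_lines(samples)
```

Calling `write_label_file(label_lines, "rec_gt_train.txt")` would then produce a label file (the file name is only an example) to reference from the training config.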
-
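The problem you are seeing is that the OCR model misrecognizes "商榷" as "商榨". This is a case of confusion between visually similar characters in PaddleOCR's text recognition stage. Based on the screenshot and code environment you provided:
📌 Cause analysis:
📘 Suggested solutions:
🔗 If you confirm that the model cannot reliably distinguish certain visually similar characters, you could also add a character-level second-pass correction as a post-processing step. In essence this adds a language model, similar to a spell checker. Reference documentation:
To fine-tune the model on rarer or easily confused glyphs, consider training a custom recognition model. The steps are described in the "Model Training and Fine-tuning" section of the official documentation. Response generated by 🤖 feifei-bot | chatgpt-4o-latest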
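The second-pass correction idea mentioned above can be sketched as a lookup against a word list. This is a minimal illustration, not a PaddleOCR feature: the confusion table `CONFUSABLE`, the tiny word list `LEXICON`, and the function names are all assumptions made for the example, and a real setup would use a proper dictionary or language model. A character is swapped only when the original does not sit inside any known word and the look-alike does.

```python
# Hypothetical confusion table: each character maps to its visually similar alternatives.
CONFUSABLE = {"榨": ["榷"], "榷": ["榨"]}

# Hypothetical word list standing in for a real dictionary or language model.
LEXICON = {"商榷", "压榨", "榨汁"}


def in_known_word(chars, i, lexicon, max_word_len=4):
    """Return True if some lexicon word covers position i of chars."""
    n = len(chars)
    for start in range(max(0, i - max_word_len + 1), i + 1):
        for end in range(i + 1, min(n, start + max_word_len) + 1):
            if "".join(chars[start:end]) in lexicon:
                return True
    return False


def correct_text(text, lexicon=LEXICON, confusable=CONFUSABLE):
    """Swap a confusable character for a look-alike when only the
    look-alike forms a known word at that position."""
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch not in confusable:
            continue
        if in_known_word(chars, i, lexicon):
            continue  # the original already forms a known word, leave it
        for alt in confusable[ch]:
            candidate = chars[:i] + [alt] + chars[i + 1:]
            if in_known_word(candidate, i, lexicon):
                chars = candidate
                break
    return "".join(chars)
```

With the recognition output from the example script, this could be applied per line, for instance `txts = [correct_text(line[1][0]) for line in result]`. The conservative rule means legitimate words such as "压榨" are left alone, but the approach is only as good as the word list behind it.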