PaddleOCR fails to correctly detect German diacritics (ä, ö, ü, ß) in text recognition #16427
-
🐛 Bug
I'm using PaddleOCR to recognize German text in images, but it consistently fails to recognize German diacritics correctly. For example:
- "Spaß" is detected as "SpafS" or "SpaR"
- "Zähne" is detected as "Zahne"
- "frühstücken" is detected as "fruhstucken"
- "Frühstück" is detected as "Fruhstuck"
- "nächsten" is detected as "nachsten"
My current setup:
Current output:
Florian steht jeden Tag um sechs Uhr auf. Zuerst wascht er sein Gesicht und putzt sich die Zahne. Dann geht er nach unten, um zu fruhstücken. Nach dem Fruhstuck zieht er sich an und geht zur Schule.
I've already tried:
- Using the lang='german' parameter
- Using different image resolutions
- Pre-processing the images to improve contrast
Is there a way to improve the recognition of German special characters with PaddleOCR? Do I need to fine-tune the model with a specialized German dataset? Are there any specific parameters or pre-processing techniques that might help?
PaddleOCR version: 2.9.1
I tried both the latin and the german model, but neither recognizes the diacritics correctly. I then tried to fine-tune the latin model with more umlaut examples: I created 5000 synthetic images and 100 real images. After training with the YAML below, the model overfit even though it reached 98% accuracy.
Global:
  debug: false
  use_gpu: false
  epoch_num: 10
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/v3_latin_mobile
  save_epoch_step: 3
  eval_batch_step: [0, 150]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/latin_PP-OCRv3_rec_train/best_accuracy
  checkpoints:
  save_inference_dir: ./output/v3_latin_mobile/inference
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/dict/latin_dict.txt
  max_text_length: &max_text_length 50
  infer_mode: false
  use_space_char: true
  distributed: false
  save_res_path: ./output/rec/predicts_ppocrv3_latin.txt
  freeze_params:
    - "backbone"
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 3
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
    last_pool_kernel_size: [2, 2]
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    ext_op_transform_idx: 1
    label_file_list:
      - ./train_data/train_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data
    label_file_list:
      - ./train_data/val_list.txt
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 64
    num_workers: 8
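The config reads its training and validation samples from train_list.txt and val_list.txt. The post does not say how those lists were built. If the validation list is dominated by the same synthetic images used for training, a high accuracy figure can hide overfitting. The sketch below shows one way to build the two lists so that validation contains only real images. The function name split_labels, its parameters, and the 50% hold-out fraction are hypothetical choices for illustration. Each label line is assumed to follow the tab-separated "image_path<TAB>text" layout used by SimpleDataSet.

```python
import random


def split_labels(synthetic, real, real_val_fraction=0.5, seed=0):
    """Return (train, val) label lines; val holds only real samples.

    Each item is one "image_path\ttext" label line. The hold-out
    fraction is an illustrative default, not a recommended value.
    """
    rng = random.Random(seed)
    real = list(real)
    rng.shuffle(real)
    # Keep at least one real sample for validation when any exist.
    n_val = max(1, int(len(real) * real_val_fraction)) if real else 0
    val = real[:n_val]
    train = list(synthetic) + real[n_val:]
    rng.shuffle(train)
    return train, val


if __name__ == "__main__":
    synthetic = [f"synthetic/{i}.png\tFrühstück" for i in range(6)]
    real = [f"real/{i}.png\tZähne" for i in range(4)]
    train, val = split_labels(synthetic, real)
    print(len(train), "training lines,", len(val), "validation lines")
```

With a real-only validation list, the accuracy reported during training reflects the kind of images the model will see at inference time rather than the synthetic renderer.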
🏃‍♂️ Environment
Hardware:
Model: Mac mini (Mac16,10)
Processor: Apple Silicon (M-series)
Memory: 24 GB
Software:
OS: macOS 15.3.1 (darwin 24.3.0)
Python: 3.12.9
Shell: fish (/opt/homebrew/bin/fish)
Dependencies:
paddleocr: 2.10.0
paddlepaddle: 3.0.0b0 (CPU version)
opencv-python: 4.6.0.66
opencv-contrib-python: 4.11.0.86
🌰 Minimal Reproducible Example
from paddleocr import PaddleOCR

def process_image_ocr(image):
    """
    Process an image through OCR and return the results.

    Args:
        image: numpy array of the image
    Returns:
        results: list of OCR results
    """
    # enhance_image is the poster's own pre-processing helper (not shown in the post)
    enhanced = enhance_image(image)
    ocr = PaddleOCR(use_angle_cls=True, lang='latin', show_log=False)
    results = ocr.ocr(enhanced, cls=True)
    # Guard against an empty result or a None entry for the image
    return results[0] if results and results[0] else []
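Most of the reported errors ("Zähne" read as "Zahne", "Frühstück" as "Fruhstuck") differ from the expected text only by a dropped umlaut, while "Spaß" read as "SpaR" is a different kind of substitution. The sketch below separates the two cases when comparing OCR output against known text. It uses only the standard library. The helper names fold_diacritics and is_diacritic_loss are hypothetical and not part of PaddleOCR.

```python
import unicodedata


def fold_diacritics(text):
    """Drop combining marks after NFD decomposition (ä -> a, ü -> u).

    ß has no decomposition, so it is left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_diacritic_loss(expected, recognized):
    """True when the strings differ only in their diacritical marks."""
    same = (unicodedata.normalize("NFC", expected)
            == unicodedata.normalize("NFC", recognized))
    return (not same) and fold_diacritics(expected) == fold_diacritics(recognized)


if __name__ == "__main__":
    pairs = [("Zähne", "Zahne"), ("frühstücken", "fruhstucken"), ("Spaß", "SpaR")]
    for expected, recognized in pairs:
        print(expected, recognized, is_diacritic_loss(expected, recognized))
```

Counting how many mismatches are pure diacritic loss gives a simple way to tell whether a model change actually improves umlaut recognition or only shifts other errors.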
Replies: 8 comments
-
I tested your image with PaddleOCR version 3.0.0 on the paddlepaddle 3.0 backend, using the PP-OCRv5_server_rec model, which is larger and more accurate than the default mobile model. The same problem exists: some German letters do not exist in the model. If anyone is able to train an OCRv5 model with German characters that is compatible with version 3.0, it would be much appreciated. Otherwise the version 3.0 engine is near perfect. See my post and sample image at #15414 (comment)
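Whether a recognition model can output a given letter depends on its character dictionary. A quick check of the dictionary file is a reasonable first step before fine-tuning. The sketch below assumes the dictionary is a UTF-8 text file with one character per line, as with the ppocr/utils/dict/latin_dict.txt path in the config posted above. The function name missing_chars and the list of characters checked are illustrative choices.

```python
from pathlib import Path

# Characters to look for; extend as needed for other languages.
GERMAN_SPECIALS = "äöüÄÖÜß"


def missing_chars(dict_path, required=GERMAN_SPECIALS):
    """Return the characters in `required` that have no line in the dictionary file."""
    known = set(Path(dict_path).read_text(encoding="utf-8").splitlines())
    return [ch for ch in required if ch not in known]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Example: python check_dict.py ppocr/utils/dict/latin_dict.txt
        print("missing:", missing_chars(sys.argv[1]))
```

An empty list means every listed character has a dictionary entry, in which case the recognition errors are more likely caused by the model weights or the input images than by the character set.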
-
I have posted German testing data in the hope that someone who knows how to train the models can use it for German characters. See #15457. The OCR engine is frighteningly accurate for English. It would be a shame not to support other languages with the latest models.
-
The PP-OCRv5 and v4 models don't support characters outside English.
-
Please look at the last message and sample in #15457. German letters are supported in the PP-OCRv5 server model, but it requires a certain font and DPI. Otherwise German letters are easily missed.
-
@sayinmehmet47 PP-OCRv5 now includes a Multilingual Text Recognition Model, which supports training and inference for text recognition in 37 languages, including French, Spanish, Portuguese, Russian... Details
-
Thank you @leo-q8!
-
@leo-q8 Can we have a "server"-grade, high-quality latin inference model as well?
-
Thank you for your attention! The precision of the mobile version is quite good. Feel free to use it and open an issue if you encounter a bad case. We are also planning to develop