Paddleocr Finetuning pretrained model with custom data doesnt work as expected #14965

LeonGitafk · 2025-03-31T14:46:08Z

LeonGitafk
Mar 31, 2025

Hello,
iam trying to get the best possible result of text recognition for my specific kind of pictures. I researched in the paddleocr documentation for recognition training and found out that I need to have 100.000+ pictures for my own custom model. I cant do this at this moment tho because I only have acces to 1100 pictures right now. Then I found out about Finetuning a pretrained model like the "en_PP-OCRv3_rec", which needs 5000+ images to get good results, but I thought with 1100 pictures I get a first impression if it works and if I get little changes in the resultating recognition. So I prepared all the data in a Folder called FineTuningOCR like this:

C:.
│   best_accuracy.pdparams
│   en_dict.txt
│
├───configs
│   └───rec
│           FineTuning.yml
│
├───result
└───trainData
    │   train.txt
    │
    └───images
             image.jpg
             image2.jpg
             image3.jpg
             ...
             image1100.jpg

best_accuracy.pdparams ist the pretrained model. en_dict.txt is the dictionary containing all the symbols. In the trainData Folder the images are located and the train.txt that is in the following scheme: "[pathToPicture/image.jpg][TAB][symbols]" example: "C:/Users/Max/Documents/FineTuning/trainData/images/image.jpg D-FWFG".
My FineTuning.yml file that guides the training, looks like this:

Global:
  use_gpu: True
  epoch_num: 50
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: C:/Users/Name/Documents/FineTuningOCR/result/model/
  save_epoch_step: 3
  eval_batch_step: [0, 500]  # Öfter evaluieren für schnelleres Debugging
  cal_metric_during_train: True
  pretrained_model: C:/Users/Name/Documents/FineTuningOCR/best_accuracy.pdparams
  checkpoints:
  save_inference_dir: ./
  use_visualdl: False
  character_dict_path: C:/Users/Name/Documents/FineTuningOCR/en_dict.txt
  max_text_length: 25
  infer_mode: False
  use_space_char: True
  save_res_path: C:/Users/Name/Documents/FineTuningOCR/result/predicts3.txt
  
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    learning_rate: 0.0001  # Höhere Start-LR für stabileres Training
    warmup_epoch: 2
    decay_epochs: [10, 25, 40]
    values: [0.0001, 0.00005, 0.00001, 0.000005]
  regularizer:
    name: 'L2'
    factor: 0

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: svtr
    hidden_size: 96
  Head:
    name: CTCHead
    fc_decay: 0

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: C:/Users/Name/Documents/FineTuningOCR/result/
    label_file_list: ["C:/Users/Name/Documents/FineTuningOCR/trainData/train.txt"]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode:
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length']
  loader:
    shuffle: True
    batch_size_per_card: 8  # Höher für stabileres Training (falls GPU erlaubt)
    drop_last: False
    num_workers: 4
    use_shared_memory: True  # Falls es RAM-Probleme gibt, auf False setzen

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: C:/Users/Name/Documents/FineTuningOCR/trainData/images/
    label_file_list: ["C:/Users/Name/Documents/FineTuningOCR/trainData/train.txt"]
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode:
      - RecResizeImg:
          image_shape: [3, 32, 100]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 32
    num_workers: 4
    use_shared_memory: True

My Hardware is a Nvidia Gtx1650 4GB VRAM GPU and I have 32GB RAM.

I tried to train it and it seemed that everything worked fine.
Training command in cmd:

cd C:/Users/Name/PaddleOCR
python tools/train.py -c C:/Users/Name/Documents/FineTuningOCR/configs/rec/FineTune.yml

I have received a best_model folder in my result/model/ folder which contained "model.pdopt" and "model.pdparams" This filetype couldnt be used as the model so I used the following command to convert:

python tools/export_model.py -c C:\Users\Name\Documents\FineTuningOCR\configs\rec\FineTune.yml -o Global.pretrained_model=C:\Users\Name\Documents\FineTuningOCR\best_accuracy.pdparams  Global.save_inference_dir=C:\Users\Name\Documents\FineTuningOCR\FinalModel

And got these files: inference.pdiparams, inference.pdiparams.info, inference.pdmodel, inference.yml
This is the content of Inference.yml (I find it suspicious that some letters are not in quotation marks):

PreProcess:
  transform_ops:
  - DecodeImage:
      channel_first: false
      img_mode: BGR
  - CTCLabelEncode: null
  - RecResizeImg:
      image_shape:
      - 3
      - 32
      - 100
  - KeepKeys:
      keep_keys:
      - image
      - label
      - length
PostProcess:
  name: CTCLabelDecode
  character_dict:
  - '0'
  - '1'
  - '2'
  - '3'
  - '4'
  - '5'
  - '6'
  - '7'
  - '8'
  - '9'
  - ':'
  - ;
  - <
  - '='
  - '>'
  - '?'
  - '@'
  - A
  - B
  - C
  - D
  - E
  - F
  - G
  - H
  - I
  - J
  - K
  - L
  - M
  - N
  - O
  - P
  - Q
  - R
  - S
  - T
  - U
  - V
  - W
  - X
  - Y
  - Z
  - '['
  - \
  - ']'
  - ^
  - _
  - '`'
  - a
  - b
  - c
  - d
  - e
  - f
  - g
  - h
  - i
  - j
  - k
  - l
  - m
  - n
  - o
  - p
  - q
  - r
  - s
  - t
  - u
  - v
  - w
  - x
  - y
  - z
  - '{'
  - '|'
  - '}'
  - '~'
  - '!'
  - '"'
  - '#'
  - $
  - '%'
  - '&'
  - ''''
  - (
  - )
  - '*'
  - +
  - ','
  - '-'
  - .
  - /
  - ' '

Then I tried this Code:

import os
from paddleocr import PaddleOCR

def test_ocr(image_path, model_dir, dict_path):
    ocr_model = PaddleOCR(
        det=False,
        show_log=True,
        use_angle_cls=True,
        det_db_box_thresh=0.6,
        det_db_unclip_ratio=1.8,
        rec_model_dir=model_dir,
        rec_char_dict_path=dict_path
    )

    result = ocr_model.ocr(image_path, cls=True)

    print("Text:")
    for line in result[0]:
        print(f"Text: {line[1][0]} (Confidence: {line[1][1]})")

image_path = "./image.jpg"
model_dir = "./FinalModel"
dict_path = "./en_dict.txt"

test_ocr(image_path, model_dir, dict_path)

with the pretrained inference model alone I have a result of the Text shown in the image, while the new Fine tuned model recognizes nothing. I dont understand why. what did I do wrong. I thought it must perform at least as well as the pretrained model.
I also tried more other images and for every picture I tried it had no result. Even the same images I used for the training had no result in recognition. So I think there is a issue in my training process.

ty in advance for dealing with my problem and maybe helping me.

GreatV · 2025-03-31T14:47:59Z

GreatV
Mar 31, 2025
Maintainer

Basierend auf deiner ausführlichen Beschreibung und dem bereitgestellten PaddleOCR-Konfigurations- und Trainingsprozess gibt es mehrere mögliche Ursachen dafür, dass dein finetuned Modell keine sinnvollen Ergebnisse liefert, obwohl das ursprüngliche Pretrained Model gut funktionierte. Hier sind die häufigsten Probleme und entsprechende Lösungsansätze:

❗ Problem: Pretrained Model wird nicht korrekt geladen
Du hast angegeben, dass du die Datei best_accuracy.pdparams aus dem Pretrained Model en_PP-OCRv3_rec verwendet hast, jedoch kann PaddleOCR in manchen Fällen die Parameter nicht korrekt laden, wenn:
- die Architektur in der YAML-Datei nicht exakt dem Pretrained-Modell entspricht,
- du eine inkompatible Architektur konfiguriert hast (z. B. CRNN vs. SVTR beim PP-OCRv3).
→ Du gibst an, dass du CRNN verwendest in deiner FineTune.yml:

yaml
algorithm: CRNN

Aber das Modell en_PP-OCRv3_rec basiert auf SVTR_LCNet (nicht CRNN).

→ Lösung: Nutze die gleiche Architektur wie im Pretrained Modell! Für en_PP-OCRv3_rec solltest du in der YAML folgendes setzen:

yaml
algorithm: SVTR_LCNet
Transform: null
Backbone:
name: MobileNetV1Enhance
scale: 0.5
Neck:
name: SVTR
dims: 64
depth: 2
hidden_dims: 120
use_guide: True
Head:
name: CTCHead
fc_decay: 0.00001

Wenn du eine andere Architektur verwendest (hier CRNN), können die Layer-Parameter aus dem Pretrained Model nicht korrekt geladen werden, und dein Modell trainiert quasi bei null – wie du beschreibst (kein Output, schlechte/no Erkennung).
🔣 Problem: Dictionary-Unterschiede
Es ist essenziell, dass das Dictionary (en_dict.txt) dieselben Zeichen in derselben Reihenfolge enthält wie das, mit dem das Pretrained Modell trainiert wurde. Wenn du ein eigenes Dictionary nutzt, kann dies ebenfalls dazu führen, dass das pretrained_model nicht korrekt initialisiert wird.

→ Lösung:
- Stelle sicher, dass dein en_dict.txt exakt der des originalen Models en_PP-OCRv3_rec entspricht, wenn du darauf finetunen willst.
- Wenn du eigene Sonderzeichen hinzufügst (was okay ist), musst du sicherstellen, dass das zusätzliche Zeichen nur angehängt werden (nicht deren Reihenfolge ändern oder bestehende löschen) – andernfalls stimmt das Gewicht-Mapping nicht mehr.
🔄 Problem: Setzung von pretrained_model vs. checkpoints in der YAML
Du verwendest:

yaml
Global:
pretrained_model: C:/.../best_accuracy.pdparams
checkpoints:

→ Das ist korrekt, ABER nur wenn du sicherstellst, dass checkpoints leer ist. Wenn “checkpoints” z. B. versehentlich auf einen anderen Pfad verweist, überschreibt PaddleOCR deine pretrained_model-Einstellung.

Außerdem: Du hast best_accuracy.pdparams direkt genutzt. PaddleOCR erwartet die pdparams-Datei normalerweise im Format:

C:/.../pretrained_model/
├── model.pdparams ✅

Nicht direkt pdparams ohne Modellstruktur.

→ Lösung:
- Speichere die Datei als model.pdparams in einem separaten Ordner, z. B.:
  
  C:/Users/Name/Documents/FineTuningOCR/pretrained_model/model.pdparams
  
  und dann in der YAML:
  
  Global:
  pretrained_model: C:/Users/Name/Documents/FineTuningOCR/pretrained_model/
🧪 Problem beim Inferenzen-Modell (inference model)
Auch wenn das Training korrekt läuft, tritt bei paddleocr.ocr() gerne folgendes Problem auf: Du gibst zusätzlich rec_char_dict_path=... an — dieser Parameter überschreibt ggf. die Zeichenliste, die beim Export an das Inferenzmodell gekoppelt wurde.

→ Lösung:
- Entweder: Überprüfe, ob en_dict.txt wirklich exakt demselben Dictionary entspricht, mit dem inference.pdmodel exportiert wurde.
- Oder: Entferne rec_char_dict_path beim Aufruf von PaddleOCR(...) und verlasse dich rein auf das im exportierten inference.yml eingebundene Dictionary.
  
  Beispiel:
  ocr_model = PaddleOCR(det=False, use_angle_cls=True, rec_model_dir=model_dir)
🧪 Eval Datensatz-Konfiguration
In deiner YAML nutzt du für Eval:

yaml
Eval:
dataset:
data_dir: C:/.../trainData/images/
label_file_list: ["C:/.../trainData/train.txt"]

Das ist inkonsistent. Die data_dir sollte wie beim Training auf das übergeordnete Verzeichnis (C:/.../trainData/) zeigen, nicht das image-Verzeichnis. Sonst wird beim Laden der Texte kein gültiger Pfad zusammengesetzt.

→ Lösung:
Ändere zu:

yaml
data_dir: C:/Users/Name/Documents/FineTuningOCR/trainData/
label_file_list: ["C:/Users/Name/Documents/FineTuningOCR/trainData/train.txt"]
🛠️ Exportieren des Modells
Beim Export hast du angegeben:

-o Global.pretrained_model=...pdparams

Achtung: Das ist beim Exportieren evtl. nicht korrekt, weil du eigentlich den Pfad zum best_model verwenden solltest, also den Pfad zu deinem trainierten Modell, z. B.:

Global.checkpoints=C:/Users/Name/Documents/FineTuningOCR/result/model/latest

oder besser:
-o Global.checkpoints=...

→ Lösung:
Beim Export muss auf das trainierte Modell verwiesen werden, nicht nochmal auf das Pretrained-Modell. Nur so wird dein FineTuned Modell als Inference Variante exportiert.
🧠 Allgemeines zur Feinabstimmung mit wenig Daten
PaddleOCR ist sehr sensitiv gegenüber:
- Architektur-Mismatch
- Inkonsistentem Dictionary
- Verwendung falscher Parameter beim Exportieren
Deshalb ist bei kleinen Datensätzen (wie deinen 1100 Bildern) saubere Vorbereitung extrem wichtig. Wenn du alle oben genannten Punkte anpasst, solltest du zumindest ein besseres Ergebnis als "gar keine Erkennung" erhalten.

🔗 Relevante Diskussion:

Die Diskussion unter #13897 bestätigt ebenso:

dass falsch initialisierte pretrained weights (z. B. durch Architekturmismatch) dazu führen, dass sich dein Modell wie ein "Random init" Modell verhält.
dort wird auch aufgezeigt, dass Accuracy plötzlich auf 0 springt und bekannte Wörter nicht mehr erkannt werden – exakt wie bei dir.

📌 Zusammenfassung & Empfehlungen:

✅ Verwende dieselbe Architektur wie das Pretrained Modell (SVTR_LCNet, nicht CRNN)

✅ Entferne rec_char_dict_path beim PaddleOCR-Aufruf, wenn du inference.yml nutzt

✅ Nutze beim Exportieren die korrekten checkpoints → auf dein trainiertes Modell, nicht das Pretrained

✅ Achte darauf, dass dein Dictionary exakt konsistent bleibt

✅ Passe Eval data_dir an den korrekten Pfad an

Wenn du diese Punkte umsetzt, solltest du bei einem Retrain mit deinen 1100 Bildern erste Verbesserungen sehen können. Mit mehr Daten (>=5000) wird das Ergebnis natürlich stabiler.

Response generated by 🤖 feifei-bot | chatgpt-4o-latest

⚠️ This is an AI-generated response. Please verify any critical information.

1 reply

LeonGitafk Mar 31, 2025
Author

can I expect better results than only using the pretrained model without the finetuning? Would make sense to me because that would be the reason why I finetune it on a specific kind of image type

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Paddleocr Finetuning pretrained model with custom data doesnt work as expected #14965

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Paddleocr Finetuning pretrained model with custom data doesnt work as expected #14965

Uh oh!

LeonGitafk Mar 31, 2025

Replies: 1 comment · 1 reply

Uh oh!

GreatV Mar 31, 2025 Maintainer

Uh oh!

LeonGitafk Mar 31, 2025 Author

LeonGitafk
Mar 31, 2025

Replies: 1 comment 1 reply

GreatV
Mar 31, 2025
Maintainer

LeonGitafk Mar 31, 2025
Author