[PPOCR rec] Image size of Fine-tuning & Inference for custom dataset #14835

PhuongNDVN · 2025-03-11T02:34:03Z

I am trying to fine-tune PPOCR recognition. If I fix the height of the images used for training at 48, the width of the images ranges from 376 to 460 while maintaining the original aspect ratio. What should I set as the image size for PPOCR recognition during training?

My second question is: should I resize the cropped text-region images while keeping the same aspect ratio before passing them to OCR for inference?

The model I fine-tuned predicts well on my custom data, but the minimum prediction score for individual characters in a word is low.

GreatV (Maintainer) · 2025-03-11T02:42:03Z

For your questions regarding image size during fine-tuning and inference with PPOCR recognition:

1. What should I set as the image size for PPOCR recognition during training?

Since your images have a fixed height of 48 pixels and widths from 376 to 460 pixels, they need to be brought to one uniform input shape, with the aspect ratio preserved as far as the target width allows.
From the referenced discussion, the training configuration uses:
- RecConAug with image_shape: [48, 320, 3]
- RecResizeImg with image_shape: [3, 48, 320]
Recommended approach: keep the height at 48 pixels and use a standard PPOCR training width such as 320 pixels. Note that your images are wider than 320 pixels at height 48, so fitting them into that width compresses them horizontally. A uniform input shape keeps preprocessing consistent between training and inference.
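
As a rough check of what a given target width means for these images, the resize arithmetic can be sketched as follows. This assumes the resize step scales each image to the target height, keeps the aspect ratio, and caps the width at the configured maximum (narrower images would be padded). The function names are illustrative and are not part of PaddleOCR.

```python
import math


def resized_width(src_h, src_w, target_h=48, target_w=320):
    """Width after scaling to target_h with the aspect ratio kept, capped at target_w."""
    natural_w = math.ceil(target_h * src_w / src_h)
    return min(natural_w, target_w)


def horizontal_squeeze(src_h, src_w, target_h=48, target_w=320):
    """Horizontal compression factor caused by the width cap (1.0 means no distortion)."""
    natural_w = math.ceil(target_h * src_w / src_h)
    return min(1.0, target_w / natural_w)


# Width range from the question: height 48, width 376 to 460.
for w in (376, 460):
    print(w, resized_width(48, w), round(horizontal_squeeze(48, w), 2))
```

Under these assumptions, a 320-pixel cap squeezes the 376- and 460-pixel-wide images to roughly 85% and 70% of their width, whereas a cap at or above the widest image (for example a hypothetical 480) would leave them undistorted.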

2. Should I resize cropped text-region images while keeping the aspect ratio before passing them to OCR for inference?

Yes. PPOCR recognition models expect a fixed input height (e.g., 48 pixels), while the width can vary, so the preprocessing at inference should match what was used in training.
Suggested approach:
- Resize the cropped image to a height of 48 pixels (to match training conditions), scaling the width by the same factor.
- If the resulting width is below the target width (such as 320 pixels, as in training), pad it to that width instead of stretching it; resize to the target width only when the crop is wider.
- Preserving the aspect ratio in this way generally helps recognition accuracy and avoids distortion artifacts.
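
The resize-then-pad steps above can be sketched as follows. This is a minimal illustration, not PaddleOCR's own preprocessing: `resize_and_pad` is a hypothetical helper, it uses nearest-neighbour sampling with NumPy only to stay dependency-free (a real pipeline would typically use an interpolating resize), and it pads with zeros on the right, which is an assumption about the padding convention.

```python
import math

import numpy as np


def resize_and_pad(img, target_h=48, target_w=320):
    """Scale an H x W (x C) image to target_h, keep the aspect ratio, right-pad to target_w.

    Returns the padded image and the width actually occupied by image content.
    """
    src_h, src_w = img.shape[:2]
    # Aspect-ratio-preserving width, capped at target_w (wider crops get squeezed).
    new_w = min(target_w, max(1, math.ceil(target_h * src_w / src_h)))

    # Nearest-neighbour sampling: map each output row/column to a source index.
    rows = np.arange(target_h) * src_h // target_h
    cols = np.arange(new_w) * src_w // new_w
    resized = img[rows[:, None], cols[None, :]]

    # Zero canvas of the fixed shape; content goes on the left, padding on the right.
    padded = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    padded[:, :new_w] = resized
    return padded, new_w


crop = np.full((24, 100, 3), 200, dtype=np.uint8)  # stand-in for a cropped text region
padded, content_w = resize_and_pad(crop)
print(padded.shape, content_w)
```

Returning the content width alongside the padded image makes it easy to check how much of each input is real text versus padding.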

3. Low Minimum Prediction Score for Individual Characters

Your model performs well, but low prediction scores for individual characters could be due to:
- Incorrectly loaded weights: Ensure that the pre-trained weights are actually loaded when fine-tuning starts.
- Character dictionary issues: If you modified the character set, ensure that the dictionary used matches the training dataset.
- Text augmentation: Excessive distortions or inconsistent image preprocessing could make it harder for the model to recognize characters accurately.
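
For the dictionary point, one quick sanity check is to look for label characters that the dictionary does not cover. The sketch below assumes the layout commonly used for PaddleOCR recognition data (label lines of the form image path, tab, text, and a dictionary with one character per line). `missing_characters` is a hypothetical helper, and its `use_space_char` argument only mirrors the config option of the same name.

```python
from collections import Counter


def missing_characters(label_lines, dict_lines, use_space_char=True):
    """Count label characters that are absent from the character dictionary."""
    vocab = {line.rstrip("\r\n") for line in dict_lines}
    vocab.discard("")  # ignore empty lines in the dictionary
    if use_space_char:
        vocab.add(" ")

    missing = Counter()
    for line in label_lines:
        _path, sep, text = line.rstrip("\r\n").partition("\t")
        if not sep:
            continue  # skip malformed lines without a tab separator
        missing.update(ch for ch in text if ch not in vocab)
    return dict(missing)


labels = ["img_1.jpg\tab c\n", "img_2.jpg\tabd\n"]  # made-up label lines
dictionary = ["a\n", "b\n", "c\n"]                   # made-up dictionary
print(missing_characters(labels, dictionary))
```

Any character reported here can never be predicted by the model, so it is worth resolving before looking at confidence scores.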

To diagnose this issue further, consider visualizing the low-confidence predictions and adjusting the confidence-score threshold you use to accept or reject results.
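
One way to inspect per-character confidence is to decode the recognition output greedily and keep the probability of each emitted character. The sketch below assumes a CTC-style output (a probability vector per time step, with index 0 as the blank class) that you have already extracted from the model. The function names, the toy label list, and the 0.5 threshold are illustrative only.

```python
def ctc_greedy_decode(probs, labels, blank=0):
    """Greedy CTC decode: argmax per step, collapse repeats, drop blanks.

    Returns a list of (character, probability) pairs.
    """
    chars = []
    prev = None
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)
        if idx != blank and idx != prev:
            chars.append((labels[idx], step[idx]))
        prev = idx
    return chars


def summarize(chars, threshold=0.5):
    """Report text, mean and minimum score, and characters scoring below threshold."""
    if not chars:
        return {"text": "", "mean": 0.0, "min": 0.0, "low": []}
    scores = [p for _, p in chars]
    return {
        "text": "".join(c for c, _ in chars),
        "mean": sum(scores) / len(scores),
        "min": min(scores),
        "low": [(i, c, p) for i, (c, p) in enumerate(chars) if p < threshold],
    }


labels = ["<blank>", "a", "b"]  # toy class list; index 0 is the blank
probs = [
    [0.1, 0.8, 0.1],    # "a"
    [0.2, 0.7, 0.1],    # repeated "a", collapsed
    [0.9, 0.05, 0.05],  # blank
    [0.3, 0.3, 0.4],    # "b" with low confidence
]
print(summarize(ctc_greedy_decode(probs, labels)))
```

Listing the positions of low-scoring characters shows whether the low minimum comes from a few consistently confused characters or is spread across the word.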

For more details, refer to this related discussion on fine-tuning PPOCRv4 recognition:
Discussion: Fine-tuning PPOCRv4 recognition

Response generated by 🤖 feifei-bot | chatgpt-4o-latest

⚠️ This is an AI-generated response. Please verify any critical information.
