Recommended Train input image size #12764

agzeroo · 2024-05-21T05:50:33Z


Hello.
I am training and fine-tuning the models below.

detection: en_PP-OCRv3_det_distill_train(det_mv3_db.yml)
recognition: custom_v3_en_mobile(en_PP-OCRv3_rec.yml)
classification: ch_ppocr_mobile_v2.0_cls_train(cls_mv3.yml)

It may be necessary to adjust the image size for better performance.
I would like to know the training input image sizes for detection, recognition, and classification.
From what I found, the input image sizes in the yml files are as follows.

- Detection
Q1. What does `EastRandomCropData` in the Train config do for detection? Since the size is set to [640, 640], will anything larger than that be cropped? And what happens when the image is smaller than 640x640?

Q2. What does `DetResizeForTest` in Eval mean? Is there a reason it differs from the training image size?

- Recognition
Q3. What does each value in `image_shape` mean? 3: Channels, 48: Height, 320: Width. Is that right?

- Classification
Q4. For `image_shape`, 3: Channels, 48: Height, 192: Width. Is that right?

Q5. Is it possible to change the input image size for each model? I want to train on words, not sentences.

I would appreciate your reply.

GreatV · 2024-05-21T10:19:52Z

Maintainer

Q1. Function of `EastRandomCropData`

The `EastRandomCropData` class performs random cropping on the input image while trying to keep the important text regions intact.

The function tries up to `max_tries` times to find a crop area that contains text regions (`text_polys`); if no suitable area is found, the whole image is used.
It calculates the scaling factors `scale_w` and `scale_h` from the target size and the crop size.
If `keep_ratio` is True, the cropped area is resized (scaled up or down) to fit within the target size while keeping its aspect ratio, and the remaining area is padded with zeros.
If `keep_ratio` is False, the cropped area is resized directly to the target size.
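The `keep_ratio` resize step described above can be sketched as follows. This is a simplified illustration, not PaddleOCR's code: `fit_crop_to_size` and `nearest_resize` are hypothetical helpers, `size` is assumed to be (width, height), and a NumPy nearest-neighbour resize stands in for `cv2.resize`. It shows that a crop smaller than 640x640 is scaled up, a larger one is scaled down, and the leftover bottom/right area is zero-padded.

```python
import numpy as np


def nearest_resize(img, new_w, new_h):
    """Nearest-neighbour resize of an HWC image (NumPy stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def fit_crop_to_size(crop, size=(640, 640), keep_ratio=True):
    """Hypothetical helper: fit an HWC crop into `size`, given as (width, height)."""
    target_w, target_h = size
    crop_h, crop_w = crop.shape[:2]
    if not keep_ratio:
        # Stretch straight to the target size; the aspect ratio is not kept.
        return nearest_resize(crop, target_w, target_h)
    # A single scale for both axes keeps the aspect ratio.
    # scale > 1 enlarges crops smaller than the target, scale < 1 shrinks larger ones.
    scale = min(target_w / crop_w, target_h / crop_h)
    w, h = int(crop_w * scale), int(crop_h * scale)
    padded = np.zeros((target_h, target_w, crop.shape[2]), dtype=crop.dtype)
    padded[:h, :w] = nearest_resize(crop, w, h)  # the bottom/right remainder stays zero
    return padded


small = np.ones((200, 320, 3), dtype=np.uint8)  # 320 wide, 200 high: smaller than 640x640
out = fit_crop_to_size(small)
print(out.shape)  # (640, 640, 3); rows 0-399 hold the enlarged crop, rows 400-639 are zero
```

So an image smaller than 640x640 is not rejected: in this sketch it is enlarged by a factor of 2 and the unused rows are filled with zeros.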

Q2. `DetResizeForTest` in Evaluation

`DetResizeForTest` is used during evaluation to resize the input images to a fixed size before passing them to the detection model. This ensures consistent input dimensions and allows for a standardized evaluation process.

Reasons for Different Sizes:

During training, various augmentations (like cropping) help the model learn to generalize better.
During evaluation, a fixed size provides a consistent basis for measuring performance, ensuring that the evaluation metrics are not affected by varying input sizes.
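For the fixed-size mode (an `image_shape` such as `[736, 1280]`, read as height, width), the resize can be sketched as below. This is a hedged sketch assuming the default `keep_ratio: False` behaviour: the image is stretched to exactly that shape with no padding, and the per-axis ratios are kept so that coordinates predicted on the resized image can be mapped back. `det_resize_fixed`, `boxes_to_original`, and the NumPy nearest-neighbour resize are hypothetical stand-ins, not PaddleOCR functions.

```python
import numpy as np


def nearest_resize(img, new_w, new_h):
    """Nearest-neighbour resize of an HWC image (NumPy stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def det_resize_fixed(img, image_shape=(736, 1280)):
    """Hypothetical sketch: stretch img to image_shape = (height, width)."""
    resize_h, resize_w = image_shape
    ori_h, ori_w = img.shape[:2]
    # Separate ratios per axis: the aspect ratio is NOT preserved,
    # and nothing is padded when the image is smaller than the target.
    ratio_h = resize_h / ori_h
    ratio_w = resize_w / ori_w
    return nearest_resize(img, resize_w, resize_h), (ratio_h, ratio_w)


def boxes_to_original(points, ratios):
    """Map (x, y) points found on the resized image back to the original image."""
    ratio_h, ratio_w = ratios
    pts = np.asarray(points, dtype=np.float64).copy()
    pts[..., 0] /= ratio_w  # x coordinates
    pts[..., 1] /= ratio_h  # y coordinates
    return pts


img = np.zeros((368, 640, 3), dtype=np.uint8)  # smaller than 736x1280
resized, ratios = det_resize_fixed(img)
print(resized.shape, ratios)  # (736, 1280, 3) (2.0, 2.0)
print(boxes_to_original([[200, 100]], ratios))  # x = 200 / 2.0, y = 100 / 2.0
```

Note that 736 = 23 x 32 and 1280 = 40 x 32. PaddleOCR's other `DetResizeForTest` modes (for example `limit_side_len`) round sizes to multiples of 32, so keeping custom values such as 640 or 960 at multiples of 32 is a reasonable assumption.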

Recognition and Classification

Q3. Recognition Image Shape

Yes, image_shape: [3, 48, 320] means:

3: Channels (RGB).
48: Height.
320: Width.
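Images decoded with OpenCV or PIL are usually stored as height x width x channels (HWC), while `image_shape` lists channels first (CHW). A minimal NumPy sketch of the conversion, using a synthetic array that is assumed to be already resized to 48x320 (channel order is not addressed here):

```python
import numpy as np

image_shape = [3, 48, 320]  # [channels, height, width] as in the rec config
c, h, w = image_shape

hwc = np.zeros((h, w, c), dtype=np.float32)  # decoded-image layout: (48, 320, 3)
chw = hwc.transpose(2, 0, 1)                 # channels first: (3, 48, 320)
batch = chw[np.newaxis]                      # with a batch axis: (1, 3, 48, 320)
print(hwc.shape, chw.shape, batch.shape)
```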

Q4. Classification Image Shape

Yes, image_shape: [3, 48, 192] means:

3: Channels (RGB).
48: Height.
192: Width.
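How a crop that does not match these shapes is fitted can be sketched with the resize-and-pad logic below, modelled loosely on PaddleOCR's `resize_norm_img` in `ppocr/data/imaug/rec_img_aug.py` (treat the details as an assumption and check the real code). The height is always resized to 48; the width follows the aspect ratio, is capped at the target width (wider crops are squeezed), and narrower crops are zero-padded on the right. `resize_and_pad` and the NumPy nearest-neighbour resize are hypothetical stand-ins.

```python
import math

import numpy as np


def nearest_resize(img, new_w, new_h):
    """Nearest-neighbour resize of an HWC image (NumPy stand-in for cv2.resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def resize_and_pad(img, image_shape):
    """Fit an HWC uint8 crop into a CHW float tensor of shape image_shape."""
    img_c, img_h, img_w = image_shape
    h, w = img.shape[:2]
    # Keep the aspect ratio at the fixed height, but never exceed img_w.
    resized_w = min(img_w, int(math.ceil(img_h * w / h)))
    resized = nearest_resize(img, resized_w, img_h).astype(np.float32)
    # HWC -> CHW, then normalise from [0, 255] to [-1, 1].
    resized = resized.transpose(2, 0, 1) / 255.0
    resized = (resized - 0.5) / 0.5
    padded = np.zeros((img_c, img_h, img_w), dtype=np.float32)
    padded[:, :, :resized_w] = resized  # right side stays 0 (mid-grey after normalising)
    return padded, resized_w


word = np.full((32, 64, 3), 255, dtype=np.uint8)    # short word crop
line = np.full((32, 1024, 3), 255, dtype=np.uint8)  # long text line
for shape in ([3, 48, 320], [3, 48, 192]):          # recognition, classification
    _, used_w = resize_and_pad(word, shape)
    _, used_w_line = resize_and_pad(line, shape)
    print(shape, used_w, used_w_line)  # word uses 96 columns; the line fills the width
```

In this sketch a short word crop occupies only 96 of the available columns and the rest is padding, while a long line is squeezed horizontally to fit.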

Q5. Changing Input Image Size

Please refer to: https://github.com/search?q=repo%3APaddlePaddle%2FPaddleOCR+word+level+&type=issues


Godnoken Jun 7, 2025

Hi,

Can you confirm that the models are trained on BGR input, and not RGB as you mention here?

agzeroo · 2024-05-22T08:41:00Z

Author

@GreatV
Thank you for the quick response.
However, I still have some questions.

Q1. `EastRandomCropData` seems to be a data augmentation step. I would like to know the exact input image size of the detection model. Is there a way to check it?
I assume the model has an input image size and that images are processed to match it during training. I would like to know how this is handled.

Q2. How does `DetResizeForTest` change an image that is larger or smaller than the fixed size? If the image is smaller than the fixed size, is the remaining space padded?

Q3. How are images processed for the recognition and classification models? How are images that are smaller or larger than the fixed size handled during training?


GreatV · 2024-05-22T08:45:28Z

Maintainer

Hi @agzeroo

For code details, please refer to:


agzeroo · 2024-06-04T04:19:35Z

Author

@GreatV

Hello.
Thank you for the answer.

In `DetResizeForTest` of the detection Eval config, the image size is set to [736, 1280].
Where does the number 736 come from?
If I change it, are there any specific restrictions?
For example, would it be okay to change 736 to 640 or 960?
