Recommended Train input image size

Hello.
I am training and fine-tuning the models below.
- Detection: en_PP-OCRv3_det_distill_train (det_mv3_db.yml)
- Recognition: custom_v3_en_mobile (en_PP-OCRv3_rec.yml)
- Classification: ch_ppocr_mobile_v2.0_cls_train (cls_mv3.yml)

I may need to adjust the input image size to get better performance.
I would like to understand the training input image sizes for detection, recognition, and classification.
From my research, the input image sizes set in the yml files appear to be as follows.

**Detection**

**Q1. What does `EastRandomCropData` do in the detection Train pipeline? Since the maximum image size is 640×640, will anything larger than that be cropped? And what happens when an image is smaller than 640×640?**

**Q2. What does `DetResizeForTest` mean in the Eval pipeline? Is there a reason it differs from the training image size?**

![image](https://github.com/PaddlePaddle/PaddleOCR/assets/81543600/3c54aba6-36c0-4bc2-9472-f86b6640c1e1)
![image](https://github.com/PaddlePaddle/PaddleOCR/assets/81543600/000cb03c-fb1a-40e3-bb5c-b7103828dd69)
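To make Q1 and Q2 concrete, here is how I currently understand the two transforms, written as two hypothetical NumPy helpers (`fit_and_pad` and `det_test_size` are my own names, not PaddleOCR code). My assumptions, which I'd like confirmed: `EastRandomCropData` (with `keep_ratio` on) takes a random crop, scales it uniformly so it fits inside 640×640, and zero-pads the rest, so a smaller image would be enlarged rather than left as is; `DetResizeForTest` does no cropping and just rescales the whole image by a side-length limit, rounding each side to a multiple of 32. The random-crop step itself is skipped in the sketch, and the `736`/`"min"` defaults are assumptions, since the yml may specify a fixed size instead.

```python
import numpy as np


def fit_and_pad(img, size=(640, 640)):
    """Hypothetical stand-in for EastRandomCropData with keep_ratio, skipping
    the random-crop step: scale the image uniformly so it fits inside `size`
    (width, height), then zero-pad the bottom/right to exactly that size."""
    target_w, target_h = size
    h, w = img.shape[:2]
    scale = min(target_w / w, target_h / h)  # scale > 1 enlarges small images
    new_h, new_w = int(h * scale), int(w * scale)
    # Nearest-neighbour resize with index arrays, so no OpenCV is needed.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows][:, cols]
    out = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    out[:new_h, :new_w] = resized
    return out, scale


def det_test_size(h, w, limit_side_len=736, limit_type="min"):
    """Hypothetical stand-in for a limit-side test resize: no crop, rescale
    the whole image, then round each side to a multiple of 32."""
    if limit_type == "min":
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    else:  # "max"
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    new_h, new_w = int(h * ratio), int(w * ratio)
    # Assumption: the DB network needs sides divisible by 32.
    new_h = max(int(round(new_h / 32) * 32), 32)
    new_w = max(int(round(new_w / 32) * 32), 32)
    return new_h, new_w


small = np.ones((160, 320, 3), dtype=np.uint8)  # smaller than 640x640
padded, scale = fit_and_pad(small)
print(padded.shape, scale)       # (640, 640, 3) 2.0
print(det_test_size(500, 1000))  # (736, 1472)
```

If this reading is right, it would also explain the answer to Q2: training sees fixed 640×640 crops so images can be batched, while evaluation needs the whole image, so its size depends on each image.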



**Recognition**

**Q3. What does each value in `image_shape` mean? Is it 3: channels, 48: height, 320: width?**

![image](https://github.com/PaddlePaddle/PaddleOCR/assets/81543600/2800f466-1db8-4cd6-9422-d6fb921fe893)
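Here is my reading of `image_shape = [C, H, W]` for Q3 as a hypothetical helper (`resize_and_pad` is my own name, not a PaddleOCR function): I assume a text crop is resized to height H while keeping its aspect ratio, its width is capped at W, and the remainder is zero-padded on the right. Normalization is omitted, and the nearest-neighbour resize is only there to keep the sketch dependency-free. The same logic would apply to the classification shape [3, 48, 192] in Q4.

```python
import math

import numpy as np


def resize_and_pad(img, image_shape=(3, 48, 320)):
    """Hypothetical helper: resize an HWC crop to height H keeping the aspect
    ratio (width capped at W), then right-pad with zeros to a CHW array."""
    c, target_h, target_w = image_shape
    h, w = img.shape[:2]
    resized_w = min(target_w, math.ceil(target_h * w / h))
    # Nearest-neighbour resize with index arrays, so no OpenCV is needed.
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(resized_w) * w / resized_w).astype(int)
    resized = img[rows][:, cols]  # (H, resized_w, C)
    out = np.zeros((c, target_h, target_w), dtype=np.float32)
    out[:, :, :resized_w] = resized.transpose(2, 0, 1)  # HWC -> CHW
    return out, resized_w


word = np.ones((32, 64, 3), dtype=np.uint8)       # short word crop, 2:1
sentence = np.ones((32, 640, 3), dtype=np.uint8)  # long text line, 20:1
print(resize_and_pad(word)[1])                # 96: the rest of 320 is padding
print(resize_and_pad(sentence)[1])            # 320: squeezed to fit the width
print(resize_and_pad(word, (3, 48, 192))[1])  # 96 with the cls shape from Q4
```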



**Classification**

**Q4. Likewise for `image_shape` here: is it 3: channels, 48: height, 192: width?**

![image](https://github.com/PaddlePaddle/PaddleOCR/assets/81543600/29d05fd4-b292-4c6c-8b45-f5036c827a62)


**Q5. Can I change the input image size for each model? I want to train on single words, not sentences.**
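The reason I'm asking Q5 is easiest to see with some rough arithmetic. Assuming crops are resized to a fixed height with the aspect ratio kept and then right-padded (my understanding, not confirmed), a short word fills only a small part of a 320-pixel-wide input. `content_fraction` is a hypothetical helper, and the width of 160 is just an example value I picked, not a recommended setting.

```python
import math


def content_fraction(aspect_ratio, height=48, width=320):
    """Share of the padded input width covered by text, assuming the crop is
    resized to `height` with its aspect ratio kept and right-padded to `width`.
    1.0 means the crop filled (or was squeezed into) the full width."""
    resized_w = min(width, math.ceil(height * aspect_ratio))
    return resized_w / width


print(content_fraction(3.0))             # 0.45: a 3:1 word uses under half of 320
print(content_fraction(3.0, width=160))  # 0.9: hypothetical narrower input
print(content_fraction(15.0))            # 1.0: a 15:1 sentence fills the width
```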

I would appreciate your reply.

Recommended Train input image size #12149