Fine Tuning Detection on Custom Dataset #14602
To address your questions and issues with fine-tuning the detection model:

1. Image Size and Preprocessing

What is the smallest image size that can be input to the model?
The PaddleOCR detection models generally don't have a strict minimum image size, as they can dynamically resize images during preprocessing. However, very small images can lead to issues due to insufficient feature extraction. The smallest size should ensure that the text regions remain distinguishable after resizing and cropping.

Relevant configurations: the cropping and shrink-map transforms in the training pipeline (see the sketch below).
Recommendation: choose sizes such that text regions remain distinguishable after resizing and cropping.
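As a rough sketch of those relevant configurations (the parameter values here are the common defaults from the PP-OCRv3 detection configs, not necessarily the ones in your file), the preprocessing options that determine the effective training input size look like this:

```yaml
Train:
  dataset:
    transforms:
      - EastRandomCropData:
          size: [960, 960]     # crop size actually fed to the model during training
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8     # text regions smaller than this are ignored in the target maps
```

For a dataset of small images, the crop size and `min_text_size` are the two values most worth revisiting.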
What size does the model expect?
The model expects the input size to be a multiple of 32 due to the output stride of the backbone. For example, 640×640 or 960×960 are acceptable, while an arbitrary size is resized during preprocessing so that each side becomes a multiple of 32.
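For reference, a sketch of how the evaluation/inference resize is typically configured; `limit_side_len: 960` is an assumed example value, and `DetResizeForTest` rounds each side to a multiple of 32 after limiting the longest side:

```yaml
Eval:
  dataset:
    transforms:
      - DetResizeForTest:
          limit_side_len: 960   # cap on the longest side
          limit_type: max       # both sides are then rounded to multiples of 32
```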
2. Training Logs and Metrics

How to interpret the logs?
The logs provide detailed metrics for each training step: the loss terms, the current learning rate, and reader/batch timing and throughput statistics.

Why is `hmean` not being calculated?
`hmean` (together with precision and recall) is only computed when evaluation runs on the Eval set, which is controlled by `eval_batch_step`; the per-step training log does not include it.
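For reference, these are the two `Global` settings involved; the values shown are the usual defaults in the detection configs and may differ from yours:

```yaml
Global:
  # Evaluate every 400 iterations, starting from iteration 0. hmean, precision and
  # recall are produced by these evaluation runs, not by the per-step training log.
  eval_batch_step: [0, 400]
  # Detection configs normally keep this false; metrics are computed on the Eval set.
  cal_metric_during_train: false
```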
I'm trying to fine-tune 'en_PP-OCRv3_det_distill_train' on my custom dataset, which consists of small images.
I've prepared the dataset according to the format presented in https://paddlepaddle.github.io/PaddleOCR/latest/en/datasets/ocr_datasets.html, and I'm using the config file from https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/model_list.html with some modifications.
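For concreteness, a minimal sketch of the kind of modifications this usually involves; all paths, the learning rate and the epoch count below are illustrative placeholders, not values taken from the discussion:

```yaml
Global:
  pretrained_model: ./pretrain_models/en_PP-OCRv3_det_distill_train/best_accuracy
  epoch_num: 100
  save_model_dir: ./output/det_finetune/
  eval_batch_step: [0, 400]
  cal_metric_during_train: false

Optimizer:
  lr:
    learning_rate: 0.0005   # reduced for fine-tuning

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    # Each line of the label file: image path, a tab, then a JSON list of regions, e.g.
    # imgs/img_1.jpg\t[{"transcription": "HELLO", "points": [[10,10],[80,10],[80,40],[10,40]]}]
    label_file_list:
      - ./train_data/det_train_label.txt

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list:
      - ./train_data/det_eval_label.txt
```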
I have a couple of questions that I haven't been able to answer by digging in the docs:
Regarding the size of the images and the preprocessing steps in the config file, there are a couple of options (EastRandomCropData and ShrinkMap) that seem to affect the size of the image that is going to be input to the model. This could affect performance, as it can mess with the images.
When fine-tuning I get the following log:

Why is `hmean` not being calculated? Setting `cal_metric_during_train` makes the training process fail on start. Setting `eval_batch_step` to a number that is actually reached during training (i.e. 40) gives me:

There are some images in my dataset that do not contain any text. I want the detection model to also learn to output nothing when there is no text in an image.
After fine-tuning another model, even with a small learning rate, performance seems to worsen, which makes me believe there is something I'm doing wrong. Is there any tutorial that covers something like this (just fine-tuning detection on a set of custom images) in a bit more depth than the ones at https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html?