-
To train ResNet50 effectively on large images like 2048x2048 using the res_r50_vd_db.yml config (DB algorithm for text detection), you'll need to make several crucial adjustments. The default configuration is optimized for smaller images (around 640x640), which explains why performance on high-resolution inputs is suboptimal. Here are the key areas you should revise:
a. Crop size in EastRandomCropData: increase the crop size to better reflect your input. Larger crops preserve more context and detail, which matters for 2048x2048 images; just make sure the batch still fits in GPU memory. b. Resize augmentation: constrain the random scale to a more stable range such as [1.0, 2.0], especially since the images are already large, to avoid distortion.
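As a sketch, assuming the transform-list layout used in PaddleOCR's DB detection configs (the exact keys in your res_r50_vd_db.yml may differ slightly, and the numbers are illustrative, not tested values):

```yaml
# Illustrative values, not a drop-in replacement; tune to your GPU memory.
- IaaAugment:
    augmenter_args:
      - { 'type': Fliplr, 'args': { 'p': 0.5 } }
      - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
      - { 'type': Resize, 'args': { 'size': [1.0, 2.0] } }  # narrower scale range for large inputs
- EastRandomCropData:
    size: [1024, 1024]  # up from the 640x640 default to keep more context
    max_tries: 50
    keep_ratio: true
```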
In DetResizeForTest, raise the test-time size limit (or fix the test shape) so large images are not shrunk down to the small default. Just ensure the setting reflects your input aspect ratio and fits in memory.
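For example (assuming the DetResizeForTest options found in PaddleOCR detection configs; verify the key names against your version):

```yaml
# Option A: cap the longest side instead of shrinking to the small default
- DetResizeForTest:
    limit_side_len: 2048
    limit_type: max
# Option B: force a fixed test shape matching your inputs
# - DetResizeForTest:
#     image_shape: [2048, 2048]
```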
a. Lower batch_size_per_card: larger crops consume far more memory per sample, so reduce the per-GPU batch size until training fits. b. Adjust the learning rate accordingly: when you shrink the batch size, scale the learning rate down roughly in proportion.
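A minimal sketch of the corresponding config sections (the numbers are assumptions to tune, not validated values):

```yaml
Train:
  loader:
    batch_size_per_card: 2   # illustrative; a 1024x1024 crop uses ~2.5x the memory of 640x640

Optimizer:
  lr:
    learning_rate: 0.00125   # scaled down roughly in proportion to the batch-size reduction
```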
The Resize step in IaaAugment, combined with random cropping, might remove important details. You can try disabling random cropping temporarily, or keep it while retaining larger areas. Alternatively, consider spatial augmentations better suited to large images, such as RandomScale or RandomRotate.
Make sure the test resolution approximates the training resolution; otherwise, predictions may be blurry or mismatched. In DetResizeForTest, you might even try turning off resizing altogether if the model supports arbitrary input sizes.
For evaluation/inference on 2048x2048 images with limited memory, you can run patch-wise inference and aggregate the results.
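Patch-wise inference can be sketched as follows; `detect_fn` is a hypothetical stand-in for your detector's predict call (not a PaddleOCR API), and the tile/overlap sizes are assumptions to tune:

```python
import numpy as np

def tile_inference(image, detect_fn, tile=1024, overlap=128):
    """Run detect_fn on overlapping tiles of a large image and shift the
    resulting polygons back into full-image coordinates.

    detect_fn takes an HxWx3 array and returns a list of polygons,
    each an (N, 2) array of (x, y) points in patch-local coordinates.
    Overlapping duplicates still need merging (e.g. by IoU) afterwards.
    """
    h, w = image.shape[:2]
    step = tile - overlap
    boxes = []
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            patch = image[y:y + tile, x:x + tile]
            for poly in detect_fn(patch):
                # shift patch-local (x, y) points by the tile origin
                boxes.append(np.asarray(poly, dtype=np.float32) + [x, y])
    return boxes
```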
Implement the above changes incrementally while monitoring hmean and loss during training and evaluation. Adjust further based on GPU memory constraints and any signs of overfitting. Let me know if you'd like help modifying the full .yml file accordingly. Response generated by 🤖 feifei-bot | chatgpt-4o-latest
-
Hi everyone,
I hope you are all doing well. I have been using the res_r50_vd_db.yml configuration to train the ResNet50 model, but it isn't giving me good results. My images are 2048x2048, and I don't have complete information on how to set up the model configuration for larger images like these to get the highest accuracy possible.
Could you suggest some key parameter changes that would let me achieve the highest accuracy when training on 2048x2048 images?
Thank you.