Dear @Algolzw,
I have a question regarding the image size and cropping logic used during training and testing for my inpainting task. I’ve uploaded a sample image with the actual dimensions for your reference. The sample image itself does not contain any data — it’s only intended to illustrate the cropping logic.
My original image resolution is 1920×1080 (shown in purple). The masked region (white rectangle) can appear anywhere within the green region of the image. Therefore, I cropped all my training and test images to this green region, resulting in 648×648 images, instead of using random cropping.
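The fixed crop I apply as preprocessing can be sketched as follows (the offsets `x0`, `y0` of the green region are hypothetical here, since they depend on the actual layout of my frames):

```python
import numpy as np

def crop_green_region(img: np.ndarray, x0: int, y0: int, size: int = 648) -> np.ndarray:
    """Crop a fixed 648x648 'green' region out of a full frame.

    x0, y0 are the top-left corner of the green region; the values used
    below are assumptions, not the real coordinates.
    """
    h, w = img.shape[:2]
    assert y0 + size <= h and x0 + size <= w, "crop exceeds image bounds"
    return img[y0:y0 + size, x0:x0 + size]

# Dummy 1080x1920 frame (H x W), cropped at an assumed offset.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
patch = crop_green_region(frame, x0=636, y0=216)
print(patch.shape)  # (648, 648, 3)
```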
In my train.yml configuration file, I have set the following parameters:
```yml
GT_size: 648
LR_size: 648
```
I did this because, as I understand it, setting a smaller value such as 128 would make the dataset take random 128×128 crops, which could easily exclude the masked region. Could you please confirm whether this setup is appropriate for my case?
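To illustrate my concern with a small `GT_size`: if the dataset takes a uniformly random square crop during training (as BasicSR-style `LQGT` datasets typically do; this is a rough sketch, not the repo's actual dataset code), a 128×128 crop from a 648×648 image almost never fully contains a mask rectangle of realistic size. The mask box coordinates below are hypothetical:

```python
import random

def crop_contains_box(crop_xy, crop_size, box):
    """True if the crop [x, x+s) x [y, y+s) fully contains box (x1, y1, x2, y2)."""
    x, y = crop_xy
    x1, y1, x2, y2 = box
    return x <= x1 and y <= y1 and x + crop_size >= x2 and y + crop_size >= y2

def fraction_covering(img=648, crop=128, box=(200, 200, 300, 300),
                      trials=50_000, seed=0):
    """Monte Carlo estimate of how often a random crop covers the whole mask box."""
    rng = random.Random(seed)
    hits = sum(
        crop_contains_box((rng.randint(0, img - crop), rng.randint(0, img - crop)),
                          crop, box)
        for _ in range(trials)
    )
    return hits / trials

print(fraction_covering())          # small: most 128x128 crops miss the mask
print(fraction_covering(crop=648))  # 1.0: a full-size crop always covers it
```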
I am using these 648×648 cropped images to train the IR-SDE model for 700,000 iterations. So far, after around 320,000 iterations, the model seems to be converging very slowly, with significant fluctuations observed in both the training loss and PSNR curves. I have attached a screenshot of the training progress for your reference.
For your convenience, below is my current train.yml configuration:
```yml
#### general settings
name: ir-sde-custom
use_tb_logger: true
model: denoising
distortion: inpainting
gpu_ids: [0]

sde:
  max_sigma: 30
  T: 100
  schedule: cosine # linear, cosine
  eps: 0.005

degradation: # for some synthetic dataset that only have GTs
  # for denoising
  sigma: 25
  noise_type: G # Gaussian noise: G

  # for super-resolution
  scale: 4

  # for inpainting
  mask_root: ~

#### datasets
datasets:
  train:
    name: Train_Dataset
    mode: LQGT
    dataroot_GT: /path to train dataset/train/uncompleted/GT
    dataroot_LQ: /path to train dataset/train/uncompleted/LQ

    use_shuffle: true
    n_workers: 8 # per GPU
    batch_size: 2
    GT_size: 648
    LR_size: 648
    use_flip: true
    use_rot: false
    color: RGB
  val:
    name: Val_Dataset
    mode: LQGT
    dataroot_GT: /path to val dataset/val/uncompleted/GT
    dataroot_LQ: /path to val dataset/val/uncompleted/LQ

#### network structures
network_G:
  which_model_G: ConditionalUNet
  setting:
    in_nc: 3
    out_nc: 3
    nf: 64
    depth: 4

#### path
path:
  pretrain_model_G: ~
  strict_load: true
  resume_state: ~

#### training settings
train:
  optimizer: Adam
  lr_G: !!float 1e-4
  lr_scheme: MultiStepLR
  beta1: 0.9
  beta2: 0.99
  niter: 700000
  warmup_iter: -1
  lr_steps: [200000, 400000, 600000]
  lr_gamma: 0.5
  eta_min: !!float 1e-7

  is_weighted: False
  loss_type: l1
  weight: 1.0

  manual_seed: 40
  val_freq: !!float 1e4

#### logger
logger:
  print_freq: 100
  save_checkpoint_freq: !!float 5e3
```
I would really appreciate your feedback on whether my understanding of the cropping and GT_size/LR_size parameters is correct, and whether the slow convergence could be related to this setup.
