
Clarification on Image Cropping, GT_size/LR_size Settings, and Slow Convergence During Inpainting Training #141

@n33lkanth


Dear @Algolzw,

I have a question regarding the image size and cropping logic used during training and testing for my inpainting task. I have uploaded a sample image with the actual dimensions for your reference. The sample image itself does not contain any real data; it is only intended to illustrate the cropping logic.

My original image resolution is 1920×1080 (shown in purple). The masked region (white rectangle) can appear anywhere within the green region of the image. Therefore, I cropped all my training and test images to this green region, resulting in 648×648 images, instead of using random cropping.
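Since the mask always falls inside that fixed green region, the 648×648 crop can be done once as a preprocessing step rather than at load time. Below is a minimal sketch of the idea; the offsets are placeholders, since the green region's exact position inside the 1920×1080 frame is not stated here:

```python
def fixed_crop_box(left, top, size, img_w=1920, img_h=1080):
    """Return a (left, top, right, bottom) crop box, checking it fits the frame."""
    right, bottom = left + size, top + size
    assert 0 <= left and 0 <= top and right <= img_w and bottom <= img_h, \
        "crop region falls outside the frame"
    return (left, top, right, bottom)

# Placeholder offsets for the green region -- substitute the real ones.
box = fixed_crop_box(left=636, top=216, size=648)
# With Pillow, the actual crop is then:
#   Image.open(src).crop(box).save(dst)
```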

In my train.yml configuration file, I have set the following parameters:

GT_size: 648
LR_size: 648

I did this because, as I understand it, setting a smaller value such as 128 would randomly crop the image down to 128×128, which could easily exclude the masked region. Could you please confirm whether this setup is appropriate for my case?
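That reading matches how LQGT-style loaders usually behave: GT_size/LR_size set the side of a random crop window, so a 128×128 window over a 648×648 image will often miss the mask entirely. A rough sketch of the per-axis odds, using a hypothetical mask position and length purely for illustration:

```python
def full_inclusion_prob(img=648, crop=128, mask_pos=300, mask_len=100):
    """Probability (per axis) that a uniform random crop of side `crop`
    fully contains a mask segment [mask_pos, mask_pos + mask_len).
    The mask position and length here are hypothetical examples."""
    positions = img - crop + 1                  # number of valid crop origins
    lo = max(0, mask_pos + mask_len - crop)     # earliest origin covering the mask's end
    hi = min(img - crop, mask_pos)              # latest origin covering the mask's start
    valid = max(0, hi - lo + 1)
    return valid / positions

# crop=128 rarely contains the mask; crop=648 always does (only one origin).
p_128 = full_inclusion_prob(crop=128) ** 2   # both axes must cover the mask
p_648 = full_inclusion_prob(crop=648) ** 2
```

Whether the dataloader actually skips cropping when GT_size equals the image size depends on the repo's LQGT implementation, so that is worth double-checking in the dataset code.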

I am using these 648×648 cropped images to train the IR-SDE model for 700,000 iterations. After around 320,000 iterations, the model seems to be converging very slowly, with significant fluctuations in both the training loss and PSNR curves. I have attached a screenshot of the training progress for your reference.
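On the fluctuation itself: with a batch size of 2, noisy curves are expected, so it can help to judge convergence from a smoothed trend rather than the raw log. A minimal sketch, assuming the logged loss values have been parsed into a plain Python list:

```python
def moving_average(values, window=100):
    """Simple moving average to separate trend from noise in a loss/PSNR log."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out
```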

For your convenience, below is my current train.yml configuration:


#### general settings
name: ir-sde-custom 
use_tb_logger: true
model: denoising
distortion: inpainting
gpu_ids: [0]

sde:
  max_sigma: 30
  T: 100
  schedule: cosine # linear, cosine
  eps: 0.005

degradation: # for synthetic datasets that only have GTs
  # for denoising
  sigma: 25
  noise_type: G # Gaussian noise: G

  # for super-resolution
  scale: 4

  # for inpainting
  mask_root: ~

#### datasets
datasets:
  train:
    name: Train_Dataset
    mode: LQGT
    dataroot_GT: /path to train dataset/train/uncompleted/GT
    dataroot_LQ: /path to train dataset/train/uncompleted/LQ
    use_shuffle: true
    n_workers: 8  # per GPU
    batch_size: 2 
    GT_size: 648
    LR_size: 648
    use_flip: true
    use_rot: false
    color: RGB

  val:
    name: Val_Dataset
    mode: LQGT
    dataroot_GT: /path to val dataset/val/uncompleted/GT
    dataroot_LQ: /path to val dataset/val/uncompleted/LQ

#### network structures
network_G:
  which_model_G: ConditionalUNet
  setting:
    in_nc: 3
    out_nc: 3
    nf: 64
    depth: 4

#### path
path:
  pretrain_model_G: ~
  strict_load: true
  resume_state: ~

#### training settings
train:
  optimizer: Adam
  lr_G: !!float 1e-4
  lr_scheme: MultiStepLR
  beta1: 0.9
  beta2: 0.99
  niter: 700000
  warmup_iter: -1
  lr_steps: [200000, 400000, 600000]
  lr_gamma: 0.5
  eta_min: !!float 1e-7
  is_weighted: False
  loss_type: l1
  weight: 1.0
  manual_seed: 40
  val_freq: !!float 1e4

#### logger
logger:
  print_freq: 100
  save_checkpoint_freq: !!float 5e3
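For reference, the train block above (lr_G: 1e-4, lr_steps: [200000, 400000, 600000], lr_gamma: 0.5) means the learning rate halves three times over the run, so at 320,000 iterations it is 5e-5. A small sketch computing the rate a MultiStepLR schedule with these settings would give at any iteration:

```python
def lr_at(iteration, base_lr=1e-4, steps=(200_000, 400_000, 600_000), gamma=0.5):
    """Learning rate implied by a MultiStepLR schedule at a given iteration."""
    drops = sum(1 for s in steps if iteration >= s)   # milestones already passed
    return base_lr * gamma ** drops

# The rate halves at 200k, 400k, and 600k iterations,
# ending at base_lr / 8 for the final 100k iterations.
```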

I would really appreciate your feedback on whether my understanding of the cropping and GT_size/LR_size parameters is correct, and if the slow convergence behavior could be related to this setup.

[Attachments: sample image illustrating the cropping regions; screenshot of the training loss and PSNR curves]
