Doing full validation on step 0 #20985

@sahandrez


Description & Motivation

Hi,

I've been trying to run full validation at step 0, but everything I've tried has failed in some way. I am aware of this stale issue, but I could not re-open it, so I created this one. Running full validation at step 0 is very useful when we want to finetune an already well-performing model.

These are the things I've tried:

  • Using trainer.validate() (Add Trainer.validate(…) method to run one validation epoch #4948) fails with DDP: if you manually invoke the validate method, strange things happen with the dataloaders and the DDP checks fail afterwards.
  • Setting trainer.num_sanity_val_steps to -1, so that the sanity check runs on the full validation dataset, also fails because the loggers are not properly set up during sanity checking. I tried several variants where I manually set up the loggers before the sanity check and even forced them to log, but those also failed and became unnecessarily hacky.
  • Temporarily setting trainer.val_check_interval to 1 to force validation to happen at step 1 at least. Setting it back to its original value afterwards had no effect, and the trainer kept validating at every step.
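
To make the intended behavior concrete, here is a minimal plain-Python sketch of the validation schedule I am after: one full validation pass before any optimizer step, then the regular interval. The helper `validation_steps` and its `validate_at_start` flag are hypothetical names used only for illustration; they are not part of the Lightning API, and only `val_check_interval` mirrors an existing Trainer setting.

```python
from typing import List


def validation_steps(
    max_steps: int,
    val_check_interval: int,
    validate_at_start: bool = True,
) -> List[int]:
    """Return the global steps at which a full validation pass would run.

    Hypothetical helper, not a Lightning function. Step 0 means "before any
    optimizer step", i.e. evaluating the pretrained weights as loaded.
    """
    if val_check_interval < 1:
        raise ValueError("val_check_interval must be >= 1")
    # Optional extra pass on the untouched model.
    steps = [0] if validate_at_start else []
    # Regular passes after every `val_check_interval` completed training steps.
    steps.extend(range(val_check_interval, max_steps + 1, val_check_interval))
    return steps


print(validation_steps(max_steps=10, val_check_interval=4))  # [0, 4, 8]
```

With `validate_at_start=False` the same call returns `[4, 8]`, which is roughly the current behavior; the request is only about the extra entry at step 0, logged like any other validation run.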

I feel like this should be easier to do and maybe I'm missing something.

Thanks in advance.

Pitch

Running validation at step 0 is important for many finetuning pipelines, and I think it should be easier to run it robustly on DDP without having to hack many things around the trainer pipeline.

Alternatives

No response

Additional context

No response

cc @lantiga @Borda @justusschock
