feat(scheduler): Add scale_betas_for_timesteps to DDPMScheduler #12341
## What does this PR do?

This PR introduces a new boolean flag, `scale_betas_for_timesteps`, to the `DDPMScheduler`. The flag provides an optional, more robust way to handle the beta schedule when `num_train_timesteps` is set to a value other than the default of 1000.

## Motivation and Context
The default parameters of the `DDPMScheduler` (`beta_start=0.0001`, `beta_end=0.02`) are implicitly tuned for `num_train_timesteps=1000`. This creates a potential "usability trap" for practitioners who change the number of training timesteps without realizing they should also adjust the beta range:

- If `num_train_timesteps` is set to a large value (e.g., 4000), the unscaled linear schedule accumulates far too much total noise: the sum of the betas grows with the number of steps, so samples reach pure noise well before the final timestep and the remaining steps are effectively wasted.
- If `num_train_timesteps` is set to a small value (e.g., 200), too little noise accumulates, and the final latent retains visible signal instead of matching the standard Gaussian prior assumed at sampling time.

Both scenarios can lead to suboptimal training performance that is difficult to debug (the numeric sketch below makes the effect concrete).
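As a rough check, here is a small standalone sketch (not part of this PR; it only assumes `numpy`) that computes the terminal cumulative signal level `alpha_bar_T = prod(1 - beta_t)` for the unscaled linear schedule at different step counts:

```python
import numpy as np

def terminal_alpha_bar(num_train_timesteps, beta_start=0.0001, beta_end=0.02):
    """Terminal cumulative product of (1 - beta_t) for an unscaled linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_timesteps)
    return float(np.prod(1.0 - betas))

for T in (200, 1000, 4000):
    # Roughly 0.13 at T=200 (under-noised: x_T keeps visible signal),
    # ~4e-5 at T=1000 (the tuned default), and ~3e-18 at T=4000
    # (pure noise is reached long before the final timestep).
    print(f"T={T:>4}: alpha_bar_T = {terminal_alpha_bar(T):.2e}")
```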
## Proposed Solution

This PR introduces an opt-in solution to this problem:

- A new flag, `scale_betas_for_timesteps`, is added to the scheduler's `__init__` method.
- It defaults to `False` to ensure 100% backward compatibility with existing code.
- When set to `True`, it automatically scales the `beta_end` parameter using a simple heuristic (`beta_end * (1000 / num_train_timesteps)`), which keeps the overall noise schedule sensible and robust regardless of the number of training steps chosen by the user.
- The scaled `beta_end` is used by the schedules that depend on it (e.g., `linear`, `scaled_linear`), while schedules that do not use this parameter (e.g., `squaredcos_cap_v2`) are naturally unaffected.

This change makes the scheduler more intuitive and helps prevent common configuration errors; a minimal sketch of the heuristic is shown below.
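For illustration, a minimal sketch of the heuristic as a standalone function (the hypothetical `make_linear_betas` is for exposition only; the actual change lives inside `DDPMScheduler.__init__`):

```python
import torch

def make_linear_betas(
    num_train_timesteps: int,
    beta_start: float = 0.0001,
    beta_end: float = 0.02,
    scale_betas_for_timesteps: bool = False,
) -> torch.Tensor:
    if scale_betas_for_timesteps:
        # Rescale beta_end relative to the 1000-step default so that the
        # total accumulated noise (~sum of betas) stays roughly constant.
        beta_end = beta_end * (1000 / num_train_timesteps)
    return torch.linspace(beta_start, beta_end, num_train_timesteps)
```

If merged, a user would simply write `DDPMScheduler(num_train_timesteps=4000, scale_betas_for_timesteps=True)`; existing configurations that omit the flag behave exactly as before.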
Fixes # (issue)
Before submitting
documentation guidelines, and
here are tips on formatting docstrings.
## Who can review?

As suggested by the contribution guide for schedulers: @yiyixuxu