Add MGPO to Flux network by rockerBOO · Pull Request #2180 · kohya-ss/sd-scripts

rockerBOO · 2025-08-19T06:59:18Z

LoRA-MGPO: Mitigating Double Descent in Low-Rank Adaptation via Momentum-Guided Perturbation Optimization

It is a update to GGPO, seems to replace it as they updated their paper.

performance	dynamics

network_args = [
   "mgpo_rho=0.05", # (ρ): Perturbation radius
   "mgpo_beta=0.9" #  (β): EMA smoothing factor for gradient magnitude normalization
]

--network_args "mgpo_rho=0.05" "mgpo_beta=0.9"

May need to play with these values but as a starting point. Might need to lower rho to 0.01 and beta to 0.8. They suggest it for "larger models" to move in that direction.

For NLU tasks, we
fine-tune T5-base (Raffel et al., 2020) with a learn-
ing rate of 1×10−4, a sequence length of 128, and a
batch size of 32. ρ = 0.05, μ = 0.9, β = 0.9. For
the NLG tasks, we fine-tune LLaMA-2-7B (Tou-
vron et al., 2023) with a learning rate of 2 × 10−5,
a sequence length of 1024, and a macro batch size
of 32. ρ = 0.01, μ = 0.8, β = 0.8.

also finding this is about 10-12% faster than GGPO (and almost the same speed as regular LoRA, though maybe indisernable)

rockerBOO · 2025-08-20T00:50:50Z

Same settings except

MGPO:

rho: 0.05
beta: 0.9

GGPO:

beta: 0.01
sigma: 0.03

Trained on Flux dev2pro. Inference on Flux Krea.

Krea	MGPO	GGPO	LoRA

rockerBOO · 2025-08-22T00:03:38Z

Currently a couple smaller issues

_grad_magnitude_ema_down and _grad_magnitude_ema_up are getting saved to the file. Need to make sure that this is properly being ignored or move away from it being a Parameters
they hinted at lower gradients causing issues earlier in training. So we can add this to the GGPO representation to make it more stable. In my testing it seems to work better. Since this will change previous, I have made it opt-in through a new ggpo variable (ggpo_min_grad). With this I also refactored out the implementation into a separate function for GGPO to make it cleaner in the forward.

…LoRA

Add MGPO to Flux network

3f47806

rockerBOO marked this pull request as draft August 21, 2025 23:58

Fix _grad_magnitude_ema_up _grad_magnitude_ema_down getting saved to …

15136ca

…LoRA

rockerBOO force-pushed the mgpo branch from 1084c6f to 15136ca Compare November 7, 2025 00:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add MGPO to Flux network#2180

Add MGPO to Flux network#2180
rockerBOO wants to merge 2 commits intokohya-ss:sd3from
rockerBOO:mgpo

rockerBOO commented Aug 19, 2025 •

edited

Loading

Uh oh!

rockerBOO commented Aug 20, 2025

Uh oh!

rockerBOO commented Aug 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

rockerBOO commented Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rockerBOO commented Aug 20, 2025

Uh oh!

rockerBOO commented Aug 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rockerBOO commented Aug 19, 2025 •

edited

Loading