Seed expanded LoRA dimensions with noise instead of zeros #724
Open
Stefatorus wants to merge 1 commit into ostris:main
Conversation
When expanding a pretrained LoRA to a higher rank, the new dimensions were zero-initialized on both lora_down (A) and lora_up (B). This creates a dead zone where neither side can receive gradients: dL/dB[:,i] depends on A[i,:] (zero) and dL/dA[i,:] depends on B[:,i] (zero), so the new dimensions can never learn. Seed new dimensions with small noise scaled relative to existing weights: kaiming-style for lora_down, small random for lora_up. This preserves the original ΔW (perturbation <1%) while ensuring gradients flow through all new dimensions from step 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
The expanded dimensions of lora_down and lora_up were zero-initialized with torch.zeros(). This creates a gradient dead zone: dL/dB[:,i] depends on A[i,:] being nonzero, and dL/dA[i,:] depends on B[:,i] being nonzero. When both sides are zero, neither can ever receive a gradient — the new dimensions are permanently dead. The fix seeds the new dimensions with small noise (Kaiming-style for lora_down, small random for lora_up) scaled relative to the existing learned weights. The perturbation to the original ΔW is <1%, but all new dimensions now have nonzero gradients from step 1.

Problem
Expanding a rank-16 LoRA to rank 64 with the Prodigy optimizer, we observed via SVD analysis that the effective rank at 95% energy stayed flat at ~10.5 across hundreds of training steps — none of the 48 new dimensions were ever utilized. The root cause is the zero initialization, which creates a fixed point in the loss landscape that gradient descent cannot escape.
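The dead zone described above is easy to reproduce in a toy setting. The sketch below (not code from this PR) builds a rank-4 LoRA pair whose last two dimensions are zero-padded, as the old expansion code did, and shows that those dimensions receive exactly zero gradient:

```python
import torch

# Toy LoRA pair expanded from rank 2 to rank 4; the new rows of A
# (lora_down) and new columns of B (lora_up) are zero-padded.
in_dim, out_dim, old_rank, new_rank = 8, 8, 2, 4

A = torch.zeros(new_rank, in_dim)   # lora_down
B = torch.zeros(out_dim, new_rank)  # lora_up
A[:old_rank] = torch.randn(old_rank, in_dim)
B[:, :old_rank] = torch.randn(out_dim, old_rank)
A.requires_grad_()
B.requires_grad_()

x = torch.randn(4, in_dim)
loss = (x @ A.T @ B.T).pow(2).sum()
loss.backward()

# dL/dA[i,:] is proportional to B[:,i] (zero) and dL/dB[:,i] to the
# hidden activation x @ A[i,:] (zero), so the padded dims stay dead:
print(A.grad[old_rank:].abs().max())   # zero gradient on new rows of A
print(B.grad[:, old_rank:].abs().max())  # zero gradient on new cols of B
print(A.grad[:old_rank].abs().max())   # old rows still train normally
```

No matter how many steps are taken, the zero-padded dimensions never move, which matches the flat effective-rank curve observed above.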
Solution
Replace the torch.zeros() padding with small noise:

- lora_down (A): Kaiming-style, std = (weight_scale * 0.1) / sqrt(fan_in)
- lora_up (B): small random, ~1% of typical column magnitude

This matches the asymmetry of standard LoRA initialization (larger A, smaller B) while keeping the perturbation small enough to preserve the pretrained signal.
Test plan
🤖 Generated with Claude Code
I was working on training a LoRA on Flux 2 Dev, initialized at rank 16, and wanted to expand the rank later to see if I could escape a plateau.
AI-Toolkit does allow graceful expansion of LoRA rank; unfortunately, it seeds the new dimensions with 0, which blocks gradients from passing through them.
I solved this on my end by manually seeding the converted LoRA with noise, since the new dimensions were going unused, and asked Claude Code to commit the fix.