docs/source/package_reference/delora.md
[DeLoRA](https://huggingface.co/papers/2503.18225) is a parameter-efficient fine-tuning technique that implicitly maintains a Frobenius boundary with respect to the pretrained weights by normalizing and scaling learnable low-rank matrices. This effectively decouples the learning of directions (BA term) and magnitude (boundary term) of the weight updates, avoiding catastrophic shifts in the adapted weights and enhancing robustness to hyperparameter choices.
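
As a rough schematic (a paraphrase of the idea above, not the paper's exact notation), the update can be pictured as r normalized rank-1 terms scaled by a single magnitude λ:

$$
\Delta W = \frac{\lambda}{r}\sum_{i=1}^{r}\frac{b_i a_i^{\top}}{\lVert b_i\rVert\,\lVert a_i\rVert}
$$

Because each normalized term has unit Frobenius norm, the triangle inequality gives \\( \lVert \Delta W \rVert_F \le \lambda \\), so λ alone controls the magnitude (the boundary) while B and A only contribute directions.
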
Note:
- Use a learning rate 10-100x larger than for standard LoRA variants (typical values are around 1e-3 to 1e-2); see the configuration sketch after this list.
- Make sure the initial boundary parameter lambda is not too small (typical values are around 10 to 15). Setting different lambdas for different layers is possible.
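
Putting the two hints together, here is a minimal sketch of how an adapter could be set up with PEFT. The class name `DeloraConfig` and the boundary argument `delora_lambda` are assumptions made for illustration; check the API reference on this page for the exact names.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import DeloraConfig, get_peft_model  # config class name assumed; see the reference on this page

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = DeloraConfig(
    r=16,                                  # rank of the low-rank matrices B and A
    delora_lambda=15,                      # initial boundary; avoid values that are too small
    target_modules=["q_proj", "v_proj"],   # only nn.Linear layers are supported
)
peft_model = get_peft_model(base_model, config)

# DeLoRA tolerates (and benefits from) a learning rate 10-100x larger than typical LoRA setups.
optimizer = torch.optim.AdamW(peft_model.parameters(), lr=1e-3)
```
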
DeLoRA currently has the following constraints:
- Only `nn.Linear` layers are supported.
- Quantized layers are not supported.

If these constraints don't work for your use case, consider other methods instead.