Orthogonal subspace learning #2
base: main
Conversation
src/peft/utils/svd_utils.py
Outdated
    return optimizer


def wrap_model_with_svd(model: nn.Module, svd_config: dict[str, int] | None = None) -> nn.Module:
Should this be renamed to wrap_model_with_osf as well?
src/peft/utils/svd_utils.py
Outdated
    return config


def create_svd_model_class(base_cls: type) -> type:
I think this one may need to be updated as well
src/peft/utils/__init__.py
Outdated
    "set_peft_model_state_dict",
    "shift_tokens_right",
    "transpose",
    "wrap_model_with_svd",
You may need to update all instances of SVD across the PR
tests/test_svd_utils.py
Outdated
@@ -0,0 +1,39 @@
import torch
This file seems like it may also need to be updated.
src/peft/tuners/osf/model.py
Outdated
import torch.nn as nn

from peft.tuners.tuners_utils import BaseTuner
from peft.utils.osf_utils import (
I think you updated the module name, but the file itself is still svd_utils.
src/peft/utils/svd_utils.py
Outdated
    dV.copy_(local_dV)


def auto_generate_target_svd_config(model: nn.Module) -> dict[str, int]:
It seems like you might have updated this name to auto_generate_target_osf_config but the change didn't make it into your PR.
__all__ = ["OSFConfig", "OSFModel"]

register_peft_method(
    name="osf",
You might also need to register your method as a new PEFT type in peft.utils.peft_types.PeftType, otherwise this won't work
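As a rough sketch of what that registration could look like (names inferred from the diff; argument names beyond `name` are assumptions, not the actual API):

```python
import enum


class PeftType(str, enum.Enum):
    # ... existing members elided for brevity ...
    LORA = "LORA"
    OSF = "OSF"  # new member for orthogonal subspace fine-tuning


# The register_peft_method(name="osf", ...) call from the diff would then be able
# to resolve PeftType.OSF. Keyword arguments beyond `name` are assumptions here:
# register_peft_method(name="osf", config_cls=OSFConfig, model_cls=OSFModel)
```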
src/peft/utils/svd_utils.py
Outdated
def auto_generate_target_svd_config(model: nn.Module) -> dict[str, int]:
    """Create a mapping from parameter names to ``top_k`` based on layer size."""
    target_patterns = [
We will need to refactor this out similar to how we did it in the original PR: https://github.com/Red-Hat-AI-Innovation-Team/mini_trainer/pull/1/files#diff-09721c27e1a636c47222f5c7994cccbad3067007fe4c454e43a04e9bd3bd8b67R504
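For reference, a minimal sketch of what the renamed/refactored helper could look like, based only on the docstring above; the pattern list and the top_k heuristic are placeholders, not the PR's actual values:

```python
import torch.nn as nn


def auto_generate_target_osf_config(model: nn.Module) -> dict[str, int]:
    """Sketch: map each targeted 2D weight's parameter name to a ``top_k`` based on its size."""
    target_patterns = ["q_proj", "k_proj", "v_proj", "o_proj"]  # placeholder patterns
    config: dict[str, int] = {}
    for name, param in model.named_parameters():
        if param.ndim == 2 and any(pattern in name for pattern in target_patterns):
            config[name] = min(param.shape) // 2  # placeholder heuristic: keep half the spectrum
    return config
```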
Force-pushed 2d435a5 to 372a375
Makes it easier to track rate limiting issues.
- The warning message was missing spaces between sentences.
- Added ' around strings for clarity.
- For one warning, which extended another warning, put it at the start instead of the end, because the other warning can be quite long, leading to users missing the addition.

For more context on this warning, see huggingface#2254
- default
- mini
- bat

Results are pretty close to the corresponding experiments with Bone, which is what we expected.
…ce#2763) Explain how to use multiple adapters (e.g. 2 LoRA adapters) at the same time, as the API is not quite intuitive and there are some footguns around trainable parameters. This question has come up multiple times in the past (for recent examples, check huggingface#2749 and huggingface#2756). Thus it's a good idea to properly document this. --------- Co-authored-by: Steven Liu <[email protected]>
Resolves huggingface#2783.

Most PEFT layers (BaseTunerLayers) expose the in_features and out_features attributes. Therefore, other packages like diffusers may expect these attributes to exist. However, there were a few PEFT methods where these attributes were missing:
- LoHa
- LoKr
- LN Tuning
- Trainable Tokens

The layers of these methods now also expose the attributes.

Implementation: To avoid code duplication, I factored out the whole code block in LoRA layers that extracts these attributes, since LoRA has the most exhaustive list of checks. The new utility function has the exact same functionality and can now be used by other PEFT methods. I updated the four PEFT methods mentioned above to use this new function, but I did not update PEFT methods that already handled it, as there wasn't really a need (they check one or two layer types at most, so there is little duplication).
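A hedged sketch of what such a shared helper might look like; the function name and the exact set of checks are illustrative, not the actual PEFT utility:

```python
import torch.nn as nn


def get_in_out_features(base_layer: nn.Module) -> tuple[int | None, int | None]:
    """Sketch: extract (in_features, out_features) from common base layer types."""
    if isinstance(base_layer, nn.Linear):
        return base_layer.in_features, base_layer.out_features
    if isinstance(base_layer, nn.Embedding):
        return base_layer.num_embeddings, base_layer.embedding_dim
    if isinstance(base_layer, (nn.Conv1d, nn.Conv2d, nn.Conv3d)):
        return base_layer.in_channels, base_layer.out_channels
    # Fall back to attributes the wrapped layer may already expose (e.g. quantized linear layers).
    return getattr(base_layer, "in_features", None), getattr(base_layer, "out_features", None)
```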
Right now, get_model_status() and get_layer_status() only report on BaseTunerLayers, but it would be helpful if they could also report auxiliary modules. This PR now includes those. To facilitate this, a few attributes and methods were added to AuxiliaryTrainingWrapper and subclasses to make them more similar to BaseTunerLayer (e.g. the adapter_layer_names attribute). These attributes and methods were assumed to be present in the code that determines the model and layer status.
Discussed internally
This PR adds the PEFT version to the adapter_config.json. This can be useful in the future -- for instance when we change the state dict format of a PEFT method, we can convert it in a backwards compatible way based on the PEFT version being used. It can also be useful for debugging by providing an easy way to see the PEFT version that was used to train a PEFT adapter.

Notes:
- In huggingface#2038, we made a change to PEFT configs so that even if new arguments are added to a config, it can still be loaded with older PEFT versions (forward compatibility). Before that change, adding the PEFT version would have been quite disruptive, as it would make all PEFT configs incompatible with older PEFT versions. Said PR was included in the 0.14.0 release from Dec 2024, so we can expect the vast majority of PEFT users to use this version or a more recent one.
- If the PEFT version is a dev version, the version tag is ambiguous. Therefore, I added some code to try to determine the commit hash. This works if users installed PEFT with git+...@<HASH>.
- Unit testing that the function to determine the hash works with these types of installs is not trivial. Therefore, I just patched the function to return a fixed hash. I did, however, test it locally and it works:

python -m pip install git+https://github.com/huggingface/diffusers.git@5e181eddfe7e44c1444a2511b0d8e21d177850a0
python -c "from peft.config import _get_commit_hash; print(_get_commit_hash('diffusers'))"

- Also note that I tried to make the retrieval of the hash super robust by adding a broad try ... except. If there is an error there, e.g. due to a busted install path, we never want this to fail, but rather just accept that the hash cannot be determined (we add @unknown in this case).
- If users installed a dev version of PEFT in a different way, e.g. using git clone && pip install ., the commit hash will not be detected. I think this is fine, I really don't want to start shelling out with git just for this purpose.
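For illustration, one way such a lookup can work (a sketch under assumptions, not necessarily PEFT's actual _get_commit_hash): pip records a PEP 610 direct_url.json for git+... installs, which contains the resolved commit id.

```python
import json
from importlib.metadata import distribution


def get_commit_hash(package_name: str) -> str | None:
    """Sketch: return the VCS commit id pip recorded for a git+... install, if any."""
    try:
        direct_url = distribution(package_name).read_text("direct_url.json")
        if direct_url is None:
            return None  # not installed from a direct URL (e.g. a regular PyPI install)
        return json.loads(direct_url).get("vcs_info", {}).get("commit_id")
    except Exception:
        # Never fail hard here; an undetectable hash is acceptable.
        return None
```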
Resolves huggingface#2772

Fixes several edge cases with unusual layer names or target modules.
1. As huggingface#2772 stated, if "weight" is part of a layer name, it would be treated incorrectly when creating the PEFT state_dict.
2. Similarly, when the adapter name itself is part of a layer name.

Some of these errors would pass silently, which is especially bad (e.g. a weight not being loaded but no error raised). I also added some tests that were not failing before, but to cover some yet uncovered cases or to lay out some basic functionality.

While working on this, I also noticed that it was possible to target a BaseTunerLayer with modules_to_save and trainable_token_indices (e.g. the lora_A and lora_B nn.Linear would be replaced with ModulesToSaveWrapper). I don't think this is ever desired, so we now raise an error if this is detected.
Add `<Tip>`s, converted to the new syntax, to docstrings. --------- Co-authored-by: nemo <[email protected]>
The reset_sessions function is removed, but it's also no longer necessary to call it for the purpose we used it for. Moreover, the deprecated use_auth_token argument is fully removed now, so everywhere we used to pass it, it is now removed, unless a user passes it explicitly. Also, remove the deprecated local_dir_use_symlinks argument.
Implements the paper "Exploring Sparsity for Parameter Efficient Fine Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532). WaveFT enables fine-grained control over the number of trainable parameters by directly learning a sparse set of coefficients in the wavelet domain of residual matrices. Experiments show that it works well in the text-to-image generation space.
When using add_weighted_adapter, so far, there was an implicit assumption that all weights are positive. This PR allows negative weights to be passed. --------- Co-authored-by: Valentin Teutschbein <[email protected]>
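A usage sketch of merging with a negative weight; the model checkpoint, adapter names, and combination type are illustrative, not taken from the PR:

```python
from transformers import AutoModelForCausalLM

from peft import LoraConfig, get_peft_model

# "facebook/opt-125m" and the adapter names are illustrative.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(r=8), adapter_name="task")
model.add_adapter("style", LoraConfig(r=8))

# Negative weights are now allowed, e.g. to subtract one adapter's contribution.
model.add_weighted_adapter(
    adapters=["task", "style"],
    weights=[1.0, -0.5],
    adapter_name="task_minus_style",
    combination_type="linear",
)
model.set_adapter("task_minus_style")
```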
A seed was accidentally chosen that results in a test failing with XPU. Signed-off-by: jiqing-feng <[email protected]>
While memory usage correlates with the number of trainable params, having this number directly makes it easier to see that methods are using similar numbers of trainable params and outliers can be inspected easily.
Check if PEFT triggers transformers FutureWarning or DeprecationWarning by converting these warnings into failures.
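For illustration, a minimal sketch of how such warnings can be escalated to failures with pytest's filterwarnings marker; the actual PR may implement this differently:

```python
import pytest

# One generic way to turn these warnings into test failures, e.g. in a test module
# or conftest.py; the actual PR may scope this more narrowly.
pytestmark = [
    pytest.mark.filterwarnings("error::FutureWarning"),
    pytest.mark.filterwarnings("error::DeprecationWarning"),
]
```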
This PR adds the set_requires_grad method to PEFT models (both PeftModel and BaseTuner). As the name suggests, this is a method to set the requires_grad attribute of the specified PEFT adapters.

For more general context, this is mostly relevant when dealing with multiple adapters. As is, users can already set the active adapter(s) with set_adapter, which automatically adjusts the requires_grad attribute too, so that only the active adapters will have grads enabled. However, there can be situations where activity status and requires_grad may differ. Right now, users would need to manually set requires_grad to deal with that, which is error prone (e.g. forgetting modules_to_save). This PR closes this gap in the API.

As this functionality is quite general purpose, I added a set_requires_grad function to functional.py for easier integration.

Note: The set_requires_grad method will raise an error when called with prompt learning methods like prompt tuning. This is because these methods don't have a universal base class (BaseTuner and BaseTunerLayer) that would allow adding this API. Moreover, they only support a single adapter at a time, hence there is not much need to have this method in the first place. A side effect of not supporting prompt learning is that on the PeftModel, we are free to allow set_requires_grad to accept more than one adapter, which would normally be difficult, because prompt learning only allows one adapter.
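A heavily hedged usage sketch; the exact signature of set_requires_grad is assumed, and the adapter names are hypothetical:

```python
# `model` is assumed to be a PeftModel with two LoRA adapters loaded; the exact
# signature of set_requires_grad is an assumption based on the description above.
model.set_adapter("adapter_a")                            # active adapter, grads follow activity
model.set_requires_grad("adapter_b", requires_grad=True)  # also train the inactive adapter_b
```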
A new initialization method was added to prompt tuning in huggingface#2815. This PR adds an experiment config for this method to the MetaMathQA benchmark. Testing locally, this got a test accuracy of 36%, compared to 25% with random initialization.
Resolves huggingface#2809 Some models like Gemma3 apply a scalar to the embedding output. It needs to be taken into account when using trainable tokens or LoRA applied to the embedding layer.
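A toy sketch of the pitfall, with made-up class and attribute names (Gemma-style models scale embedding outputs by a factor such as sqrt(hidden_size)): any delta injected at the embedding layer has to see the same scaling as the base embedding output.

```python
import torch
import torch.nn as nn


class ScaledEmbeddingWithDelta(nn.Module):
    """Toy stand-in: a base embedding whose output is scaled, plus an adapter-style delta."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.embed_scale = hidden_size**0.5  # scalar applied by the base model (Gemma-style)
        self.delta = nn.Parameter(torch.zeros(vocab_size, hidden_size))  # trainable update

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # The delta must see the same scaling as the base embedding output,
        # i.e. (embed + delta) * scale rather than embed * scale + delta.
        return (self.embed(input_ids) + self.delta[input_ids]) * self.embed_scale
```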
This is to fix an oversight from huggingface#2797, where the LoftQ test was slightly refactored but one test was not updated accordingly.
Force-pushed 372a375 to 00073fe
Note: Diffusers is left as is for now, might need an update later.
The "LoRA Without Regret" blog post (https://thinkingmachines.ai/blog/lora/) mentions that targeting the MLP part of the transformer is more effective than targeting the attention modules. This experiment tests this by targeting ["gate_proj", "up_proj", "down_proj"] instead of the default layers (["q_proj", "v_proj"]). I chose a rank to match the parameter count we would get when targeting the attention modules with rank 32, which is rank 10.

Testing on my machine, there is indeed a nice improvement in the test score:

| metric               | target attention | target MLP |
|----------------------|------------------|------------|
| test accuracy        | 48.2%            | 51.3%      |
| # trainable params   | 9175040          | 9461760    |
| peak memory reserved | 20.74 GB         | 23.02 GB   |

There is, however, also a marked increase in memory usage, despite matching parameter count. Since the operations are different, this may not be a surprise, but let's wait for the final verdict once this experiment runs on our AWS instance.

Note: I also tested higher and lower ranks when targeting the MLP. The effect on memory usage was negligible, but it did improve the score:

| metric             | rank 8  | rank 10 | rank 12  | rank 32  |
|--------------------|---------|---------|----------|----------|
| test accuracy      | 50.3%   | 51.3%   | 52.2%    | 54.8%    |
| # trainable params | 7569408 | 9461760 | 11354112 | 30277632 |

In the end, I chose only to add the rank 10 experiment to match the number of trainable parameters.
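The corresponding LoRA config boils down to the following sketch:

```python
from peft import LoraConfig

# Rank 10 on the MLP projections roughly matches the trainable-parameter count of
# rank 32 on the default attention targets (q_proj, v_proj); other hyperparameters
# of the experiment are not shown here.
config = LoraConfig(
    r=10,
    target_modules=["gate_proj", "up_proj", "down_proj"],
)
```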
Implements DeLoRA: "Decoupling Angles and Strength in Low-rank Adaptation" (https://huggingface.co/papers/2503.18225). Similar to DoRA, DeLoRA decouples the angular learning from the adaptation strength, but it also allows limiting the norm of the change. This way, DeLoRA promises to reduce the risk of catastrophic forgetting and to be more robust to hyper-parameter settings such as the learning rate.
Adds an option to the LoRA config, ensure_weight_tying, which, if enabled, ensures that if the embedding and LM head are tied, they share the ModulesToSaveWrapper. This ensures that their weights work correctly even after merging them.
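A usage sketch; the flag name comes from the description above, while the target modules and modules_to_save values are illustrative:

```python
from peft import LoraConfig

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],  # illustrative targets
    modules_to_save=["lm_head"],          # wrapped in a ModulesToSaveWrapper
    ensure_weight_tying=True,             # keep tied embedding / LM head in one shared wrapper
)
```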
- Rebasing to make use of the simplified BaseTuner implementation and adding more experiment results
- Fixing style, quality, etc. in the code
- Make style
- Fixing CI and other test cases
Force-pushed 89c3113 to 2418375
Summary
- svd_utils implementing SVD-based orthogonal subspace learning utilities

Testing
- make quality
- make style
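For completeness, a minimal usage sketch based on the helper signatures visible in the diff, using the osf_* names suggested in the review:

```python
import torch.nn as nn

# Function names follow the reviewers' suggested osf_* naming; the import path and
# exact behavior depend on the final state of the PR.
from peft.utils.osf_utils import auto_generate_target_osf_config, wrap_model_with_osf

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
osf_config = auto_generate_target_osf_config(model)  # maps parameter names to top_k values
model = wrap_model_with_osf(model, osf_config)       # decompose targeted weights for OSF training
```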