Conversation

@NikhilNayak-debug
Owner

Summary

  • add svd_utils implementing SVD-based orthogonal subspace learning utilities
  • expose new utilities in public API
  • include a simple round‑trip test for SVD utilities

Testing

  • make quality
  • make style

return optimizer


def wrap_model_with_svd(model: nn.Module, svd_config: dict[str, int] | None = None) -> nn.Module:


Should this be renamed to wrap_model_with_osf as well?

return config


def create_svd_model_class(base_cls: type) -> type:


I think this one may need to be updated as well

"set_peft_model_state_dict",
"shift_tokens_right",
"transpose",
"wrap_model_with_svd",


You may need to update all instances of SVD across the PR

@@ -0,0 +1,39 @@
import torch


This file seems like it may also need to be updated.

import torch.nn as nn

from peft.tuners.tuners_utils import BaseTuner
from peft.utils.osf_utils import (


I think you updated the module name, but the file itself is still svd_utils.

dV.copy_(local_dV)


def auto_generate_target_svd_config(model: nn.Module) -> dict[str, int]:


It seems like you might have updated this name to auto_generate_target_osf_config but the change didn't make it into your PR.

__all__ = ["OSFConfig", "OSFModel"]

register_peft_method(
name="osf",


You might also need to register your method as a new PEFT type in peft.utils.peft_types.PeftType, otherwise this won't work
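For illustration, a minimal, self-contained sketch of what this comment points at, assuming the new method would get an OSF member (the actual member name and value are up to the author) — PeftType is a string enum in peft/utils/peft_types.py, and register_peft_method(name="osf", ...) needs a matching entry:

```python
import enum


# Minimal sketch, not the actual peft source: the new method needs its own
# member on the PeftType enum so that registration can resolve it.
class PeftType(str, enum.Enum):
    LORA = "LORA"
    OSF = "OSF"  # assumed member for the new orthogonal subspace method


print(PeftType("OSF"))  # -> PeftType.OSF
```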


def auto_generate_target_svd_config(model: nn.Module) -> dict[str, int]:
"""Create a mapping from parameter names to ``top_k`` based on layer size."""
target_patterns = [


@NikhilNayak-debug force-pushed the orthogonal-subspace-learning branch from 2d435a5 to 372a375 on September 23, 2025 at 23:02
githubnemo and others added 20 commits September 24, 2025 17:49
Makes it easier to track rate limiting issues.
- The warning message was missing spaces between sentences.
- Added quotes (') around strings for clarity.
- For one warning that extends another warning, the addition was placed
  at the start instead of the end, because the other warning can be
  quite long, leading users to miss the addition.

For more context on this warning, see huggingface#2254
- default
- mini
- bat

Results are pretty close to the corresponding experiments with Bone,
which is what we expected.
…(huggingface#2763)

Explain how to use multiple adapters (e.g. 2 LoRA adapters) at the same
time, as the API is not quite intuitive and there are some footguns
around trainable parameters.

This question has come up multiple times in the past (for recent
examples, check huggingface#2749 and huggingface#2756). Thus it's a good idea to properly
document this.

---------

Co-authored-by: Steven Liu <[email protected]>
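As an aside, a rough sketch of the two-adapter workflow that this documentation covers; the model ID, adapter paths, and adapter names below are placeholders, and the exact handling of trainable parameters is the footgun the new docs explain:

```python
from transformers import AutoModelForCausalLM

from peft import PeftModel

# Placeholders: swap in a real base model and real adapter checkpoints.
base = AutoModelForCausalLM.from_pretrained("base-model-id")
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# Activate both LoRA adapters at once on the underlying tuner. Note that
# adapter activation and requires_grad handling interact, which is one of
# the footguns around trainable parameters that the docs address.
model.base_model.set_adapter(["adapter_a", "adapter_b"])
```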
Resolves huggingface#2783.

Most PEFT layers (BaseTunerLayers) expose the in_features and
out_features attributes. Therefore, other packages like diffusers may
expect this attribute to exist. However, there were a few PEFT methods
where these attributes were missing:

- LoHa
- LoKr
- LN Tuning
- Trainable Tokens

The layers of these methods now also expose the attributes.

Implementation

To avoid code duplication, I factored out the whole code block in LoRA
layers that extracts these attributes, since LoRA has the most
exhaustive list of checks. The new utility function has the exact same
functionality and can now be used by other PEFT methods.

I updated the four PEFT methods mentioned above to use this new
function, but I did not update PEFT methods that already handled it, as
there wasn't really a need (they check one or two layer types at most,
so there is little duplication).
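To make the idea concrete, here is a hypothetical version of such a helper; the real function name and the exact set of checks in the PR may differ:

```python
import torch.nn as nn


def get_in_out_features(layer: nn.Module) -> tuple[int | None, int | None]:
    # Hypothetical sketch of the factored-out logic: try the common attribute
    # names used by the layer types PEFT wraps, then fall back to None.
    if hasattr(layer, "in_features") and hasattr(layer, "out_features"):
        return layer.in_features, layer.out_features
    if isinstance(layer, nn.Embedding):
        return layer.num_embeddings, layer.embedding_dim
    if isinstance(layer, nn.Conv2d):
        return layer.in_channels, layer.out_channels
    if hasattr(layer, "weight") and layer.weight.dim() == 2:
        # generic 2D weight: rows are output features, columns are input features
        return layer.weight.shape[1], layer.weight.shape[0]
    return None, None
```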
Right now, get_model_status() and get_layer_status() only report on
BaseTunerLayers, but it would be helpful if they could also report
auxiliary modules. This PR now includes those.

To facilitate this, a few attributes and methods were added to
AuxiliaryTrainingWrapper and subclasses to make them more similar to
BaseTunerLayer (e.g. the adapter_layer_names attribute). These
attributes and methods were assumed to be present in the code that
determines the model and layer status.
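A short usage sketch of the reporting functions mentioned above; the base model ID and target modules are placeholders:

```python
from transformers import AutoModelForCausalLM

from peft import LoraConfig, get_layer_status, get_model_status, get_peft_model

# Placeholder base model; any causal LM works for the purpose of this sketch.
base = AutoModelForCausalLM.from_pretrained("base-model-id")
config = LoraConfig(target_modules=["q_proj", "v_proj"], modules_to_save=["lm_head"])
model = get_peft_model(base, config)

# With this PR, the reports also cover auxiliary modules such as
# ModulesToSaveWrapper, not only BaseTunerLayers.
print(get_model_status(model))
for layer_status in get_layer_status(model):
    print(layer_status.name, layer_status.module_type)
```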
This PR adds the PEFT version to the adapter_config.json. This can be
useful in the future -- for instance when we change the state dict
format of a PEFT method, we can convert it in a backwards compatible way
based on the PEFT version being used. It can also be useful for
debugging by providing an easy way to see the PEFT version that was used
to train a PEFT adapter.

Notes:

In huggingface#2038, we made a change to PEFT configs to make it so that even if
new arguments are added to a config, it can still be loaded with older
PEFT versions (forward compatibility). Before that change, adding the
PEFT version would have been quite disruptive, as it would make all PEFT
configs incompatible with older PEFT versions. Said PR was included in
the 0.14.0 release from Dec 2024, so we can expect the vast majority of
PEFT users to use this version or a more recent one.

If the PEFT version is a dev version, the version tag is ambiguous.
Therefore, I added some code to try to determine the commit hash. This
works if users installed PEFT with git+...@<HASH>. Unit testing that the
function to determine the hash works with these types of installs is not
trivial. Therefore, I just patched the function to return a fixed hash.
I did, however, test it locally and it works:

python -m pip install git+https://github.com/huggingface/diffusers.git@5e181eddfe7e44c1444a2511b0d8e21d177850a0
python -c "from peft.config import _get_commit_hash; print(_get_commit_hash('diffusers'))"

Also note that I tried to make the retrieval of the hash super robust by
adding a broad try ... except. If there is an error there, e.g. due to a
busted install path, we never want this to fail, but rather just accept
that the hash cannot be determined (we add @unknown in this case).

If users installed a dev version of PEFT in a different way, e.g. using
git clone && pip install ., the commit hash will not be detected. I
think this is fine; I really don't want to start shelling out to git
just for this purpose.
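For illustration only, one way such a lookup can be made robust; this is an assumption about the approach, not the actual peft.config._get_commit_hash. Pip records the resolved commit for git+ installs in the package's direct_url.json, and any failure falls back to an "unknown" marker:

```python
import json
from importlib.metadata import distribution


def get_commit_hash_sketch(package_name: str) -> str:
    # Illustrative only: for `pip install git+...@<hash>` installs, pip writes
    # the resolved commit into the package's direct_url.json metadata file.
    try:
        raw = distribution(package_name).read_text("direct_url.json")
        if raw is None:
            return "unknown"
        return json.loads(raw).get("vcs_info", {}).get("commit_id", "unknown")
    except Exception:
        # Never let version introspection break saving the adapter config.
        return "unknown"


print(get_commit_hash_sketch("peft"))
```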
Resolves huggingface#2772

Fixes several edge cases with unusual layer names or target modules.

1. As huggingface#2772 stated, if "weight" is part of a layer name, it would be
treated incorrectly when creating the PEFT state_dict.
2. Similarly, when the adapter name itself is part of a layer name.

Some of these errors would pass silently, which is especially bad (e.g.
a weight not being loaded but no error raised).

I also added some tests that were not failing before, but to cover some
yet uncovered cases or to lay out some basic functionality.

While working on this, I also noticed that it was possible to target a
BaseTunerLayer with modules_to_save and trainable_token_indices (e.g.
the lora_A and lora_B nn.Linear would be replaced with
ModulesToSaveWrapper). I don't think this is ever desired, so we now
raise an error if this is detected.
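To illustrate the first edge case, a naive suffix manipulation like the one sketched below (not the actual PEFT code) goes wrong as soon as "weight" also appears inside a module name:

```python
# Demonstration of why naive string replacement is fragile when "weight"
# appears inside a module name; the key below is a made-up example.
key = "model.gate_weight_proj.lora_A.weight"

naive = key.replace("weight", "")   # mangles the module name too
safe = key.removesuffix(".weight")  # only strips the trailing suffix

print(naive)  # model.gate__proj.lora_A.
print(safe)   # model.gate_weight_proj.lora_A
```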
Add `<Tip>`s (converted to the new syntax) to docstrings.

---------

Co-authored-by: nemo <[email protected]>
The reset_sessions function has been removed, but it's also no longer
necessary to call it for the purpose we used it for.

Moreover, the deprecated use_auth_token argument is fully removed now,
so everywhere we used to pass it, it is now removed, unless a user
passes it explicitly.

Also, remove the deprecated local_dir_use_symlinks argument.
Implements the paper "Exploring Sparsity for Parameter Efficient Fine
Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532).

WaveFT enables fine-grained control over the number of trainable
parameters by directly learning a sparse set of coefficients in the
wavelet domain of residual matrices. Experiments show that it works well
in the text-to-image generation space.
When using add_weighted_adapter, so far, there was an implicit
assumption that all weights are positive. This PR allows negative
weights to be passed.

---------

Co-authored-by: Valentin Teutschbein <[email protected]>
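A small sketch of the behavior described above; the adapter names are placeholders and assume two LoRA adapters were already loaded onto the model:

```python
from peft import PeftModel


def combine_with_negative_weight(model: PeftModel) -> None:
    # Negative weights are now accepted, e.g. to subtract one adapter's
    # contribution from another ("style" and "unwanted_bias" are placeholders).
    model.add_weighted_adapter(
        adapters=["style", "unwanted_bias"],
        weights=[1.0, -0.5],
        adapter_name="style_minus_bias",
        combination_type="linear",
    )
    model.set_adapter("style_minus_bias")
```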
jiqing-feng and others added 10 commits October 10, 2025 12:29
A seed was accidentally chosen that results in a test failing with XPU.

Signed-off-by: jiqing-feng <[email protected]>
While memory usage correlates with the number of trainable params, having
this number directly makes it easier to see that methods use similar
numbers of trainable params, and outliers can be inspected easily.
Check if PEFT triggers transformers FutureWarning or DeprecationWarning
by converting these warnings into failures.
This PR adds the set_requires_grad method to PEFT models (both PeftModel
and BaseTuner). As the name suggests, this is a method to set the
requires_grad attribute of the specified PEFT adapters.

For more general context, this is mostly relevant when dealing with
multiple adapters. As is, users can already set the active adapter(s)
with set_adapter, which automatically adjusts the requires_grad attribute
too, so that only the active adapters will have grads enabled. However,
there can be situations where activity status and requires_grad may
differ. Right now, users would need to manually set requires_grad to
deal with that, which is error prone (e.g. forgetting modules_to_save).
This PR closes that gap in the API.

As this functionality is quite general purpose, I added a
set_requires_grad function to functional.py for easier integration.

Note: The set_requires_grad method will raise an error when called with
prompt learning methods like prompt tuning. This is because these
methods don't have a universal base class (BaseTuner and BaseTunerLayer)
that would allow adding this API. Moreover, they only support a single
adapter at a time, so there is not much need for this method in the
first place.

A side effect of not supporting prompt learning is that on the
PeftModel, we are free to allow set_requires_grad to accept more than
one adapter, which would normally be difficult, because prompt learning
only allows one adapter.
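A rough usage sketch of the API described above; the exact argument names below are assumptions based on this description, not copied from the PR:

```python
from peft import PeftModel, functional


def freeze_second_adapter(model: PeftModel) -> None:
    # Assumed call shape (not verified against the PR): toggle gradients for
    # specific adapters independently of which adapter is currently active.
    functional.set_requires_grad(model, "adapter_a", requires_grad=True)
    functional.set_requires_grad(model, "adapter_b", requires_grad=False)
```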
A new initialization method was added to prompt tuning in huggingface#2815. This PR
adds an experiment config for this method to the MetaMathQA benchmark.

Testing locally, this got a test accuracy of 36%, compared to 25% with
random initialization.
Resolves huggingface#2809

Some models like Gemma3 apply a scalar to the embedding output. It needs
to be taken into account when using trainable tokens or LoRA applied to
the embedding layer.
This is to fix an oversight from huggingface#2797, where the LoftQ test
was slightly refactored but one test was not updated accordingly.
@NikhilNayak-debug force-pushed the orthogonal-subspace-learning branch from 372a375 to 00073fe on October 15, 2025 at 15:21
shantanugupta2004 and others added 6 commits October 16, 2025 14:59
Note: Diffusers is left as is for now, might need an update later.
The "LoRA Without Regret" blog
post (https://thinkingmachines.ai/blog/lora/) mentions that targeting
the MLP part of the transformer is more effective than targeting the
attention modules. This experiment tests this by targeting:

["gate_proj", "up_proj", "down_proj"]

instead of the default layers (["q_proj", "v_proj"]).

I chose the rank so that the parameter count matches what we would get
when targeting the attention modules with rank 32; that works out to
rank 10. Testing on my machine, there is indeed a nice improvement in
the test score:

| metric               | target attention | target MLP |
|----------------------|------------------|------------|
| test accuracy        | 48.2%            | 51.3%      |
| # trainable params   | 9175040          | 9461760    |
| peak memory reserved | 20.74 GB         | 23.02 GB   |

There is, however, also a marked increase in memory usage, despite
matching parameter count. Since the operations are different, this may
not be a surprise, but let's wait for the final verdict once this
experiment runs on our AWS instance.

Note: I also tested higher and lower ranks when targeting the MLP. The
effect on memory usage was negligible, but it did improve the score:

| metric             | rank 8  | rank 10 | rank 12  | rank 32  |
|--------------------|---------|---------|----------|----------|
| test accuracy      | 50.3%   | 51.3%   | 52.2%    | 54.8%    |
| # trainable params | 7569408 | 9461760 | 11354112 | 30277632 |

In the end, I chose only to add the rank 10 experiment to match the
number of trainable parameters.
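The corresponding config sketch for the rank-10 MLP experiment described above; all other LoraConfig fields are left at their defaults here:

```python
from peft import LoraConfig

# Rank 10 on the MLP projections roughly matches the trainable-parameter
# count of rank 32 on the default attention targets ["q_proj", "v_proj"].
config = LoraConfig(
    r=10,
    target_modules=["gate_proj", "up_proj", "down_proj"],
)
```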
Implements DeLoRA: "Decoupling Angles and Strength in Low-rank
Adaptation" (https://huggingface.co/papers/2503.18225).

Similar to DoRA, DeLoRA decouples the angular learning from the
adaptation strength, but it also allows limiting the norm of the change.
This way, DeLoRA promises to reduce the risk of catastrophic forgetting
and to be more robust to hyper-parameter settings such as the learning
rate.
Adds an option to the LoRA config, ensure_weight_tying. If enabled, it
ensures that when the embedding and LM head are tied, they share the
same ModulesToSaveWrapper, so their weights still work correctly even
after merging.
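A config sketch for the option described above; the target_modules and modules_to_save values are placeholders:

```python
from peft import LoraConfig

# With ensure_weight_tying=True, a tied embedding and LM head end up sharing
# one ModulesToSaveWrapper, so the tie is preserved even after merging.
config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
    ensure_weight_tying=True,
)
```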
rebasing to make use of simplified basetuner implementation and adding more experiment results

fixing style, quality, etc in the code

Make style

fixing CI and other test cases
@NikhilNayak-debug force-pushed the orthogonal-subspace-learning branch from 89c3113 to 2418375 on October 20, 2025 at 20:58