
Conversation


@leofillioux leofillioux commented Dec 11, 2025

This PR is a continuation of issue #2948, in which we proposed integrating the PVeRA adapter.

As recommended in the issue, we based our implementation on that of the VeRA adapter, since the two adapters are very similar. Here is a list of the contributions in this PR.

  • Added the PVeRA adapter folder in src/peft, adapting the config.py, layer.py, and model.py files from their VeRA counterparts.
  • Added some documentation.
  • Adapted tests/test_vera.py into tests/test_pvera.py and made sure it runs properly.

Note: because I'm running on a Mac, I was not able to run make test (had an error with MPS).

@BenjaminBossan could you please give me some feedback?

@leofillioux
Author

Update: I ran make test on CPU and got 16496 passed, 3563 skipped, 10 xfailed, 13272 warnings on leofillioux:main; for comparison, on the huggingface:main branch I got 16476 passed, 3371 skipped, 10 xfailed, 13272 warnings.

Collaborator

@githubnemo githubnemo left a comment


Thanks for the PR, this already looks quite good!

Please check the copyright notices and make sure they are up to date (they often say 2024 but should say 2025).

I've done a quick review and left a few comments. General remarks:

  • let's add PVeRA to tests/test_custom_models.py (adding the important configurations to TEST_CASES, similar to VeRA; see the sketch after this list)
  • if these pass we can extend the coverage by adding PVeRA to tests/test_decoder_models.py and tests/test_encoder_decoder_models.py
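
For the first bullet, a rough sketch of possible entries, assuming the (test name, model id, config class, config kwargs) tuple format of the existing VeRA cases in TEST_CASES (exact names and kwargs are placeholders):

    from peft import PVeRAConfig  # config class added by this PR

    # hypothetical TEST_CASES entries mirroring the existing VeRA cases;
    # both target "lin0" of the dummy MLP used in tests/test_custom_models.py
    PVERA_TEST_CASES = [
        ("Vanilla MLP 1 PVeRA", "MLP", PVeRAConfig, {"target_modules": "lin0"}),
        ("Vanilla MLP 2 PVeRA", "MLP", PVeRAConfig, {"target_modules": ["lin0"]}),
    ]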

Once these are implemented I'll do a more thorough review. After that it'd be nice to have a runnable example and to integrate it into the method comparison to benchmark it against the other methods (and to check our expectations).

Heads up: I'll be rather off than on over the next few days, so merging and further review will most likely happen next year.

@leofillioux
Author

Hello @githubnemo, thank you for your review! I added two commits:

  • dd19a9a which addresses the comments made above.
  • d52cc76 which adds PVeRA to tests/test_custom_models.py, tests/test_decoder_models.py and tests/test_encoder_decoder_models.py. This required a few changes to other files (e.g. adding the pvera/bnb.py file, fixing adapter merging, ...).

After all these changes, I re-ran make test and got 16920 passed, 3643 skipped, 10 xfailed, 13582 warnings.
No worries, enjoy your time off!

Collaborator

@githubnemo githubnemo left a comment


Hey, thanks for the update!

Implementation and tests look quite mature. I think that if we provide bitsandbytes support we also need at least one test in tests/test_gpu_examples.py for quality control. I think test_causal_lm_training_4bit_vera could be used as a base.
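
A rough sketch of what such a test could look like (not a definitive implementation; the model id, target modules, and hyperparameters are placeholders, and PVeRAConfig is the config class from this PR, or PveraConfig after the renaming suggested below):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import PVeRAConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "facebook/opt-125m"  # placeholder small causal LM
    bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, PVeRAConfig(r=8, target_modules=["q_proj", "v_proj"]))

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("A short training smoke test.", return_tensors="pt").to(model.device)
    optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)

    # one training step; the loss should stay finite after the update
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()
    optimizer.step()
    assert torch.isfinite(loss)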

In general I think we should rename PVeRA* to Pvera* (e.g., PVeRAModel -> PveraModel) to be consistent with VeraModel and friends. It is quite hard to remember the spelling of the various abbreviations already :)

Rest of the review is in the comments.

After the comments are resolved and the CI is green I think it would be nice to integrate PVeRA into the MetaMathQA benchmark by adding an experiment file based on the VeRA experiment.

Edit: To fix the docs build, add the PVeRA entry to docs/source/_toctree.yml similar to the other entries.

super(nn.Linear, self).__init__()
PVeRALayer.__init__(self, base_layer, **kwargs)
self.fan_in_fan_out = fan_in_fan_out
self.sample_at_inference = sample_at_inference
Collaborator

For consistency I think this should be a dict so that I can toggle this behavior for each adapter.

Author

Here, are you talking about sample_at_inference?

Collaborator

Ah, I always forget that GitHub doesn't highlight the commented lines. Yes, I was talking about sample_at_inference.
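
To illustrate, a minimal sketch of the per-adapter dict pattern (class and method names mirror this PR, but the exact update_layer signature is assumed):

    class PVeRALayer:
        def __init__(self, base_layer, **kwargs):
            self.base_layer = base_layer
            # one entry per adapter name instead of a single module-wide flag
            self.sample_at_inference: dict[str, bool] = {}

        def update_layer(self, adapter_name: str, r: int, sample_at_inference: bool = False, **kwargs):
            self.sample_at_inference[adapter_name] = sample_at_inference

In forward(), the flag would then be looked up for each active adapter, e.g. if self.sample_at_inference[active_adapter]: ...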

elif issubclass(config_cls, (VBLoRAConfig, RandLoraConfig, OSFConfig)):
lr = 0.01 # otherwise we get nan
elif issubclass(config_cls, PVeRAConfig): # needs very small lr to not get nan
lr = 1e-6
Collaborator

This is indeed very small - do you have an idea which gradients are exploding and if this is a problem in practice?

Author

I checked: the problem comes from pvera_lambda_b, whose parameters become nan after the third epoch. This seems to be caused by the input, which has values up to 90; in practice this shouldn't occur, since inputs are usually normalized.
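
For reference, a minimal way to spot which parameter diverges (hypothetical debugging snippet, not part of the PR; model is the PEFT-wrapped test model):

    import torch

    # after a training step, report any trainable parameter that is no longer finite
    for name, param in model.named_parameters():
        if param.requires_grad and not torch.isfinite(param).all():
            print(f"non-finite values in {name}")  # reports the pvera_lambda_b parameters here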

Collaborator

OK. Let's extend the comment to be more precise:

# needs a very small lr to not get nan in pvera_lambda_b due to high input values in this test (up to 90)

model = self.transformers_class.from_pretrained(model_id)
model = get_peft_model(model, config)
model = model.to(self.torch_device)
model = model.to(self.torch_device).eval()
Collaborator

This seems wrong since this is a training test - why put the model in eval mode?

Author

Yes, you're right, it doesn't make sense to put .eval(). However, there is a problem: model is in training mode here, whereas model_from_pretrained is in eval mode. This means that for PVeRA one model will sample from the learned distribution and the other will not, therefore giving different results, which is why I had put them both in eval mode. Do you think we should maybe just remove this test for PVeRA?

Collaborator

I see. Let's check if we can make this consistent by setting the model to eval mode after getting the loss (i.e. in the context block where the comparison happens) and moving the logits = ... retrieval down there.

For example:

            with tempfile.TemporaryDirectory() as tmp_dirname:
                model.eval()
                logits = model(**inputs)[0][0]

                model.save_pretrained(tmp_dirname)

                model_from_pretrained = self.transformers_class.from_pretrained(model_id)
                model_from_pretrained = PeftModel.from_pretrained(model_from_pretrained, tmp_dirname).to(
                    self.torch_device
                )

                logits_from_pretrained = model_from_pretrained(**inputs)[0][0]
                atol, rtol = 1e-4, 1e-4
                assert torch.allclose(logits, logits_from_pretrained, atol=atol, rtol=rtol)

IMO this should not change the test for the worse and should not break anything but we'll see :)

Comment on lines 1145 to 1146
model_from_pretrained = (
PeftModel.from_pretrained(model_from_pretrained, tmp_dirname).to(self.torch_device).eval()
Collaborator

same as above

@leofillioux
Author

Hello @githubnemo, thanks again for your review. I have updated the code to address most of your comments, but the following points remain open:

  • A question regarding what should be converted to a dict in your comment here.
  • Regarding the small learning rate for PVeRA in one test: it seems to be due to non-normalised input, which causes instability in the sampling. I added a bit more info here.
  • Regarding adding .eval() to some tests during training: I see why this is problematic, so I have removed it, but as explained here, this is a problem for PVeRA, since this test compares the outputs of a model in train mode with a model in eval mode. What do you think of removing this test for PVeRA?

Collaborator

@githubnemo githubnemo left a comment


Thanks for the changes!

I've commented on hopefully all the outstanding issues and added a few nits.

CI seems to be passing except for the .eval issue which is hopefully resolved with the proposed fix.

from .model import PveraModel


__all__ = ["Linear", "PVeRALayer", "PveraConfig", "PveraModel"]
Collaborator

Suggested change
__all__ = ["Linear", "PVeRALayer", "PveraConfig", "PveraModel"]
__all__ = ["Linear", "PveraLayer", "PveraConfig", "PveraModel"]

The same rename applies to the class itself as well.


Comment on lines +109 to +139
def test_multiple_adapters_save_load_save_projection_true(self, mlp_same_prng, tmp_path):
# check saving and loading works with multiple adapters and saved projection weights
torch.manual_seed(0)
input = torch.randn(5, 10)
mlp_same_prng.set_adapter("default")
mlp_same_prng.eval()
output_default = mlp_same_prng(input)
mlp_same_prng.set_adapter("other")
output_other = mlp_same_prng(input)

# sanity check
assert not torch.allclose(output_default, output_other, atol=1e-3, rtol=1e-3)

save_path = tmp_path / "pvera"
mlp_same_prng.save_pretrained(save_path)
assert os.path.exists(save_path / "adapter_config.json")
assert os.path.exists(save_path / "other" / "adapter_config.json")

torch.manual_seed(0)
mlp = MLP()
peft_model = PeftModel.from_pretrained(mlp, save_path)
peft_model.load_adapter(save_path / "other", "other")
peft_model.eval()

peft_model.set_adapter("default")
output_default_loaded = peft_model(input)
peft_model.set_adapter("other")
output_other_loaded = peft_model(input)

assert torch.allclose(output_default, output_default_loaded, atol=1e-3, rtol=1e-3)
assert torch.allclose(output_other, output_other_loaded, atol=1e-3, rtol=1e-3)
Collaborator

test_multiple_adapters_save_load_save_projection_true is probably already covered by the common tests, or am I missing something special?

Author

To be honest, I added these (test_multiple_adapters_save_load_save_projection_true and test_multiple_adapters_save_projection_true_contains_pvera_A_pvera_B) because they exist for VeRA, but I agree that they do seem a bit redundant. I'm OK with removing them if you think they aren't useful.

Comment on lines 223 to 229
"r": 8,
"target_modules": None,
"pvera_dropout": 0.05,
"d_initial": 0.1,
"save_projection": True,
"bias": "none",
"task_type": "SEQ_2_SEQ_LM",
Collaborator

Let's just mention the non-default values here.

"d_initial": 0.1,
"save_projection": True,
"bias": "none",
},
Collaborator

Same as below, just mention the non-default values.

},
)
pvera_dropout: float = field(default=0.0, metadata={"help": "PVeRA dropout"})
d_initial: float = field(default=0.1, metadata={"help": "Initial init value for d vector."})
Collaborator

Suggested change
d_initial: float = field(default=0.1, metadata={"help": "Initial init value for d vector."})
d_initial: float = field(default=0.1, metadata={"help": "Initial value for d vector."})

But we can also just use the docstring for this parameter from above verbatim.

Comment on lines 81 to 83
"List of module names or regex expression of the module names to replace with PVeRA."
"For example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'. "
"Only linear layers are supported."
Collaborator

Let's use the docstring values from above for all help values. In the end this makes it more maintainable and we don't lose much.

Collaborator

Also make sure to add this file to docs/source/_toctree.yml, possibly next to VeRA.

@leofillioux
Author

Hello @githubnemo,
Thanks again for your review! I think I have addressed the issues with my latest commits. The only thing I am unsure about is the redundant tests, for which I have left a comment here.
