Integration of PVeRA #2952
Conversation
…ation, and the tests.

Update: ran …
githubnemo left a comment:
Thanks for the PR, this already looks quite good!
Please check the copyright notices and make sure that they are up to date (they often say 2024 but should say 2025).
I've done a quick review and left a few comments. General remarks:
- let's add PVeRA to `tests/test_custom_models.py` (adding the important configurations to `TEST_CASES`, similar to VeRA; see the sketch below)
- if these pass, we can extend the coverage by adding PVeRA to `tests/test_decoder_models.py` and `tests/test_encoder_decoder_models.py`
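For concreteness, here is a rough sketch of what such `TEST_CASES` entries could look like, assuming the (test name, model id, config class, config kwargs) tuple layout used by the existing VeRA entries; the config kwargs and the `PveraConfig` import are placeholders based on this PR, not the final API:

```python
from peft import PveraConfig  # assumed import path once this PR is merged

# hypothetical entries mirroring the style of the existing TEST_CASES tuples
PVERA_TEST_CASES = [
    ("Vanilla MLP 1 PVeRA", "MLP", PveraConfig, {"target_modules": "lin0"}),
    ("Vanilla MLP 2 PVeRA", "MLP", PveraConfig, {"target_modules": ["lin0"], "pvera_dropout": 0.05}),
    ("Vanilla MLP 3 PVeRA", "MLP", PveraConfig, {"target_modules": ["lin0"], "d_initial": 0.2}),
]
```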
Once these are implemented I'll do a more thorough review. After that it'd be nice to have a runnable example and to integrate it into the method comparison to benchmark it against the other methods (and to check our expectations).
Heads up: I'll be rather off than on in the coming days, so merging and review will most likely happen next year.
Hello @githubnemo, thank you for your review! I added two commits: …

After all these changes, I re-ran …
githubnemo left a comment:
Hey, thanks for the update!
Implementation and tests look quite mature. I think that if we provide bitsandbytes support we also need at least one test in `tests/test_gpu_examples.py` for quality control. I think `test_causal_lm_training_4bit_vera` could be used as a base.
In general I think we should rename `PVeRA*` to `Pvera*` (e.g., `PVeRAModel` -> `PveraModel`) to be consistent with `VeraModel` and friends. It is quite hard to remember the spelling of the various abbreviations already :)
Rest of the review is in the comments.
After the comments are resolved and the CI is green I think it would be nice to integrate PVeRA into the MetaMathQA benchmark by adding an experiment file based on the VeRA experiment.
Edit: To fix the docs build, add the PVeRA entry to `docs/source/_toctree.yml`, similar to the other entries.
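Regarding the GPU test suggestion above, here is a very rough, hypothetical sketch of the kind of 4-bit smoke test that could be added; the model id, target modules, and `PveraConfig` arguments are placeholders, and the real test should follow the structure of `test_causal_lm_training_4bit_vera` rather than this simplified version (requires a CUDA GPU with bitsandbytes installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PveraConfig  # assumed import path once this PR is merged
from peft import get_peft_model, prepare_model_for_kbit_training

model_id = "facebook/opt-125m"  # placeholder base model
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, PveraConfig(r=8, target_modules=["q_proj", "v_proj"]))

# single training step as a smoke test: the loss should be finite and backprop should work
batch = tokenizer("PEFT makes fine-tuning large models cheap.", return_tensors="pt").to(model.device)
output = model(**batch, labels=batch["input_ids"])
output.loss.backward()
assert torch.isfinite(output.loss)
```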
```python
super(nn.Linear, self).__init__()
PVeRALayer.__init__(self, base_layer, **kwargs)
self.fan_in_fan_out = fan_in_fan_out
self.sample_at_inference = sample_at_inference
```
For consistency I think this should be a dict so that I can toggle this behavior for each adapter.
Here, are you talking about `sample_at_inference`?
Ah, I always forget that GitHub doesn't highlight the commented lines. Yes, I was talking about `sample_at_inference`.
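To illustrate the per-adapter suggestion, here is a minimal, hypothetical sketch (not the actual PEFT layer) of keeping `sample_at_inference` in a dict keyed by adapter name, so it can be toggled independently for each adapter:

```python
import torch.nn as nn


class PveraLinearSketch(nn.Module):
    """Simplified stand-in for the PVeRA Linear layer, showing only the flag handling."""

    def __init__(self):
        super().__init__()
        # adapter name -> whether to sample from the learned distribution at inference
        self.sample_at_inference: dict[str, bool] = {}

    def update_layer(self, adapter_name: str, sample_at_inference: bool = False) -> None:
        self.sample_at_inference[adapter_name] = sample_at_inference


layer = PveraLinearSketch()
layer.update_layer("default", sample_at_inference=True)
layer.update_layer("other", sample_at_inference=False)
print(layer.sample_at_inference)  # {'default': True, 'other': False}
```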
```python
elif issubclass(config_cls, (VBLoRAConfig, RandLoraConfig, OSFConfig)):
    lr = 0.01  # otherwise we get nan
elif issubclass(config_cls, PVeRAConfig):  # needs very small lr to not get nan
    lr = 1e-6
```
This is indeed very small - do you have an idea which gradients are exploding and if this is a problem in practice?
I checked; the problem comes from `pvera_lambda_b`, whose parameters are NaN after the third epoch. This seems to be caused by the input, which has values up to 90; in practice this shouldn't occur since inputs are usually normalized.
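As an aside, here is a toy example (unrelated to the actual PVeRA code) showing how the gradient of a linear map scales with the input magnitude, which is why inputs up to ~90 can require a much smaller learning rate than normalized inputs:

```python
import torch

torch.manual_seed(0)
for scale in (1.0, 90.0):
    x = scale * torch.randn(16, 32)            # normalized vs. unnormalized inputs
    lam = torch.zeros(32, requires_grad=True)  # stand-in trainable vector, not the real pvera_lambda_b
    loss = ((x * lam).sum(dim=-1) - 1.0).pow(2).mean()
    loss.backward()
    print(f"input scale {scale:5.1f} -> grad norm {lam.grad.norm():.2f}")
```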
OK. Let's extend the comment to be more precise:
```python
# needs a very small lr to not get nan in pvera_lambda_b due to high input values in this test (up to 90)
```
`tests/testing_common.py` (Outdated)
```diff
  model = self.transformers_class.from_pretrained(model_id)
  model = get_peft_model(model, config)
- model = model.to(self.torch_device)
+ model = model.to(self.torch_device).eval()
```
This seems wrong since this is a training test - why put the model in eval mode?
Yes, you're right, it doesn't make sense to call `.eval()`. However, there is a problem: `model` is in training mode here, whereas `model_from_pretrained` is in eval mode. This means that for PVeRA one model will sample from the learned distribution and the other will not, therefore giving different results, which is why I had put them both in eval mode. Do you think we should maybe just remove this test for PVeRA?
I see. Let's check if we can make this consistent by setting the model to eval mode after getting the loss (i.e. in the context block where the comparison happens) and moving the `logits = ...` retrieval down there.
For example:
```python
with tempfile.TemporaryDirectory() as tmp_dirname:
    model.eval()
    logits = model(**inputs)[0][0]
    model.save_pretrained(tmp_dirname)
    model_from_pretrained = self.transformers_class.from_pretrained(model_id)
    model_from_pretrained = PeftModel.from_pretrained(model_from_pretrained, tmp_dirname).to(
        self.torch_device
    )
    logits_from_pretrained = model_from_pretrained(**inputs)[0][0]
    atol, rtol = 1e-4, 1e-4
    assert torch.allclose(logits, logits_from_pretrained, atol=atol, rtol=rtol)
```
IMO this should not change the test for the worse and should not break anything, but we'll see :)
`tests/testing_common.py` (Outdated)
```python
model_from_pretrained = (
    PeftModel.from_pretrained(model_from_pretrained, tmp_dirname).to(self.torch_device).eval()
```
same as above
Hello @githubnemo, thanks again for your review. I have updated the code with most of your comments, but the following issues still remain: …
githubnemo left a comment:
Thanks for the changes!
I've commented on hopefully all the outstanding issues and added a few nits.
CI seems to be passing except for the `.eval()` issue, which is hopefully resolved with the proposed fix.
`src/peft/tuners/pvera/__init__.py` (Outdated)
```python
from .model import PveraModel


__all__ = ["Linear", "PVeRALayer", "PveraConfig", "PveraModel"]
```
```diff
- __all__ = ["Linear", "PVeRALayer", "PveraConfig", "PveraModel"]
+ __all__ = ["Linear", "PveraLayer", "PveraConfig", "PveraModel"]
```
For the class itself as well, not just in `__all__`.
```python
def test_multiple_adapters_save_load_save_projection_true(self, mlp_same_prng, tmp_path):
    # check saving and loading works with multiple adapters and saved projection weights
    torch.manual_seed(0)
    input = torch.randn(5, 10)
    mlp_same_prng.set_adapter("default")
    mlp_same_prng.eval()
    output_default = mlp_same_prng(input)
    mlp_same_prng.set_adapter("other")
    output_other = mlp_same_prng(input)

    # sanity check
    assert not torch.allclose(output_default, output_other, atol=1e-3, rtol=1e-3)

    save_path = tmp_path / "pvera"
    mlp_same_prng.save_pretrained(save_path)
    assert os.path.exists(save_path / "adapter_config.json")
    assert os.path.exists(save_path / "other" / "adapter_config.json")

    torch.manual_seed(0)
    mlp = MLP()
    peft_model = PeftModel.from_pretrained(mlp, save_path)
    peft_model.load_adapter(save_path / "other", "other")
    peft_model.eval()

    peft_model.set_adapter("default")
    output_default_loaded = peft_model(input)
    peft_model.set_adapter("other")
    output_other_loaded = peft_model(input)

    assert torch.allclose(output_default, output_default_loaded, atol=1e-3, rtol=1e-3)
    assert torch.allclose(output_other, output_other_loaded, atol=1e-3, rtol=1e-3)
```
`test_multiple_adapters_save_load_save_projection_true` is probably already covered by the common tests, or am I missing something special?
To be honest, I integrated these (`test_multiple_adapters_save_load_save_projection_true` and `test_multiple_adapters_save_projection_true_contains_pvera_A_pvera_B`) because they were integrated for VeRA, but I agree that they do seem a bit redundant. I'm OK with removing them if you think they aren't useful.
| "r": 8, | ||
| "target_modules": None, | ||
| "pvera_dropout": 0.05, | ||
| "d_initial": 0.1, | ||
| "save_projection": True, | ||
| "bias": "none", | ||
| "task_type": "SEQ_2_SEQ_LM", |
Let's just mention the non-default values here.
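For example, assuming `pvera_dropout=0.0` and `d_initial=0.1` are the defaults (as shown in `config.py`) and that `save_projection=True`, `bias="none"`, and `target_modules=None` are defaults as well, a trimmed version might look like the following; which entries are truly non-default depends on the final defaults:

```python
# hypothetical trimmed kwargs, keeping only values assumed to differ from the defaults
pvera_kwargs = {
    "r": 8,
    "pvera_dropout": 0.05,
    "task_type": "SEQ_2_SEQ_LM",
}
```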
| "d_initial": 0.1, | ||
| "save_projection": True, | ||
| "bias": "none", | ||
| }, |
Same as below, just mention the non-default values.
`src/peft/tuners/pvera/config.py` (Outdated)
```python
        },
    )
    pvera_dropout: float = field(default=0.0, metadata={"help": "PVeRA dropout"})
    d_initial: float = field(default=0.1, metadata={"help": "Initial init value for d vector."})
```
```diff
- d_initial: float = field(default=0.1, metadata={"help": "Initial init value for d vector."})
+ d_initial: float = field(default=0.1, metadata={"help": "Initial value for d vector."})
```
But we can just use the docstring for this parameter from above verbatim
`src/peft/tuners/pvera/config.py` (Outdated)
| "List of module names or regex expression of the module names to replace with PVeRA." | ||
| "For example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'. " | ||
| "Only linear layers are supported." |
Let's use the docstring values from above for all help values. In the end this makes it more maintainable and we're losing not much.
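One possible pattern for this (a hypothetical sketch; the simplest option is to copy the docstring text verbatim into each `help`) is to keep the wording identical in the class docstring and the field metadata so they stay in sync:

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class PveraConfigSketch:
    """
    Args:
        target_modules (`Optional[Union[list[str], str]]`):
            The names of the modules to apply PVeRA to. Only linear layers are supported.
    """

    target_modules: Optional[Union[list[str], str]] = field(
        default=None,
        # same wording as in the docstring above, kept in sync by hand
        metadata={"help": "The names of the modules to apply PVeRA to. Only linear layers are supported."},
    )
```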
Also make sure to add this file to `docs/source/_toctree.yml`, possibly next to VeRA.
Hello @githubnemo,
This PR is a continuation of issue #2948, for which we proposed the integration of the PVeRA adapter.
As recommended in the issue, we based our implementation on that of the VeRA adapter, as the two adapters are very close. Here is a list of the contributions from this PR:
- Adapted the `config.py`, `layer.py`, and `model.py` files from the ones from VeRA.
- Adapted `tests/test_vera.py` to `tests/test_pvera.py` and made sure it ran properly.

Note: because I'm running on a Mac, I was not able to run `make test` (I had an error with MPS).

@BenjaminBossan, could you please give me some feedback?
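For readers following along, here is a minimal, hypothetical usage sketch of the adapter proposed in this PR; the import path and config arguments follow the PR's current state and may still change before merge:

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model
from peft import PveraConfig  # assumed import path from this PR

# placeholder base model; PVeRA targets linear layers such as the attention projections
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = PveraConfig(r=8, target_modules=["q_proj", "v_proj"], pvera_dropout=0.05)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```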