
Conversation

@yeonjoon-jung01 commented Oct 20, 2025

Summary

We opened the initial PR for the GraLoRA method (a granular low-rank adaptation that improves expressive power and outlier handling, selected as a NeurIPS 2025 Spotlight), based on #2636.
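For readers unfamiliar with the API, a minimal usage sketch follows. It assumes the config class exposed by this PR is named GraloraConfig (as the test commands below suggest) and uses a placeholder base model; treat the exact field names as assumptions, not the final API.

from transformers import AutoModelForCausalLM
from peft import GraloraConfig, get_peft_model

# Wrap a base model with a GraLoRA adapter (hypothetical minimal example).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = GraloraConfig(r=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)
model.print_trainable_parameters()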

Test Commands

pytest tests/test_gralora.py -v
pytest tests/test_config.py -k "Gralora" -v
pytest tests/test_decoder_models.py -k "Gralora" -v
pytest tests/test_custom_models.py -k "Gralora" -v

@BenjaminBossan (Member) left a comment

Thanks for contributing GraLoRA to PEFT. The method looks interesting and the implementation generally looks good.

I have added a bunch of comments, but many of these are just due to your fork being a bit older. We have since simplified PEFT, so you can remove a bunch of code; I have marked the code that can be deleted.

Apart from the comments that I added, to complete this PR, let's work on:

  1. Extend tests: add tests to test_custom_models.py, test_encoder_decoder_models.py, test_feature_extraction_models.py, and test_seq_classifier.py (see the sketch after this list).
  2. Add documentation and, ideally, at least one example.
  3. Optional, but highly recommended: add an experiment to our PEFT method comparison suite.
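As a reference for item 1, here is a sketch of what new entries in test_custom_models.py could look like, assuming that file's (name, model_id, config_cls, config_kwargs) tuple pattern for its parametrized test cases; the exact shape may differ:

from peft import GraloraConfig

# Hypothetical parametrized entries for the custom-model test suite.
TEST_CASES = [
    ("Vanilla MLP 1 GraLoRA", "MLP", GraloraConfig, {"target_modules": "lin0"}),
    ("Vanilla MLP 2 GraLoRA", "MLP", GraloraConfig, {"target_modules": ["lin0", "lin1"]}),
]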

@BenjaminBossan (Member)

@yeonjoon-jung01 Please ping me when you're finished so that I know that I can give this another review. Also, if possible, please avoid force pushes or rebases, as those make reviews harder.

@yeonjoon-jung01 (Author)

> @yeonjoon-jung01 Please ping me when you're finished so that I know that I can give this another review. Also, if possible, please avoid force pushes or rebases, as those make reviews harder.

@BenjaminBossan I’ve finished updating the code 🙂. I saw your message a bit late; I had already rebased the branch to sync with main, just in case there were any conflicts. I’ll make sure to avoid force pushes or rebases from now on. Sorry about that!

@yeonjoon-jung01 (Author)

@BenjaminBossan I have also addressed the previously missing features.

I’ve extended the test coverage to include test_custom_models.py, test_encoder_decoder_models.py, test_feature_extraction_models.py, and test_seq_classifier.py.
Additionally, I’ve added corresponding documentation and example code.

@BenjaminBossan (Member) left a comment

Thanks for the updates to the PR. I did another review round, please check.

Also, before committing your changes, please call make style. Ensure that you have the correct version of ruff installed (0.12.12).
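For reference, that request amounts to the following two commands (the ruff version is the one named above):

pip install ruff==0.12.12
make style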

@yeonjoon-jung01 (Author)

@BenjaminBossan I’ve resolved all of your comments and applied the suggested changes. The main update is that I removed tests/test_gralora.py and integrated the related test cases into the existing test_initialization and test_custom_models files, including additional scenarios for Hybrid GraLoRA.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member)

@yeonjoon-jung01 Could you please run make style?

@yeonjoon-jung01 (Author)

@BenjaminBossan I have run make style and updated the documentation.

@BenjaminBossan (Member) left a comment

Thanks for the updates. A test is failing. Check my comment on possible solutions.

@yeonjoon-jung01 (Author)

@BenjaminBossan Do you think there’s any additional code I should test or update?

@BenjaminBossan (Member) left a comment

Thanks for the updates, the change looks good.

I focused on the examples this time and found a few issues. Some are possibly due to recent changes in transformers, I'm not sure, but we should update them so that the examples run out of the box.

Moreover, I ran an experiment with GraLoRA on the PEFT MetaMath benchmark. I used the default settings for the config, and in general the results are comparable to LoRA at rank 32, with similar memory usage and training time. However, the final test accuracy fell slightly short, attaining 46.2% compared to 48.2% with LoRA. If you have any suggestions for better GraLoRA hyperparameters for this experiment, feel free to check them in as an experiment. Otherwise, we can also work on the defaults.

per_device_eval_batch_size=batch_size,
warmup_steps=100,
weight_decay=0.01,
logging_dir="./logs",
@BenjaminBossan (Member):

This argument errors for me, does it work for you?

@yeonjoon-jung01 (Author):

It worked for me! I guess it’s a version dependency issue. I used transformers==4.57.1, and the error did not occur.
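A quick, version-agnostic way to see which of the reviewed TrainingArguments fields the installed transformers version accepts (this does not assume which argument is the culprit):

import inspect
from transformers import TrainingArguments

# Print whether each field from the reviewed snippet is a known argument.
params = inspect.signature(TrainingArguments.__init__).parameters
for name in ("per_device_eval_batch_size", "warmup_steps", "weight_decay", "logging_dir"):
    print(name, name in params)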

parser.add_argument("--learning_rate", type=float, default=1e-4, help="Learning rate")
parser.add_argument("--cutoff_len", type=int, default=512, help="Cutoff length for tokenization")
parser.add_argument("--val_set_size", type=int, default=500, help="Validation set size")
parser.add_argument("--quantize", action="store_true", help="Use quantization")
@BenjaminBossan (Member):

Quantizing wouldn't work, right?

@yeonjoon-jung01 (Author):

I have checked that quantization works fine with the GraLoRA method.
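For concreteness, the quantized path being discussed would look roughly like this, assuming standard bitsandbytes 4-bit loading through transformers; the base model name and the GraLoRA settings are placeholders:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import GraloraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit, prepare it for k-bit training, then attach
# the GraLoRA adapter on top (hypothetical sketch).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", quantization_config=bnb)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, GraloraConfig(r=32, target_modules=["q_proj", "v_proj"]))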

@yeonjoon-jung01 (Author)

> Thanks for the updates, the change looks good.
>
> I focused on the examples this time and found a few issues. Some are possibly due to recent changes in transformers, I'm not sure, but we should update them so that the examples run out of the box.
>
> Moreover, I ran an experiment with GraLoRA on the PEFT MetaMath benchmark. I used the default settings for the config, and in general the results are comparable to LoRA at rank 32, with similar memory usage and training time. However, the final test accuracy fell slightly short, attaining 46.2% compared to 48.2% with LoRA. If you have any suggestions for better GraLoRA hyperparameters for this experiment, feel free to check them in as an experiment. Otherwise, we can also work on the defaults.

@BenjaminBossan Could you please try with learning rate 2e-4 instead of the default 1e-4 for GraLoRA?

@yeonjoon-jung01 (Author)

> @BenjaminBossan Could you please try with learning rate 2e-4 instead of the default 1e-4 for GraLoRA?

@BenjaminBossan I’ve tested the method on my side with both rank 32 and 64, and in both cases, it achieved higher performance than LoRA. If you’d like, I can also commit the configuration and result JSON files, or you’re welcome to try reproducing it on your end.
I used a learning rate of 2e-4 for both cases, as GraLoRA is more robust to outliers, making a higher learning rate manageable. For the other settings, I used rank 32 with alpha 64 and rank 64 with alpha 128. All other parameters were kept at their default values.
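For concreteness, the two configurations described would look roughly like this; the r and alpha values come from the comment above, the gralora_alpha field name is an assumption, and the 2e-4 learning rate is passed to the trainer rather than the config:

from peft import GraloraConfig

# Rank 32 with alpha 64, and rank 64 with alpha 128; other values default.
config_r32 = GraloraConfig(r=32, gralora_alpha=64)
config_r64 = GraloraConfig(r=64, gralora_alpha=128)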
