Add LoRA linear definition #11044
Conversation
Add LoRA linear definition. Pull the linears out of attention and allow a custom linear (e.g. LoRA linear) to be passed in. If none is provided, construct a plain linear (current behaviour). Differential Revision: [D73953776](https://our.internmc.facebook.com/intern/diff/D73953776/)
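For context, a minimal sketch of what a LoRA linear along these lines might look like. The class name, constructor arguments, and the standard low-rank update `base(x) + (alpha/rank) * B(A(x))` are illustrative assumptions, not necessarily the exact definition added in this diff:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, in_dim: int, out_dim: int, rank: int, alpha: float, bias: bool = False):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=bias)
        self.base.weight.requires_grad_(False)              # base weight stays frozen
        self.lora_a = nn.Linear(in_dim, rank, bias=False)   # project down to the LoRA rank
        self.lora_b = nn.Linear(rank, out_dim, bias=False)  # project back up to out_dim
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + (alpha / rank) * B(A(x))
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

Because it exposes the same call signature as `nn.Linear`, such a module can be dropped in wherever attention expects a projection.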
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11044. Note: links to docs will display an error until the docs builds have completed.
❌ 2 New Failures, 2 Unrelated Failures as of commit 180fe7f with merge base 8da2ea6.
NEW FAILURES: the following jobs have failed.
BROKEN TRUNK: the following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D73953776
# Static attention constructs ModuleLists for qkvo and
# populates them with MHA attention layers; do not pass in
# wq, wk, wv, wo.
attention = cls(model_args, layer_id, rope)
Why are you doing this in llama_transformer at all? You have model args passed to the constructor; couldn't you just construct these tensors in the constructor rather than passing them to init?
Not having to add attention-specific logic to llama_transformer.py is the intention of the attention registry, so if we have to break that abstraction I think we need to elaborate on why.
The intention is to make building a transformer composable, e.g. you can pass in any type of attention, with any type of linears.
If we construct them inside the transformer constructor, then the transformer has to understand what all the different options are. I.e., we didn't want to put LoRA logic inside MHA attention itself, as it isn't tied to MHA specifically.
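A rough sketch of that composition, assuming the attention constructor accepts optional pre-built projections. The `wq`/`wk`/`wv`/`wo` names come from the snippet above; the class and shape math are illustrative, not the actual ExecuTorch code:

```python
from typing import Optional

import torch.nn as nn


class AttentionSketch(nn.Module):
    # Sketch only: take an optional pre-built q-projection (e.g. the LoRALinear
    # sketched earlier); if none is given, build a plain nn.Linear (current
    # behaviour). The same pattern would apply to wk, wv, and wo.
    def __init__(self, dim: int, n_heads: int, wq: Optional[nn.Module] = None):
        super().__init__()
        head_dim = dim // n_heads
        self.wq = wq if wq is not None else nn.Linear(dim, n_heads * head_dim, bias=False)


# The transformer builder never needs to know which kind of linear it hands over:
plain_attn = AttentionSketch(dim=2048, n_heads=16)
lora_attn = AttentionSketch(dim=2048, n_heads=16, wq=LoRALinear(2048, 2048, rank=8, alpha=16))
```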
Attention already has to understand what all the different options are. That's why model args is passed into it, no?
> Attention already has to understand what all the different options are. That's why model args is passed into it, no?

As in, it's already possible to generate invalid configs where someone sticks model options in and then uses an attention that doesn't support those options.
LoRA specifically doesn't have to be coupled with the attention implementation (e.g. MHA). That said, I can see why it would make sense to do it inside if other optimizations are also applied within MHA itself.
@JacobSzwejbka my original concern on this is that model config might be non-standardized. Now that we have Jack's new config system maybe this is less of a concern.
This setup is already dependent on the model config, though. The LoRA constructors use tons of args from the config.
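To illustrate the point: even with the linears constructed outside attention, their construction still reads from the model config. A hypothetical sketch (`lora_rank` and `lora_alpha` are made-up field names, not ExecuTorch's actual config schema):

```python
import torch.nn as nn


def build_lora_wq(model_args) -> nn.Module:
    # Hypothetical: sizes and LoRA hyperparameters all come from the model
    # config, so the linear construction is still config-dependent.
    head_dim = model_args.dim // model_args.n_heads
    return LoRALinear(  # the sketch defined earlier
        in_dim=model_args.dim,
        out_dim=model_args.n_heads * head_dim,
        rank=model_args.lora_rank,    # made-up field name
        alpha=model_args.lora_alpha,  # made-up field name
    )
```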
Merged db8729a into gh/lucylq/82/base.
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11044 by @lucylq. Please use this as the source of truth for the PR details, comments, and reviews.
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/lucylq/82/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/lucylq/82/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/lucylq/82/orig
@diff-train-skip-merge
Co-authored-by: lucylq <[email protected]>
Stack from ghstack (oldest at bottom):
Add LoRA linear definition. Pull the linears out of attention and allow a custom linear (e.g. LoRA linear) to be passed in.
If none is provided, construct a plain linear (current behaviour).
Differential Revision: D73953776