Remove explicit device arguments #14619
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14619
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit d369723 with merge base deb42f2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
examples/models/llama/model.py (Outdated)

-    # Within the device="meta" context, tensors that are created do not carry data.
-    # They possess all other metadata a tensor carries such as size, stride, requires_grad.
-    with torch.device("meta"):
+    with torch.device("cpu"):
Hi @navsud, could you provide some more context on this?
Update: this was failing multiple export tests, because Llama2Model (in llama/model.py) was instantiating the transformer on the "meta" device, which required the rope params to be explicitly instantiated on the "cpu" device. Changed "meta" to "cpu" to fix this issue.
Using torch.device("meta") provides some memory benefits during export, because we do not load all the tensors into memory until necessary.
Full context: we use the same rope implementation for QAT of the HTP model using the pt2e flow. But since "cpu" was hardcoded for rope, it was messy to move to GPU (especially when doing multi-GPU training), so I removed the "device" argument from the rope implementation.
However, that caused export failures: during export, the model is built on the "meta" device, and the weights are then "assigned" during checkpoint load. The rope freq params are not present in the checkpoint, so they are never loaded, which makes export fail.
So I'm now creating the model directly on "cpu"; the rope freq params are then also created on cpu, avoiding the need to hardcode "cpu" just for rope.
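To illustrate the failure mode described above, a small hypothetical sketch (TinyRope and its names are made up for illustration): buffers that are absent from the checkpoint are never materialized when loading with assign=True, so they stay on "meta".

```python
import torch
import torch.nn as nn

class TinyRope(nn.Module):
    """Hypothetical stand-in for a rope-bearing module."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        # Rope frequencies, computed at init and not stored in the checkpoint.
        freqs = 1.0 / (10000.0 ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("freqs", freqs, persistent=False)

# Build on "meta": no parameter or buffer carries real data yet.
with torch.device("meta"):
    model = TinyRope()

# Stand-in for a real checkpoint loaded from disk; it contains the
# weights but not the (non-persistent) rope freqs.
ckpt = TinyRope().state_dict()

# assign=True swaps the checkpoint tensors in, materializing only what
# the checkpoint covers; "freqs" is absent, so it stays on "meta" and
# any later use of it during export fails.
model.load_state_dict(ckpt, assign=True)
print(model.proj.weight.device)  # cpu
print(model.freqs.device)        # meta
```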
> Using torch.device("meta") provides some memory benefits during export, because we do not load all the tensors into memory until necessary.

Well, it is really a very short-term benefit in the export flow: right after the model is built on "meta" here, we either (a) load weights from a checkpoint, at which point the model takes memory anyway, or (b) initialize the model weights with zeros, which also takes memory.
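For reference, the two materialization paths described above, as a minimal sketch (the nn.Linear is a hypothetical stand-in for the real transformer):

```python
import torch
import torch.nn as nn

def build_model() -> nn.Module:
    return nn.Linear(4096, 4096)  # stand-in for the real transformer

# Path (a): materialize by loading a checkpoint. Memory is committed here.
with torch.device("meta"):
    model_a = build_model()
ckpt = build_model().state_dict()           # stand-in for torch.load(...)
model_a.load_state_dict(ckpt, assign=True)  # real tensors replace meta ones

# Path (b): allocate empty (uninitialized) storage, then zero-initialize.
with torch.device("meta"):
    model_b = build_model()
model_b = model_b.to_empty(device="cpu")    # allocates real storage
with torch.no_grad():
    for p in model_b.parameters():
        p.zero_()
```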
Thanks for the context @navsud.
Noob q: if the device is set to cpu when creating the model, does this cause issues when moving to GPU later?
I'm concerned we're regressing export performance; do you see memory/latency regressions at export with this change? FWIW, we were able to see significant savings when using device=meta: D54871495
@lucylq
Updated to an implementation that doesn't disturb the way export works (so all of the memory/latency benefits of device=meta are preserved), but still works for my use case.
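The final diff isn't inlined in this thread, but based on the commit summaries below, a hedged sketch of the idea, assuming a llama-style precompute_freqs_cis helper (names here are illustrative): with no hardcoded device argument, the rope tables simply follow the ambient device context ("meta" during export, "cuda" during QAT on a GPU machine).

```python
import torch

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
    # No explicit device argument: every tensor below follows the ambient
    # device context instead of being hardcoded to "cpu".
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(end).float()
    freqs = torch.outer(t, freqs)
    return freqs.cos(), freqs.sin()  # rope frequency tables

# Export keeps its device="meta" memory benefits:
with torch.device("meta"):
    cos_meta, sin_meta = precompute_freqs_cis(128, 2048)

# QAT on a GPU machine gets the tables on "cuda" (if available):
# with torch.device("cuda"):
#     cos_gpu, sin_gpu = precompute_freqs_cis(128, 2048)
```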
Thanks for the changes @navsud, appreciate it.
@lucylq
Had to make another revision as some other tests were failing. Please do another quick round of review. Thanks.
9c5ccc2 to b04778f (force-push)
Summary: Pull Request resolved: pytorch#14619. As part of enabling QAT for the HTP model, we need to run QAT on the model that we use during export. Currently, rope is explicitly hardcoded to "cpu"; this change enables us to create the rope params on "cuda" when running on a GPU machine. Differential Revision: D82239525

b04778f to 37ee087 (force-push)
37ee087 to d369723 (force-push)
Summary: As part of enabling QAT for the HTP model, we need to run QAT on the model that we use during export. Currently, rope is explicitly hardcoded to "cpu"; this change enables us to switch between "cpu" and "cuda" based on the use case. Reviewed By: billmguo. Differential Revision: D82239525