[Performance] Disable LM Head #2081
base: main
Conversation
Summary of Changes

Hello @kylesayrs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request improves the memory efficiency and robustness of the model calibration process. By introducing a mechanism that conditionally offloads the language model (LM) head to a meta device, it mitigates out-of-memory issues that can arise during calibration, particularly with large batch sizes. The LM head stays active only when a specific modifier requires its calibration, leading to better resource utilization and a smoother calibration experience.
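The conditional behavior described above could be wired up roughly as in the sketch below. The entry-point name and loop are assumptions, and `requires_lm_head_calibration` / `disable_lm_head` are the helpers sketched later in this thread:

```python
import contextlib

# Hypothetical calibration entry point: only the guard pattern is the point here.
def run_calibration(model, modifiers, dataloader):
    with contextlib.ExitStack() as stack:
        # Keep the LM head live only if some modifier actually calibrates it;
        # otherwise swap in the meta-device dummy so no logits are materialized.
        if not requires_lm_head_calibration(model, modifiers):
            stack.enter_context(disable_lm_head(model))
        for batch in dataloader:
            model(**batch)
```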
Code Review
This pull request introduces a performance optimization to disable the lm_head during calibration to prevent out-of-memory errors, which is a significant improvement for handling large batch sizes. The implementation is well-structured, adding new utility functions and integrating them into the calibration pipelines. I've identified one potential high-severity issue in the requires_lm_head_calibration function that could lead to incorrect calibration behavior. My review includes a specific comment and a code suggestion to address this.
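The `requires_lm_head_calibration` function itself is not shown in this thread. As a rough illustration only, the check might resemble the following, assuming each modifier exposes a `targets` collection of module names (both the attribute name and the exact-match rule are assumptions):

```python
from typing import Iterable

import torch

def requires_lm_head_calibration(model: torch.nn.Module, modifiers: Iterable) -> bool:
    # The LM head must stay on-device only if some modifier targets it.
    lm_head = model.get_output_embeddings()
    if lm_head is None:
        return False

    # Resolve the head's fully qualified module name within the model.
    lm_head_name = next(
        name for name, module in model.named_modules() if module is lm_head
    )
    return any(
        lm_head_name in getattr(modifier, "targets", ()) for modifier in modifiers
    )
```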
Force-pushed from 9536dad to d4b3e7a
```python
return input.to("meta") @ dummy_weight.T
```

```python
with patch_attr(lm_head, "forward", dummy_forward.__get__(lm_head)):
    yield
```
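For readers following along, here is a self-contained sketch of the full pattern the excerpt comes from. The `patch_attr` below is a minimal stand-in for the repo's helper, and `get_output_embeddings` is an assumption about how the head is located; only the meta-device dummy-forward trick is taken from the diff above.

```python
import contextlib

import torch

@contextlib.contextmanager
def patch_attr(obj, name, value):
    # Minimal stand-in for the patch_attr utility used in the diff:
    # temporarily replace obj.<name>, restoring the original on exit.
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, original)

@contextlib.contextmanager
def disable_lm_head(model: torch.nn.Module):
    # Replace the LM head's forward with a dummy that computes on the meta
    # device, so the (batch, seq_len, vocab) logits tensor is never allocated.
    lm_head = model.get_output_embeddings()
    dummy_weight = lm_head.weight.to("meta")

    def dummy_forward(self, input):
        # Meta tensors carry shape and dtype but no storage, so this matmul
        # produces correctly shaped outputs without consuming memory.
        return input.to("meta") @ dummy_weight.T

    with patch_attr(lm_head, "forward", dummy_forward.__get__(lm_head)):
        yield
```

Outputs keep the right shape and dtype, so downstream code that only inspects metadata still works, while the logits allocation that dominates memory at large batch sizes is skipped.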
Still getting used to the code paradigms here; this nested context manager design is an interesting approach. I'm pretty sure I had similar situations in torchao and never would have considered this.
Yeah, I love the context manager pattern, as it reminds implementers to clean up any side effects which might affect downstream code.
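As an illustration of the nested-context-manager composition being discussed, a calibration context might stack several reversible patches like this. The particular contexts composed here are assumptions, not necessarily what the PR's `calibration_forward_context` actually does:

```python
import contextlib

import torch

@contextlib.contextmanager
def calibration_forward_context(model, disable_head: bool = True):
    # Each entered context undoes its side effect on exit, so calibration
    # leaves the model exactly as it found it, even if an exception is raised.
    with contextlib.ExitStack() as stack:
        stack.enter_context(torch.no_grad())
        if disable_head:
            stack.enter_context(disable_lm_head(model))
        yield
```

ExitStack unwinds the contexts in LIFO order no matter how the calibration loop exits, which is exactly the clean-up guarantee the comment above is praising.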
HDCharles left a comment
looks good
Signed-off-by: Kyle Sayers <[email protected]>
Force-pushed from 34814c7 to 6559de0
Purpose
- Disable the lm_head during calibration to avoid out-of-memory errors when calibrating with large batch sizes
Prerequisites
Changes
- `disable_lm_head` replaces the module forward with a dummy forward function which returns meta outputs
- `disable_lm_head` is applied within `calibration_forward_context`

Testing
- `test_disable_lm_head` (see the sketch below)
- `test_calibration_forward_context`
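As referenced in the list above, here is a hedged sketch of the shape `test_disable_lm_head` might take, reusing the `disable_lm_head` sketch from earlier in the thread (the tiny randomly initialized GPT-2 config is an arbitrary stand-in; the actual test may differ):

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

def test_disable_lm_head():
    # Tiny randomly initialized model so the test runs without downloads.
    config = AutoConfig.for_model("gpt2", n_embd=32, n_layer=2, n_head=2, vocab_size=128)
    model = AutoModelForCausalLM.from_config(config)
    hidden = torch.randn(1, 8, model.config.hidden_size)

    with disable_lm_head(model):
        # Inside the context, the patched head computes on the meta device.
        out = model.get_output_embeddings()(hidden)
        assert out.device.type == "meta"
        assert out.shape == (1, 8, config.vocab_size)

    # Outside the context, the original forward is restored.
    out = model.get_output_embeddings()(hidden)
    assert out.device.type == "cpu"
```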