Various CUDA graph improvements on capture time, replay time, memory footprint #2572
base: main
Conversation
Force-pushed from 0cf9ab8 to 2b8f924
Force-pushed from d86d6db to 9f268c1
Force-pushed from 9f268c1 to 8b0fe08
Can we throw an error or at least a warning if …
Force-pushed from 5ad199d to 529107a
mathemakitten left a comment:
This looks fine to me, modulo a pass for docstrings and some comments on clarity for the new CudagraphArtifacts class.
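The CudagraphArtifacts class itself is not shown in this excerpt. As a point of reference only, the sketch below illustrates the kind of state such an artifacts container typically groups together for a captured region: the graph object, the static input/output buffers it was captured against, and the shared memory pool. All names here (GraphArtifacts, capture_forward, static_inputs, ...) are illustrative assumptions, not the PR's actual API.

```python
# Hypothetical sketch of a CUDA-graph artifacts container; not the PR's
# actual CudagraphArtifacts class.
from dataclasses import dataclass
from typing import List

import torch


@dataclass
class GraphArtifacts:
    """Everything needed to replay a captured region: the graph, the static
    input/output buffers it reads from and writes to, and the memory pool it
    was captured against (so later captures can share the same pool)."""
    graph: torch.cuda.CUDAGraph
    static_inputs: List[torch.Tensor]
    static_outputs: List[torch.Tensor]
    pool: tuple  # opaque handle from torch.cuda.graph_pool_handle()

    def replay(self, *inputs: torch.Tensor) -> List[torch.Tensor]:
        # Copy fresh data into the captured static buffers, then replay.
        for dst, src in zip(self.static_inputs, inputs):
            dst.copy_(src)
        self.graph.replay()
        return self.static_outputs


def capture_forward(module: torch.nn.Module, example: torch.Tensor) -> GraphArtifacts:
    pool = torch.cuda.graph_pool_handle()
    static_in = example.clone()
    # Warm up on a side stream before capture, as recommended by the PyTorch docs.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        module(static_in)
    torch.cuda.current_stream().wait_stream(s)

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g, pool=pool):
        static_out = module(static_in)
    return GraphArtifacts(g, [static_in], [static_out], pool)
```

Bundling the pool handle with the graph is what lets multiple captured regions share one allocation pool, which is one common way to reduce the memory footprint of many small graphs.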
config: TransformerConfig,
base_module = None,
function_name = None,
need_backward = True,
Why is this needed when we have base_module.training and/or torch.is_grad_enabled() available in the same scope?
This is for when you want to capture something that doesn't have a backward pass but is still run inside a training loop. For instance, we might want to graph over a log_grad_norms or something; hypothetically there isn't a backward pass to graph over. So this is a flag to do that, without having to set base_module.training and/or torch.is_grad_enabled() every time we call that function.
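To make the rationale concrete, here is a rough sketch of how a need_backward flag could gate whether autograd state is recorded during capture, independent of the caller's grad mode. The helper name capture_callable and its signature are assumptions for illustration, not the PR's implementation; warm-up and memory-pool sharing are omitted for brevity.

```python
# Sketch only: illustrates the idea behind a need_backward flag, not this PR's code.
import torch


def capture_callable(fn, example_args, need_backward: bool = True):
    graph = torch.cuda.CUDAGraph()
    if need_backward:
        # Leave grad mode untouched so autograd state is recorded during
        # capture (a backward graph for the region can then be captured too).
        with torch.cuda.graph(graph):
            static_out = fn(*example_args)
    else:
        # Forward-only capture: the callable (e.g. logging grad norms) runs
        # inside the training loop but has nothing to backprop through, so we
        # suppress autograd recording here instead of toggling module.training
        # or the global grad mode at every call site.
        with torch.no_grad(), torch.cuda.graph(graph):
            static_out = fn(*example_args)
    return graph, static_out
```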
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Jieming Zhang <[email protected]>
Signed-off-by: Jimmy Zhang <[email protected]>
Force-pushed from 825453d to f275189
/ok to test f275189
Signed-off-by: Jimmy Zhang <[email protected]>
Force-pushed from 6ea46c5 to 6f13ce8
/ok to test 6f13ce8
What does this PR do?
Contribution process
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks
(e.g., Core 0.8)

Code review
The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch
(Step 1): Add PR label `Expert Review`

(Step 2): Collect the expert reviewers' reviews
- Add the `Expert Review` label when your PR is ready for review. Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review
- Add the `Final Review` label.

(Optional Step 4): Cherry-pick into release branch
- If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.

For MRs into `dev` branch
The proposed review process for `dev` branch is under active discussion. MRs are mergeable after one approval by either `[email protected]` or `[email protected]`.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.