Current scaling: two-stage HIP amax kernel #369
Merged
37 commits
c15d93b Current scaling: two-stage amax kernel (matthiasdiener)
51fab36 Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
ae35e4c bugfix graph capture (matthiasdiener)
77a68a7 Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
c0d8e73 outline workspace allocation (matthiasdiener)
6c3507d Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
3c9de07 Proper allocation of workspace (matthiasdiener)
91249cc Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
be0e0c8 add a test to compare the accuracy of both amax implementations (matthiasdiener)
bce34da add possibility to force using previous (atomic) kernel (matthiasdiener)
8c388cc Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
6388604 add copyrights (matthiasdiener)
9e6586f don't add extra template to kernel (matthiasdiener)
18292bf make amax_kernel_threads usable in pytorch (matthiasdiener)
a389455 update remaining calls to nvte_compute_amax (matthiasdiener)
d87ab8a Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
fd5dead additional copyrights (matthiasdiener)
16d3bf9 avoid workspace allocations if NVTE_USE_ATOMIC_AMAX is set (matthiasdiener)
50b34aa Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
ef532b1 remove use_block_amax parameter, more cleanups (matthiasdiener)
f933ef3 Factor workspace allocation into function (matthiasdiener)
7d4054e expand test slightly (matthiasdiener)
63cff98 Revert "expand test slightly"
c7d44a7 guard by HIP macro, address review comments (matthiasdiener)
f92b926 bugfix workspace.data.dptr (matthiasdiener)
eba552e various cleanups (matthiasdiener)
0d6a177 Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
8eda427 simplify types in allocate_amax_workspace (matthiasdiener)
6990928 Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
9ee618f fix indentation (matthiasdiener)
77b1bc3 Merge branch 'dev' into speedup-amax-kernel (matthiasdiener)
1357d4b Use private implementation of DIVUP (matthiasdiener)
01b61b5 define amax_kernel_threads on non-AMD (matthiasdiener)
ed16f8f Revert "Use private implementation of DIVUP" (matthiasdiener)
95dcbdf Factor out workspace size calculation (matthiasdiener)
b07edf6 change name (matthiasdiener)
233eb0a add copyright (matthiasdiener)