codegen: BAddrInterleave #3679
Draft: jfactory07 wants to merge 26 commits into develop from users/jzhou/address-interleave (+1,065 −40)
Commits (26, all by jfactory07):
- e77420a codegen: BAddrInterleave
- cec7d49 codegen: implement align-k
- 5ef3eb7 refine
- 448c4d6 refine macro
- c049b0c codegen: do NOT overwrite the original stride SGPRs in-place.
- 9003fda codeGen : refine default value
- 8b4554d codegen : refine get
- b3a42a7 host restriction: If n divided by MT1 is not a power of two, address …
- 63d386a host restriction: change to :
- 74c6e7a codegen: remove BInterleaveG guard from kernel's runtime
- e682def host restriction: add AssertKRingShiftAlignedK
- 238fdc7 add: AssertKRingShiftTailWrapOnly
- 10b03bb codegen: shift = (-baseOffsetElems) mod cacheLineElements
- 2f6cc19 refine macro
- e871991 codegen: refine tail for krs
- c1c0aa7 tailStartChunk = ceil(KRingShift / chunkElems)
- 0e59506 fix error
- 51f1fe7 clean code
- 0ecb0db clean code
- 60e14e4 clean code
- db885c5 refine restriction
- b718bda refine comments
- 67b5d1a enable
- 11dfdd0 add test
- b6a3b8f add test case
- be6e173 Merge branch 'develop' into users/jzhou/address-interleave
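Two of the commit messages above state the K ring-shift arithmetic directly: `shift = (-baseOffsetElems) mod cacheLineElements` and `tailStartChunk = ceil(KRingShift / chunkElems)`. The minimal Python sketch below restates those two formulas for illustration only; the function names and sample numbers are hypothetical and not taken from the PR, and it assumes all quantities are non-negative element counts with positive `cache_line_elems` and `chunk_elems`.

```python
def k_ring_shift(base_offset_elems: int, cache_line_elems: int) -> int:
    """shift = (-baseOffsetElems) mod cacheLineElements.

    Python's % returns a value in [0, cache_line_elems) for a positive
    modulus, so base_offset_elems + shift is always a multiple of
    cache_line_elems (the base becomes cacheline-aligned).
    """
    return (-base_offset_elems) % cache_line_elems


def tail_start_chunk(k_ring_shift_elems: int, chunk_elems: int) -> int:
    """tailStartChunk = ceil(KRingShift / chunkElems), using integer arithmetic only."""
    return -(-k_ring_shift_elems // chunk_elems)


if __name__ == "__main__":
    # Hypothetical numbers: a 64-element cache line (e.g. 128 bytes of
    # 2-byte data) and a workgroup whose B-side K base starts at element 100.
    shift = k_ring_shift(100, 64)
    print(shift, (100 + shift) % 64)   # 28 0
    print(tail_start_chunk(shift, 8))  # ceil(28 / 8) = 4
```

Using `-(-a // b)` instead of `math.ceil(a / b)` keeps the ceiling exact for large integers, since it never goes through floating point.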
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -325,6 +325,14 @@ def makeValidMatrixInstructions():` |
| 325 | 325 | `"DirectToVgprA": [False, True],` |
| 326 | 326 | `"DirectToVgprB": [False, True],` |
| 327 | 327 | `"DirectToVgprSparseMetadata": [False, True],` |
| | 328 | `# B address interleave (restricted): non-contiguous tile columns for TN/NN-like B (TLUB == False),` |
| | 329 | `# with runtime G chosen as the largest power-of-two factor of (N/MT1), capped by LVCB.` |
| | 330 | `# Requires SizeJ % MT1 == 0 at runtime; otherwise falls back to original mapping.` |
| | 331 | `"BAddrInterleave": [False, True],` |
| | 332 | `# K ring-shift (restricted): apply a per-WG shift along the summation (K) dimension so that` |
| | 333 | `# the B-side base K address for each workgroup is cacheline-aligned/congruent, while preserving` |
| | 334 | `# correctness via tail-loop ring wrap. Intended for TN/NN-like B (TLUB == False).` |
| | 335 | `"KRingShift": [False, True],` |

Contributor: Have you added default values for these two new parameters?

Author: Yes, both currently default to False.

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 328 | 336 | `# Attempt to load directly from global memory into LDS.` |
| 329 | 337 | `# Assembly only` |
| 330 | 338 | `# Requires BufferLoad, assembler support for lds modifier on buffer` |
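The `BAddrInterleave` comment defines the runtime group size G as the largest power-of-two factor of N/MT1, capped by LVCB. A minimal Python sketch of that rule follows; `lowbit` and `b_interleave_g` are hypothetical helper names, the MT1 and LVCB values in the examples are made up, and the sketch does not model how the kernel remaps tile columns once G is known. Returning G = 1 when SizeJ % MT1 != 0 mirrors the comment's "falls back" wording; per the review thread on that line, such shapes are instead rejected by the `AssertFree1DivByMT1LowbitGT1` predicate.

```python
def lowbit(x: int) -> int:
    """Largest power of two dividing x (x > 0); x & -x isolates the lowest set bit."""
    return x & -x


def b_interleave_g(size_j: int, mt1: int, lvcb: int) -> int:
    """Hypothetical host-side restatement of the G rule in the comment above.

    Returns 1 (no interleave) when SizeJ is not a multiple of MT1.
    """
    if size_j % mt1 != 0:
        return 1
    tiles1 = size_j // mt1            # number of MT1-wide tiles along N
    return min(lowbit(tiles1), lvcb)  # power-of-two factor, capped by LVCB


if __name__ == "__main__":
    print(b_interleave_g(12 * 256, 256, 8))  # 12 tiles -> lowbit 4 -> G = 4
    print(b_interleave_g(64 * 256, 256, 8))  # 64 tiles -> capped by LVCB -> G = 8
    print(b_interleave_g(7 * 256, 256, 8))   # odd tile count -> G = 1
```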

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -434,6 +442,16 @@ def makeValidMatrixInstructions():` |
| 434 | 442 | `# - See above AssertFree0ElementMultiple "Load optimizations"` |
| 435 | 443 | `# 1 indicates no assertion (since all sizes are multiples of 1)` |
| 436 | 444 | `"AssertFree1ElementMultiple": [1, 2, 4, 8, 16],` |
| | 445 | `# Address-interleave restriction:` |
| | 446 | `# If >0, require tiles1=(Free1Size / MT1) to have lowbit(tiles1)>1 (i.e. G>1).` |
| | 447 | `# This matches the kernel's initBInterleaveG logic:` |
| | 448 | `# - require Free1Size % MT1 == 0` |
| | 449 | `# - compute lowbit(tiles1)` |
| | 450 | `# - enable only if min(lowbit, LVCB) > 1` |
| | 451 | `"AssertFree1DivByMT1LowbitGT1": -1,` |
| | 452 | `# KRingShift wrap restriction (packed integer; see Solution.py for encoding):` |
| | 453 | `# If >0, require any (k + KRingShift) wrap to occur only in tail loop (no main-loop wrap).` |
| | 454 | `"AssertKRingShiftTailWrapOnly": -1,` |
| 437 | 455 | `# Assertions that require arithmetic intensity to be specified value.` |
| 438 | 456 | `# Arithmetic intensity measures the ratio of computation to memory bandwidth required for a problem.` |
| 439 | 457 | `# These predicates can be used to adjust solution selection compute-bound or memory-bound problems.` |
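The two new assertions can be sketched as host-side predicate functions. The first follows the steps listed in the `AssertFree1DivByMT1LowbitGT1` comment. The second is only an interpretation of "no main-loop wrap": the real `AssertKRingShiftTailWrapOnly` value is a packed integer whose encoding lives in Solution.py and is not shown here, so `size_k`, `depth_u`, and `shift` are hypothetical unpacked inputs, and the model of a main loop advancing in DepthU-sized steps followed by a tail loop is an assumption.

```python
def assert_free1_div_by_mt1_lowbit_gt1(assert_value: int, free1_size: int,
                                       mt1: int, lvcb: int) -> bool:
    """True if a problem passes AssertFree1DivByMT1LowbitGT1 as described above."""
    if assert_value <= 0:       # default -1: no assertion
        return True
    if free1_size % mt1 != 0:   # require Free1Size % MT1 == 0
        return False
    tiles1 = free1_size // mt1
    low = tiles1 & -tiles1      # lowbit(tiles1)
    return min(low, lvcb) > 1   # enable only if G > 1


def k_ring_shift_tail_wrap_only(size_k: int, depth_u: int, shift: int) -> bool:
    """Hypothetical unpacked check that no (k + shift) wrap happens in the main loop.

    Assumes the main loop covers k in [0, (size_k // depth_u) * depth_u),
    the tail loop covers the remainder, and 0 <= shift < size_k.
    """
    main_end = (size_k // depth_u) * depth_u
    # k + shift < size_k holds for every main-loop k exactly when
    # main_end + shift <= size_k, i.e. shift <= size_k % depth_u.
    return main_end + shift <= size_k


if __name__ == "__main__":
    print(assert_free1_div_by_mt1_lowbit_gt1(1, 12 * 256, 256, 8))  # G = 4 > 1
    print(k_ring_shift_tail_wrap_only(1000, 64, 28))  # 960 + 28 <= 1000
    print(k_ring_shift_tail_wrap_only(1024, 64, 1))   # no tail loop, so any shift wraps in main loop
```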
What does "falls back to original mapping" mean here? Will such shapes be rejected by the predicate?
Good catch, the description here is inaccurate. With AssertFree1DivByMT1LowbitGT1 now added, shapes that don't satisfy the address-interleave conditions will be rejected by the predicate. I will correct it. Thanks.