
Commit b0c5b24

[release/2.7][SWDEV-544125] update test buffer fudge factor for hipblaslt for test_fully_shard_training_memory test (#2493)
In this PR, I cherry-picked upstream commit 78300c8. It fixes the test_fully_shard_training_memory test in /distributed/_composable/fsdp/test_fully_shard_memory.py, which was tracked as a failing test in Jira: https://ontrack-internal.amd.com/browse/SWDEV-544125
Co-authored-by: Ethan Wee <[email protected]>
1 parent f661647 commit b0c5b24


1 file changed: +3 -0 lines


test/distributed/_composable/fsdp/test_fully_shard_memory.py

Lines changed: 3 additions & 0 deletions
@@ -117,6 +117,9 @@ def _test_fully_shard_training_memory(
 # number is kept much smaller than the actual memory usage, which is on
 # the order of 100-200+ MB)
 buffer_mb = 16
+# The default workspace for hipblaslt is larger than for cublas/cublaslt
+# which requires a slight increase to this buffer value.
+buffer_mb = 16 if torch.version.cuda else 18
 if reshard_after_forward:
 # 3x max unsharded block parameters (current all-gather + copy-out
 # and next all-gather), non-block parameters, and other
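A minimal sketch of what the new conditional selects, written as plain Python so the helpers run without torch. On ROCm builds `torch.version.cuda` is None (the HIP version lives in `torch.version.hip`), so the expression yields 18 MB there. It also yields 18 MB on CPU-only builds, where `torch.version.cuda` is likewise None. Because the new assignment comes right after the existing `buffer_mb = 16`, the earlier value is always overwritten. The helper names `select_buffer_mb` and `within_budget` are hypothetical. The `peak <= expected + buffer` shape of the check is an assumption about how the test uses `buffer_mb`, not code taken from the test file.

```python
from typing import Optional


def select_buffer_mb(
    cuda_version: Optional[str],
    cuda_buffer_mb: int = 16,
    non_cuda_buffer_mb: int = 18,
) -> int:
    """Mirror of `16 if torch.version.cuda else 18` (hypothetical helper).

    torch.version.cuda is a version string on CUDA builds and None on ROCm
    and CPU-only builds, so every non-CUDA build takes the larger buffer.
    """
    return cuda_buffer_mb if cuda_version else non_cuda_buffer_mb


def within_budget(peak_mb: float, expected_mb: float, buffer_mb: int) -> bool:
    """Hypothetical tolerance check of the form `peak <= expected + buffer`."""
    return peak_mb <= expected_mb + buffer_mb


if __name__ == "__main__":
    try:
        import torch

        cuda_version = torch.version.cuda
    except ImportError:
        # Assumption: with no torch installed, behave like a non-CUDA build.
        cuda_version = None
    print("buffer_mb =", select_buffer_mb(cuda_version))
```

Under this assumed check, a run that overshoots its estimate by 17 MB fails with the 16 MB buffer but passes with 18 MB. That is the kind of small, backend-specific headroom a larger default hipblaslt workspace would call for.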
