[hipBLASLt] Fix shard overlay convergence #5354

Open
davidd-amd wants to merge 3 commits into develop from users/davidd-amd/hipblaslt-convergence

davidd-amd (Contributor) commented Mar 11, 2026

In TheRock's multi-shard build, each shard builds hipBLASLt for a subset of GPU targets, and all shard install trees are overlaid onto a single filesystem prefix. Three artifacts were resolved last-writer-wins during overlay, so the copy from the last shard replaced the others and the metadata written by earlier shards was lost:

  • hipblasltExtOpLibrary.dat (ExtOp op/kernel metadata)
  • TensileLiteLibrary_lazy_Mapping (Tensile solution index)
  • hipblasltTransform.hsaco (matrix-transform fat binary)

The fix introduces HIPBLASLT_DIST_TARGETS, a new CMake cache variable representing the full distribution GPU target list. When set by TheRock (e.g. -DHIPBLASLT_DIST_TARGETS="gfx942;gfx1100"), all shards use the same target set for metadata generation, producing byte-for-byte identical artifacts across shards so overlay is safe.
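
The overlay behavior described above can be modeled with a small Python sketch. This is only an illustration, not code from the PR: the `overlay` and `build_shard` helpers and the `library/kernels_<target>.co` file names are hypothetical, and the metadata file is reduced to a string listing the targets it was generated for.

```python
def overlay(shard_trees):
    """Overlay install trees in order; a later shard's file replaces an earlier one."""
    prefix = {}
    for tree in shard_trees:
        prefix.update(tree)  # last writer wins for any shared path
    return prefix


def build_shard(shard_targets, metadata_targets):
    """Model one shard's install tree (hypothetical layout, for illustration only).

    Per-target code objects are named after the shard's own targets, while the
    shared metadata file records the targets it was generated from.
    """
    tree = {f"library/kernels_{t}.co": f"code for {t}" for t in shard_targets}
    tree["library/hipblasltExtOpLibrary.dat"] = ",".join(sorted(metadata_targets))
    return tree


shards = [["gfx942"], ["gfx1100"]]
dist_targets = ["gfx942", "gfx1100"]

# Before the fix: each shard generates metadata from its own target subset.
per_shard = overlay([build_shard(s, s) for s in shards])

# After the fix: every shard generates metadata from the full distribution list.
converged = overlay([build_shard(s, dist_targets) for s in shards])

print(per_shard["library/hipblasltExtOpLibrary.dat"])  # gfx1100
print(converged["library/hipblasltExtOpLibrary.dat"])  # gfx1100,gfx942
```

In the first case the metadata in the overlaid prefix only describes the last shard's target; in the second, every shard writes the same content, so the overlay order no longer matters.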

HIPBLASLT_DIST_TARGETS defaults to GPU_TARGETS, so standalone builds are unaffected. The host library (libhipblaslt.so) requires no change: GPU_TARGETS has no effect on host compilation flags, so the library is already byte-for-byte identical across shards.

Changes:

  • Add HIPBLASLT_DIST_TARGETS cache variable to top-level CMakeLists.txt
  • Use HIPBLASLT_DIST_TARGETS for TensileCreateLibrary --architecture arg
  • Use HIPBLASLT_DIST_TARGETS for ExtOp arch loop in extops/CMakeLists.txt
  • Use HIPBLASLT_DIST_TARGETS for matrix-transform --offload-arch flags
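
As a rough model of the changes listed above, the following Python sketch shows the default-to-GPU_TARGETS fallback and how a semicolon-separated target list could expand into one `--offload-arch` flag per target. The real logic lives in the CMake files, which are not shown in this conversation; the function names `resolve_dist_targets` and `offload_arch_flags` are hypothetical.

```python
def resolve_dist_targets(gpu_targets, dist_targets=None):
    """Return the distribution target list, falling back to gpu_targets.

    Both arguments are CMake-style semicolon-separated strings. An unset or
    empty dist_targets models the default described in the PR text.
    """
    chosen = dist_targets if dist_targets else gpu_targets
    return [t for t in chosen.split(";") if t]


def offload_arch_flags(targets):
    """Build one --offload-arch flag per target, preserving the given order."""
    return [f"--offload-arch={t}" for t in targets]


# Standalone build: HIPBLASLT_DIST_TARGETS is not set, so GPU_TARGETS is used.
standalone = offload_arch_flags(resolve_dist_targets("gfx942"))

# Sharded build: the shard compiles gfx942 only, but metadata and the
# matrix-transform binary are generated for the full distribution list.
sharded = offload_arch_flags(resolve_dist_targets("gfx942", "gfx942;gfx1100"))

print(standalone)  # ['--offload-arch=gfx942']
print(sharded)     # ['--offload-arch=gfx942', '--offload-arch=gfx1100']
```

Under this model, two shards with different GPU_TARGETS but the same HIPBLASLT_DIST_TARGETS produce the same flag list, which is the property the overlay relies on.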

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov-commenter commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.23%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5354      +/-   ##
===========================================
+ Coverage    66.51%   66.51%   +0.01%     
===========================================
  Files         1791     1791              
  Lines       277130   276814     -316     
  Branches     38793    38707      -86     
===========================================
- Hits        184310   184120     -190     
+ Misses       76836    76730     -106     
+ Partials     15984    15964      -20     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| hipBLAS | `90.67% <ø> (ø)` | Carriedforward from 5fae749 |
| hipBLASLt | `43.55% <ø> (-0.44%)` ⬇️ | |
| hipCUB | `82.38% <ø> (ø)` | Carriedforward from 5fae749 |
| hipDNN | `83.99% <ø> (ø)` | Carriedforward from 5fae749 |
| hipFFT | `58.53% <ø> (ø)` | Carriedforward from 5fae749 |
| hipRAND | `76.12% <ø> (ø)` | Carriedforward from 5fae749 |
| hipSOLVER | `68.81% <ø> (ø)` | Carriedforward from 5fae749 |
| hipSPARSE | `84.70% <ø> (ø)` | Carriedforward from 5fae749 |
| rocBLAS | `47.95% <ø> (ø)` | Carriedforward from 5fae749 |
| rocFFT | `50.19% <ø> (ø)` | Carriedforward from 5fae749 |
| rocRAND | `57.08% <ø> (ø)` | Carriedforward from 5fae749 |
| rocSOLVER | `77.23% <ø> (ø)` | Carriedforward from 5fae749 |
| rocSPARSE | `71.53% <ø> (ø)` | Carriedforward from 5fae749 |

*This pull request uses carry forward flags.

9 files have indirect coverage changes.

@math-ci-webhook

perfci run on commit 5fae749

math-ci run

davidd-amd marked this pull request as ready for review March 11, 2026 22:03
davidd-amd requested a review from a team as a code owner March 11, 2026 22:03
davidd-amd requested a review from aliiqbal24 March 11, 2026 22:04
davidd-amd (Contributor, Author) commented

FYI @stellaraccident

davidd-amd requested a review from bstefanuk March 11, 2026 22:05
@math-ci-webhook

perfci run on commit 927de9d

math-ci run

davidd-amd requested a review from marbre March 18, 2026 16:17