[hipBLASlt]: Add integer_exact initialization option for matrix data#5535

Open
tony-davis wants to merge 9 commits into develop from users/todavis/hipblaslt-exact-int-tests

@tony-davis tony-davis commented Mar 17, 2026

Summary of changes

  • Introduced a new initialization option integer_exact for matrix data, which fills matrices with small integers so that GEMM results can be compared exactly.
  • Added integer_exact to the hipblaslt_initialization enum.
  • Enhanced the unit_print_first_mismatch function to support diagnostics for integer_exact initialization.
  • Modified the hipblaslt_init_device function to handle the new initialization type, ensuring proper data generation.
  • Added multiple test cases in YAML files to validate the behavior of the integer_exact initialization across various matrix sizes and configurations.

No functional or build impact outside of the new feature implementation.

Motivation

Implements integer-exact GEMM testing for hipBLASlt: input data are chosen so dot products stay in a small integer range, enabling GPU vs CPU comparison with exact match (zero tolerance) instead of a numeric tolerance. That improves defect detection and avoids tolerance masking issues, especially with 16-bit types. The pattern used is A and C in {0, 1, 2}, B in ±{0, 1, 2} with a checkerboard sign pattern, so results remain exactly representable under defined K limits.
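
One way to see where a K limit of this kind can come from is a worst-case bound. This is a sketch under stated assumptions, not the PR's own derivation: with |a| ≤ 2 and |b| ≤ 2, every product has magnitude at most 4, so a length-K dot product is bounded by 4K, and a binary floating-point format with p significand bits (implicit bit included) represents every integer up to 2^p exactly. The function names below are hypothetical and do not appear in hipBLASLt.

```cpp
#include <cassert>
#include <cstdint>

// Largest N such that every integer in [-N, N] is exactly representable in a
// binary floating-point format with `precision_bits` significand bits
// (implicit leading bit included). Exponent range is assumed not to bind.
inline std::int64_t max_contiguous_exact_int(int precision_bits)
{
    assert(precision_bits > 0 && precision_bits < 63);
    return std::int64_t{1} << precision_bits;
}

// Worst-case |dot product| of length k when |a| <= max_abs_a and |b| <= max_abs_b.
inline std::int64_t worst_case_dot(std::int64_t k, std::int64_t max_abs_a, std::int64_t max_abs_b)
{
    assert(k >= 0 && max_abs_a >= 0 && max_abs_b >= 0);
    return k * max_abs_a * max_abs_b;
}

// Largest K for which the worst-case dot product is still exactly representable.
inline std::int64_t max_exact_k(int precision_bits, std::int64_t max_abs_a, std::int64_t max_abs_b)
{
    assert(max_abs_a > 0 && max_abs_b > 0);
    return max_contiguous_exact_int(precision_bits) / (max_abs_a * max_abs_b);
}
```

For fp16, which has 11 significand bits, `max_exact_k(11, 2, 2)` evaluates to 512. The sketch ignores the contribution of C and is only a worst-case argument; the PR text does not say how the limit was chosen for bf16.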

Technical Details

  • Initialization: New hipblaslt_initialization::integer_exact. In hipblaslt_init_device, A and C use small_int_positive (values 0, 1, 2); B uses the same magnitudes with the sign chosen by (i ^ j) & 1, which is the parity of i + j, so signs alternate along every row and every column (checkerboard).
  • Verification: When initialization == integer_exact, testing_matmul sets tolerance to 0 (exact check) and calls unit_print_first_mismatch for D before the check to improve failure diagnostics.
  • 16-bit K limit: For integer_exact with 16-bit types (fp16/bf16), tests are skipped when K > 512 so that dot products stay exactly representable; 32-bit and 64-bit types use the full size ranges.
  • B matrix sync: Device→host sync for B uses storage dimensions (K, N, ldb) instead of logical (B_row, B_col, ldb) so transB and swizzle paths do not hit invalid leading dimension or wrong copy.
  • Tests: Eight new matmul_integer_exact_* tests in matmul_gtest.yaml mirror the existing matmul suite (one, small, medium, batch_medium, chunk, grid_limit_real, grid_limit_double, deepbench), with the same size ranges and filters (including gpu_arch where applicable).
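
The initialization pattern described above can be sketched on the host as follows. This is a minimal illustration, not the code in hipblaslt_init_device.cpp: the hash standing in for the device PRNG, the function names, the column-major layout, and the choice of which parity gets the negative sign are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the device PRNG: a multiplicative hash of the
// element index, reduced to a magnitude in {0, 1, 2}.
inline int small_int_magnitude(std::uint32_t idx)
{
    return static_cast<int>(((idx * 2654435761U) >> 16) % 3U);
}

// Checkerboard sign: (i ^ j) & 1 is the parity of i + j.
// Mapping odd parity to -1 is an assumption made for this sketch.
inline int checkerboard_sign(std::size_t i, std::size_t j)
{
    return ((i ^ j) & 1U) ? -1 : 1;
}

// Fill a column-major rows x cols matrix with leading dimension ld.
// signed_pattern = false gives the A/C pattern (values 0, 1, 2);
// signed_pattern = true gives the B pattern (same magnitudes, checkerboard sign).
template <typename T>
void fill_integer_exact(std::vector<T>& m, std::size_t rows, std::size_t cols,
                        std::size_t ld, bool signed_pattern)
{
    assert(ld >= rows);
    m.assign(ld * cols, T(0)); // padding rows (if ld > rows) stay zero
    for(std::size_t j = 0; j < cols; ++j)
    {
        for(std::size_t i = 0; i < rows; ++i)
        {
            int v = small_int_magnitude(static_cast<std::uint32_t>(i + j * ld));
            if(signed_pattern)
                v *= checkerboard_sign(i, j);
            m[i + j * ld] = static_cast<T>(v);
        }
    }
}
```

Calling `fill_integer_exact(a, M, K, lda, false)` and `fill_integer_exact(b, K, N, ldb, true)` would produce operands whose products are all integers in [-4, 4], which is what makes a zero-tolerance comparison meaningful.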

Test Plan

  • Build hipBLASlt and run the new integer_exact matmul tests (quick, pre_checkin, and nightly as defined in YAML).
  • Confirm 16-bit cases with K > 512 are skipped (message printed) and that 32-bit and smaller-K 16-bit cases run and pass with exact match.

Related Tickets

ROCM-19939 motivated the creation of these tests.

Submission Checklist

@tony-davis tony-davis marked this pull request as ready for review March 17, 2026 21:01
@tony-davis tony-davis requested a review from a team as a code owner March 17, 2026 21:01
Copilot AI review requested due to automatic review settings March 17, 2026 21:01
Copilot AI left a comment

Pull request overview

Adds a new integer_exact matrix initialization mode intended to enable exact (zero-tolerance) GEMM result comparisons by constraining generated operands to small integers with a checkerboard sign pattern.

Changes:

  • Extends hipblaslt_initialization (YAML + C++) to include integer_exact and exposes it via the bench CLI.
  • Implements device-side data generation for integer_exact and adds a host diagnostic helper (unit_print_first_mismatch) for exact-check failures.
  • Adds a new integer-exact matmul test suite in YAML, plus runtime skipping for 16-bit types when K > 512.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| projects/hipblaslt/clients/tests/data/matmul_gtest.yaml | Adds new matmul_integer_exact_* YAML test entries mirroring existing matmul suites. |
| projects/hipblaslt/clients/tests/data/hipblaslt_common.yaml | Registers integer_exact as an initialization enum value for test YAML. |
| projects/hipblaslt/clients/common/src/hipblaslt_init_device.cpp | Implements integer_exact data generation (small ints + checkerboard sign for B). |
| projects/hipblaslt/clients/common/include/unit.hpp | Adds unit_print_first_mismatch helper for improved mismatch diagnostics. |
| projects/hipblaslt/clients/common/include/testing_matmul.hpp | Hooks diagnostics + adds 16-bit K skip and adjusts B host sync dims; sets tolerance to 0 for integer_exact in bias paths. |
| projects/hipblaslt/clients/common/include/hipblaslt_init.hpp | Updates comment describing alternating-sign initialization. |
| projects/hipblaslt/clients/common/include/hipblaslt_datatype2string.hpp | Extends init enum and string conversions to include integer_exact. |
| projects/hipblaslt/clients/bench/src/client.cpp | Updates CLI help text to list integer_exact. |

tony-davis and others added 3 commits March 17, 2026 16:35
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…_general

The "Apply suggestions from code review" commit removed the call to
unit_print_first_mismatch when initialization == integer_exact. Restore
it so integer_exact test failures still get the first-mismatch diagnostic
(i, j, batch, CPU/GPU values, count) before the generic unit_check_general.

Made-with: Cursor
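
The commit message above lists the fields of the first-mismatch diagnostic (i, j, batch, CPU/GPU values, count). A minimal sketch of a helper that gathers those fields is shown below. It is not the PR's unit_print_first_mismatch: the struct, the function names, the column-major strided-batch layout, and the output format are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical report type holding the fields named in the commit message.
struct MismatchReport
{
    bool        found = false;
    std::size_t i = 0, j = 0, batch = 0; // location of the first mismatch
    double      cpu = 0.0, gpu = 0.0;    // values at that location
    std::size_t count = 0;               // total number of mismatching elements
};

// Exact (zero-tolerance) comparison of two column-major, strided-batch matrices.
template <typename T>
MismatchReport first_mismatch(const std::vector<T>& cpu, const std::vector<T>& gpu,
                              std::size_t rows, std::size_t cols, std::size_t ld,
                              std::size_t stride, std::size_t batches)
{
    assert(ld >= rows && stride >= ld * cols);
    assert(cpu.size() >= stride * batches && gpu.size() == cpu.size());
    MismatchReport r;
    for(std::size_t b = 0; b < batches; ++b)
        for(std::size_t j = 0; j < cols; ++j)
            for(std::size_t i = 0; i < rows; ++i)
            {
                const std::size_t idx = i + j * ld + b * stride;
                if(cpu[idx] != gpu[idx])
                {
                    if(!r.found)
                    {
                        r.found = true;
                        r.i = i; r.j = j; r.batch = b;
                        r.cpu = static_cast<double>(cpu[idx]);
                        r.gpu = static_cast<double>(gpu[idx]);
                    }
                    ++r.count;
                }
            }
    return r;
}

// Print a one-line summary; prints nothing when the matrices match.
inline void print_mismatch(const MismatchReport& r)
{
    if(r.found)
        std::printf("first mismatch at i=%zu j=%zu batch=%zu: CPU=%g GPU=%g (%zu total)\n",
                    r.i, r.j, r.batch, r.cpu, r.gpu, r.count);
}
```

Reporting the location and the total count before a generic pass/fail check makes it easier to tell a single bad element from a systematic error such as a wrong leading dimension.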
## Summary of changes

- Introduced a new precision configuration for TF32x1, allowing for f32 A+B operations with bf16 compute input.
- Updated YAML test files to include a new test case for integer-exact initialization using TF32x1 on gfx950 architecture.
- Enhanced documentation within the YAML files to clarify the purpose and constraints of the new TF32x1 configuration.

No functional or build impact outside of the new test case implementation.
codecov-commenter commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.21%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5535      +/-   ##
===========================================
- Coverage    67.27%   66.58%   -0.68%     
===========================================
  Files         1842     1839       -3     
  Lines       283628   282922     -706     
  Branches     39803    39696     -107     
===========================================
- Hits        190791   188383    -2408     
- Misses       76380    78193    +1813     
+ Partials     16457    16346     -111     
| Flag | Coverage | Δ | *Carryforward flag |
| --- | --- | --- | --- |
| hipBLAS | 90.67% <ø> | (ø) | Carried forward from f186f40 |
| hipBLASLt | 43.57% <ø> | (+0.03%) ⬆️ | |
| hipCUB | 82.38% <ø> | (+0.18%) ⬆️ | Carried forward from f186f40 |
| hipDNN | 85.11% <ø> | (-0.21%) ⬇️ | Carried forward from f186f40 |
| hipFFT | 56.36% <ø> | (+0.77%) ⬆️ | Carried forward from f186f40 |
| hipRAND | 76.12% <ø> | (ø) | Carried forward from f186f40 |
| hipSOLVER | 68.81% <ø> | (ø) | Carried forward from f186f40 |
| hipSPARSE | 84.70% <ø> | (ø) | Carried forward from f186f40 |
| rocBLAS | 47.97% <ø> | (ø) | Carried forward from f186f40 |
| rocFFT | 47.38% <ø> | (-5.86%) ⬇️ | Carried forward from f186f40 |
| rocRAND | 57.07% <ø> | (ø) | Carried forward from f186f40 |
| rocSOLVER | 77.21% <ø> | (ø) | Carried forward from f186f40 |
| rocSPARSE | 71.48% <ø> | (ø) | Carried forward from f186f40 |

*This pull request uses carry forward flags.
see 92 files with indirect coverage changes

…32_fast_bf16

Restrict matmul_integer_exact_tf32x1_gfx950 to 128/131/1024/1031 cubes
instead of one_matrix_size_range to avoid NO solution and lda sync
failures on edge cases (e.g. 1x1x1 TT).

Made-with: Cursor
TF32x1 (f32_bf16_r) returns no solution for both-transposed on gfx950
in CI; restrict matmul_integer_exact_tf32x1_gfx950 to NN, NT, TN.

Made-with: Cursor
GPU vs CPU exact match fails on gfx11 while passing on other families;
skip at runtime with concise log. Note in matmul_gtest.yaml.

Made-with: Cursor
…agnitudes

Added a constant offset to the PRNG index for B in the fill_batch function to differentiate the magnitudes from A, preventing potential correlation issues. This change enhances the randomness of the generated values during device initialization.
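
The effect of such an index offset can be illustrated with a small host-side sketch. The hash below is a hypothetical stand-in for the device PRNG, and the function names and offset values are assumptions; the PR text does not state the constant it uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the device PRNG: a multiplicative hash of the
// element index, reduced to a magnitude in {0, 1, 2}.
inline int small_int_magnitude(std::uint32_t idx)
{
    return static_cast<int>(((idx * 2654435761U) >> 16) % 3U);
}

// Magnitudes for n consecutive elements, with the PRNG index shifted by
// index_offset. An offset of 0 models A; a nonzero offset models B.
inline std::vector<int> magnitudes(std::size_t n, std::uint32_t index_offset)
{
    std::vector<int> out(n);
    for(std::size_t k = 0; k < n; ++k)
        out[k] = small_int_magnitude(static_cast<std::uint32_t>(k) + index_offset);
    return out;
}

// Number of positions at which two magnitude sequences agree.
inline std::size_t count_equal(const std::vector<int>& x, const std::vector<int>& y)
{
    assert(x.size() == y.size());
    std::size_t n = 0;
    for(std::size_t k = 0; k < x.size(); ++k)
        n += (x[k] == y[k]) ? 1U : 0U;
    return n;
}
```

Without an offset, the two sequences are identical at every index, so A and B would hold the same magnitudes element for element; with a nonzero offset, `count_equal(magnitudes(n, 0), magnitudes(n, offset))` drops below n. A tiny offset only shifts the sequence, so a large constant is the more plausible choice in practice.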