Commit 7c0f701

[Kernels][GPU]: fix memset on test_matmul()
The test infrastructure has a critical bug: the output buffer is never zeroed before kernel execution. In run_test(), line 116 zeros c_device_buffer_ref (the reference buffer) instead of c_device_buffer (the output buffer that the kernels write to).

The matmul kernels use an accumulation pattern:

    var dst_reg: c.element_type = 0
    dst_reg += a * b            // accumulate in register
    dst[row, col] += dst_reg    // accumulate into output buffer

This requires the destination buffer to be zero-initialized. Without zeroing, the kernel accumulates into garbage or into previous test results.

Since the test instance reuses the same output buffer across multiple kernel tests (k1 through k6), each test accumulates on top of the previous results:

- First test: c_device_buffer = garbage + result1 → WRONG
- Second test: c_device_buffer = (garbage + result1) + result2 → MORE WRONG
- Subsequent tests continue accumulating corrupted results

The bug has existed since the original commit but likely went unnoticed because:

1. Modern GPU drivers (CUDA/ROCm) often zero-initialize allocated memory as a security feature to prevent data leakage between processes
2. GPU memory allocators may return pre-zeroed blocks from memory pools
3. Platform-specific behavior (NVIDIA vs AMD) may mask the issue differently
4. Test tolerance (rtol=0.01) may hide small accumulation errors

Accumulating into an uninitialized buffer is undefined behavior: it happens to "work" on some platforms but can fail randomly depending on driver behavior, hardware, memory allocation patterns, or test execution order.

The fix changes line 116 to zero the output buffer before each test:

    ctx.enqueue_memset(self.c_device_buffer, 0)

This ensures:

- The output buffer starts at zero before each kernel execution
- The accumulation pattern works correctly: 0 + result = result
- Tests are isolated and don't interfere with each other
- Results are deterministic and reproducible across platforms

Fixes: 830ce10 ("[Kernels] Open source the gpu tests (#60860)")
1 parent ca12f0d commit 7c0f701

1 file changed, +1 -1 lines changed

max/kernels/test/gpu/layout/test_matmul.mojo

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ struct test_matmul[
         print("=== test_matmul")
 
         var ctx = self.ctx
-        ctx.enqueue_memset(self.c_device_buffer_ref, 0)
+        ctx.enqueue_memset(self.c_device_buffer, 0)
 
         fn create_tensor[
             layout: Layout
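
The failure mode described in the commit message can be modelled on the CPU. The sketch below is plain Python rather than Mojo and does not use the GPU API from the test. `matmul_accumulate`, `run_kernels`, the 2x2 inputs, and the stale value 100.0 are hypothetical stand-ins chosen for illustration. It assumes only what the message states: each kernel does `dst[row, col] += dst_reg`, and one output buffer is shared across several kernel runs.

```python
def matmul_accumulate(a, b, dst):
    """Model of the kernels' pattern: dst[row][col] += dot(a[row, :], b[:, col])."""
    n, k, m = len(a), len(b), len(b[0])
    for row in range(n):
        for col in range(m):
            dst_reg = 0.0  # per-element register accumulator
            for i in range(k):
                dst_reg += a[row][i] * b[i][col]
            # Accumulates into whatever dst already holds.
            dst[row][col] += dst_reg


def run_kernels(a, b, dst, num_kernels, zero_before_each):
    """Run the same kernel several times on one shared output buffer."""
    results = []
    for _ in range(num_kernels):
        if zero_before_each:
            # Stands in for ctx.enqueue_memset(self.c_device_buffer, 0).
            for row in dst:
                for col in range(len(row)):
                    row[col] = 0.0
        matmul_accumulate(a, b, dst)
        results.append([list(row) for row in dst])  # snapshot after each run
    return results


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
expected = [[19.0, 22.0], [43.0, 50.0]]  # a @ b

# Hypothetical "garbage" left in the output buffer by the allocator.
stale = [[100.0, 100.0], [100.0, 100.0]]

buggy = run_kernels(a, b, [row[:] for row in stale], 3, zero_before_each=False)
fixed = run_kernels(a, b, [row[:] for row in stale], 3, zero_before_each=True)

print("without memset:", buggy)
print("with memset:   ", fixed)
```

Without the memset, the first run yields stale + result and every later run drifts further from `expected`. With the memset, each of the three runs yields `expected` regardless of what the buffer held before. If `stale` were all zeros, the first unzeroed run would also match `expected`, which is consistent with the message's guess that pre-zeroed allocations could mask the bug.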
