Open
Labels
bug, help wanted, iris
Description
Problem Description
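GEMM AllScatter fails output validation on recent ROCm + Triton. In the log below, all 8 ranks report mismatches where C is 0.0 (max absolute difference of about 762), while the benchmark itself still runs to completion.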
Arguments:
-m 16384 -n 16384 -k 16384 --BLK_M 128 --BLK_N 128 --BLK_K 64 --gsize_m 6 --gemm_sms 256
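For reference, the tile count that appears later in the log (`total_tiles=2048`) is consistent with these arguments if N is split evenly across the 8 ranks. A minimal sketch of that arithmetic follows; the even column-wise split, the ceiling-division tiling, and the helper name `tiles_per_rank` are assumptions for illustration, not taken from the Iris source.

```python
import math

def tiles_per_rank(m, n, world_size, blk_m, blk_n):
    """Output tiles one rank computes, assuming N is split evenly across ranks.

    Hypothetical helper for illustration only; not part of Iris.
    """
    local_n = n // world_size  # 16384 // 8 = 2048, matching "n": 2048 in the log
    return math.ceil(m / blk_m) * math.ceil(local_n / blk_n)

total_tiles = tiles_per_rank(m=16384, n=16384, world_size=8, blk_m=128, blk_n=128)
print(total_tiles)
```

With these arguments each rank would cover 128 tile rows by 16 tile columns, which agrees with the `total_tiles=2048` printed in the benchmark lines.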
Operating System
Any
CPU
Any
GPU
MI350
ROCm Version
ROCm 7.0
ROCm Component
No response
Steps to Reproduce
ROCm 7.0
Triton aafec41
2025-10-15T22:40:54.3295140Z [Iris] [0/8] Validating...
2025-10-15T22:40:54.3298046Z [Iris] [6/8] Validating...
2025-10-15T22:40:54.3298501Z [Iris] [7/8] Validating...
2025-10-15T22:40:54.3298860Z [Iris] [5/8] Validating...
2025-10-15T22:40:54.3299179Z [Iris] [4/8] Validating...
2025-10-15T22:40:54.3299466Z [Iris] [3/8] Validating...
2025-10-15T22:40:54.3299733Z [Iris] [1/8] Validating...
2025-10-15T22:40:54.3300002Z [Iris] [2/8] Validating...
2025-10-15T22:40:54.6391187Z [Iris] [3/8] Max absolute difference: 762.5
2025-10-15T22:40:54.6392997Z [Iris] [0/8] Max absolute difference: 762.0
2025-10-15T22:40:54.6398779Z [Iris] [6/8] Max absolute difference: 762.5
2025-10-15T22:40:54.6410568Z [Iris] [2/8] Max absolute difference: 762.5
2025-10-15T22:40:54.6410882Z [Iris] [5/8] Max absolute difference: 762.5
2025-10-15T22:40:54.6413163Z [Iris] [4/8] Max absolute difference: 762.5
2025-10-15T22:40:54.6451685Z [Iris] [7/8] Max absolute difference: 762.5
2025-10-15T22:40:54.6455951Z [Iris] [1/8] Max absolute difference: 762.5
2025-10-15T22:43:26.8507604Z [Iris] [4/8] Mismatch at index (0, 0): C=0.0, expected=126.375
2025-10-15T22:43:27.8096835Z [Iris] [1/8] Mismatch at index (0, 0): C=0.0, expected=126.375
2025-10-15T22:43:28.2538152Z [Iris] [2/8] Mismatch at index (0, 0): C=0.0, expected=126.375
2025-10-15T22:43:28.5311227Z [Iris] [7/8] Mismatch at index (0, 0): C=0.0, expected=126.375
2025-10-15T22:43:28.8918646Z [Iris] [6/8] Mismatch at index (0, 0): C=0.0, expected=126.375
2025-10-15T22:43:29.1256990Z [Iris] [3/8] Mismatch at index (0, 0): C=0.0, expected=126.375
2025-10-15T22:43:30.5465158Z [Iris] [0/8] Mismatch at index (0, 2048): C=0.0, expected=127.625
2025-10-15T22:43:31.8870828Z [Iris] [5/8] Mismatch at index (0, 0): C=0.0, expected=126.375
2025-10-15T22:43:48.3195380Z [Iris] [4/8] Final C validation failed.
2025-10-15T22:43:49.3836251Z [Iris] [2/8] Final C validation failed.
2025-10-15T22:43:49.9968054Z [Iris] [1/8] Final C validation failed.
2025-10-15T22:43:50.6919686Z [Iris] [6/8] Final C validation failed.
2025-10-15T22:43:51.2016482Z [Iris] [7/8] Final C validation failed.
2025-10-15T22:43:51.6813982Z [Iris] [3/8] Final C validation failed.
2025-10-15T22:43:52.3064221Z [Iris] [0/8] Final C validation failed.
2025-10-15T22:43:53.7377063Z [Iris] [5/8] Final C validation failed.
2025-10-15T22:43:53.7390271Z [Iris] [5/8] Validating local C...
2025-10-15T22:43:53.7393071Z [Iris] [5/8] Validation completed
2025-10-15T22:43:53.7393672Z [Iris] [5/8] Benchmarking...
2025-10-15T22:43:53.7394201Z [Iris] [0/8] Validating local C...
2025-10-15T22:43:53.7394649Z [Iris] [2/8] Validating local C...
2025-10-15T22:43:53.7395072Z [Iris] [3/8] Validating local C...
2025-10-15T22:43:53.7395498Z [Iris] [7/8] Validating local C...
2025-10-15T22:43:53.7395918Z [Iris] [0/8] Validation completed
2025-10-15T22:43:53.7396344Z [Iris] [2/8] Validation completed
2025-10-15T22:43:53.7396766Z [Iris] [0/8] Benchmarking...
2025-10-15T22:43:53.7397163Z [Iris] [2/8] Benchmarking...
2025-10-15T22:43:53.7397574Z [Iris] [3/8] Validation completed
2025-10-15T22:43:53.7397995Z [Iris] [1/8] Validating local C...
2025-10-15T22:43:53.7398376Z [Iris] [4/8] Validating local C...
2025-10-15T22:43:53.7398657Z [Iris] [7/8] Validation completed
2025-10-15T22:43:53.7398926Z [Iris] [3/8] Benchmarking...
2025-10-15T22:43:53.7399178Z [Iris] [7/8] Benchmarking...
2025-10-15T22:43:53.7399434Z [Iris] [1/8] Validation completed
2025-10-15T22:43:53.7399695Z [Iris] [1/8] Benchmarking...
2025-10-15T22:43:53.7399951Z [Iris] [4/8] Validation completed
2025-10-15T22:43:53.7400951Z [Iris] [4/8] Benchmarking...
2025-10-15T22:43:53.7401286Z [Iris] [6/8] Validating local C...
2025-10-15T22:43:53.7401634Z [Iris] [6/8] Validation completed
2025-10-15T22:43:53.7401975Z [Iris] [6/8] Benchmarking...
2025-10-15T22:43:54.0091824Z [Iris] [5/8] tile matmul + all_scatter (total_tiles=2048): 1.891 ms 4652.487 tflops
2025-10-15T22:43:54.0092456Z [Iris] [3/8] tile matmul + all_scatter (total_tiles=2048): 1.893 ms 4645.596 tflops
2025-10-15T22:43:54.0093282Z [Iris] [7/8] tile matmul + all_scatter (total_tiles=2048): 1.895 ms 4642.675 tflops
2025-10-15T22:43:54.0093801Z [Iris] [0/8] tile matmul + all_scatter (total_tiles=2048): 1.905 ms 4617.322 tflops
2025-10-15T22:43:54.0094312Z [Iris] [2/8] tile matmul + all_scatter (total_tiles=2048): 1.893 ms 4646.822 tflops
2025-10-15T22:43:54.0094819Z [Iris] [1/8] tile matmul + all_scatter (total_tiles=2048): 1.898 ms 4634.928 tflops
2025-10-15T22:43:54.0095336Z [Iris] [4/8] tile matmul + all_scatter (total_tiles=2048): 1.895 ms 4642.131 tflops
2025-10-15T22:43:54.0095857Z [Iris] [6/8] tile matmul + all_scatter (total_tiles=2048): 1.901 ms 4626.729 tflops
2025-10-15T22:43:54.7331038Z {
2025-10-15T22:43:54.7331336Z "world_size": 8,
2025-10-15T22:43:54.7331768Z "m": 16384,
2025-10-15T22:43:54.7332134Z "n": 2048,
2025-10-15T22:43:54.7332517Z "k": 16384,
2025-10-15T22:43:54.7332954Z "debug": false,
2025-10-15T22:43:54.7333310Z "validate": true,
2025-10-15T22:43:54.7333758Z "trace_tiles": false,
2025-10-15T22:43:54.7334160Z "benchmark": true,
2025-10-15T22:43:54.7334470Z "datatype": "fp16",
2025-10-15T22:43:54.7334835Z "output_file": "perf_result.json",
2025-10-15T22:43:54.7335150Z "BLK_M": 128,
2025-10-15T22:43:54.7335400Z "BLK_N": 128,
2025-10-15T22:43:54.7335609Z "BLK_K": 64,
2025-10-15T22:43:54.7335840Z "gsize_m": 6,
2025-10-15T22:43:54.7336100Z "heap_size": 8589934592,
2025-10-15T22:43:54.7336471Z "gemm_sms": 256,
2025-10-15T22:43:54.7336701Z "num_sms": 256,
2025-10-15T22:43:54.7336971Z "num_ranks": 8,
2025-10-15T22:43:54.7337195Z "M": 16384,
2025-10-15T22:43:54.7337402Z "N": 16384,
2025-10-15T22:43:54.7337603Z "K": 16384,
2025-10-15T22:43:54.7337815Z "success": false,
2025-10-15T22:43:54.7338106Z "gemm_registers": null,
2025-10-15T22:43:54.7338380Z "gemm_spills": null,
2025-10-15T22:43:54.7338637Z "tflops": 4617.321863777792,
2025-10-15T22:43:54.7338910Z "total_ms": 1.905020546913147,
2025-10-15T22:43:54.7339196Z "gemm_ms": 1.6720486131925432,
2025-10-15T22:43:54.7339472Z "gemm_experiments": 126
2025-10-15T22:43:54.7339720Z }
2025-10-15T22:44:03.9383714Z Validating performance results...
2025-10-15T22:44:03.9495371Z [ERROR] Benchmark failed (success: false)
2025-10-15T22:44:03.9584127Z {
2025-10-15T22:44:03.9584551Z "world_size": 8,
2025-10-15T22:44:03.9584831Z "m": 16384,
2025-10-15T22:44:03.9585017Z "n": 2048,
2025-10-15T22:44:03.9585193Z "k": 16384,
2025-10-15T22:44:03.9585368Z "debug": false,
2025-10-15T22:44:03.9585609Z "validate": true,
2025-10-15T22:44:03.9585808Z "trace_tiles": false,
2025-10-15T22:44:03.9586019Z "benchmark": true,
2025-10-15T22:44:03.9586225Z "datatype": "fp16",
2025-10-15T22:44:03.9586466Z "output_file": "perf_result.json",
2025-10-15T22:44:03.9586719Z "BLK_M": 128,
2025-10-15T22:44:03.9586903Z "BLK_N": 128,
2025-10-15T22:44:03.9587082Z "BLK_K": 64,
2025-10-15T22:44:03.9587258Z "gsize_m": 6,
2025-10-15T22:44:03.9587458Z "heap_size": 8589934592,
2025-10-15T22:44:03.9587687Z "gemm_sms": 256,
2025-10-15T22:44:03.9587875Z "num_sms": 256,
2025-10-15T22:44:03.9588061Z "num_ranks": 8,
2025-10-15T22:44:03.9588245Z "M": 16384,
2025-10-15T22:44:03.9588414Z "N": 16384,
2025-10-15T22:44:03.9588581Z "K": 16384,
2025-10-15T22:44:03.9588755Z "success": false,
2025-10-15T22:44:03.9589289Z "gemm_registers": null,
2025-10-15T22:44:03.9589557Z "gemm_spills": null,
2025-10-15T22:44:03.9589782Z "tflops": 4617.321863777792,
2025-10-15T22:44:03.9589951Z "total_ms": 1.905020546913147,
2025-10-15T22:44:03.9590459Z "gemm_ms": 1.6720486131925432,
2025-10-15T22:44:03.9590677Z "gemm_experiments": 126
2025-10-15T22:44:03.9590871Z }
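As a sanity check on the numbers above, the reported `tflops` value appears to follow from the usual 2·M·N·K FLOP count for a GEMM divided by `total_ms`, using the full problem size (`M`, `N`, `K`) rather than the per-rank `n`. The sketch below reproduces that arithmetic; the formula is an assumption inferred from the reported values, and `gemm_tflops` is a hypothetical helper, not an Iris function.

```python
def gemm_tflops(m, n, k, total_ms):
    """Throughput in TFLOP/s, assuming 2*M*N*K floating-point operations per GEMM.

    Hypothetical helper for illustration only; not part of Iris.
    """
    flops = 2 * m * n * k
    seconds = total_ms * 1e-3
    return flops / seconds / 1e12

# Values copied from the JSON result in the log.
tflops = gemm_tflops(m=16384, n=16384, k=16384, total_ms=1.905020546913147)
print(f"{tflops:.3f}")
```

The result agrees with the logged `"tflops": 4617.321863777792` to within rounding. The log shows this figure is computed even though `"success"` is `false`.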
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response