Commit 8bbfd21

[PROTON] Implement max bps method for XPU (#5617)
This PR implements the theoretical memory bandwidth calculation for XPU GPUs in proton.

Remarks:
- The derived formula was computed and compared against the published XPU bandwidths.
- The multipliers in `xpu_arch_to_mem_type_multiplier` reflect the memory types that the architectures implement (GDDR6, HBM2e).
- Only 3 arch mappings are included. To my knowledge the rest of Intel GPUs are integrated, so their bandwidth is system dependent. Perhaps some exception catching should be implemented; however, the other branches do not do that either.
- The result is in megabytes, aligned with CUDA, but the docstring as well as the HIP case indicate that the returned value should be in bytes. I think this should be aligned upstream.
1 parent 98da880 commit 8bbfd21
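
The remark about unmapped architectures can be illustrated with a minimal sketch. The table is copied from this commit; the helper `xpu_multiplier` and its error message are hypothetical and not part of proton. A plain dictionary lookup raises `KeyError` for any architecture outside the three mapped ones, and a small wrapper is one possible way to report that more clearly:

```python
# Multiplier table as added by this commit.
xpu_arch_to_mem_type_multiplier = {
    "pvc": 2,
    "dg2": 8,
    "bmg": 8,
}


def xpu_multiplier(arch):
    """Hypothetical helper (not in proton): look up the multiplier with a clearer error."""
    try:
        return xpu_arch_to_mem_type_multiplier[arch]
    except KeyError:
        known = ", ".join(sorted(xpu_arch_to_mem_type_multiplier))
        # Re-raise as ValueError so the caller sees which archs are supported.
        raise ValueError(
            f"no memory multiplier for XPU arch {arch!r}; known: {known}"
        ) from None
```

Whether such handling belongs in `max_bps` is left open here, since the commit message notes that the other device branches do not catch this case either.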

1 file changed: +7 -3 lines changed

third_party/proton/proton/specs.py

Lines changed: 7 additions & 3 deletions
@@ -18,6 +18,12 @@
     'gfx950': 8.0 * 1e12,
 }
 
+xpu_arch_to_mem_type_multiplier = {
+    "pvc": 2,
+    "dg2": 8,
+    "bmg": 8,
+}
+
 # FP8 Matrix Performance(FLOPS/clock/CU)
 # For gfx90a we use the performance of INT8 since it doesn't support FP8 matrix operations.
 amd_fp8_flops_by_arch = {'gfx90a': 1024, 'gfx942': 4096, 'gfx950': 8192}
@@ -68,6 +74,4 @@ def max_bps(device_type, arch, bus_width, memory_clock_rate):
         return amd_bps_by_arch[arch]
     else:
         assert device_type == "XPU"
-        # FIXME: how to get correctly numbers on XPU?
-        # https://github.com/intel/intel-xpu-backend-for-triton/issues/5550
-        return 2 * bus_width * memory_clock_rate * 1e3 / 8
+        return xpu_arch_to_mem_type_multiplier[arch] * bus_width * memory_clock_rate * 1e3 / 8
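
To make the effect of the changed return expression concrete, the sketch below reproduces only the new XPU branch as a standalone function and compares it with the old hard-coded factor of 2. The function name `xpu_max_bps` and the input values are illustrative assumptions, not real device specifications, and the units of the result follow whatever units proton passes in.

```python
# Multiplier table as added by this commit.
xpu_arch_to_mem_type_multiplier = {"pvc": 2, "dg2": 8, "bmg": 8}


def xpu_max_bps(arch, bus_width, memory_clock_rate):
    """Standalone copy of the new XPU return expression (name is hypothetical)."""
    return xpu_arch_to_mem_type_multiplier[arch] * bus_width * memory_clock_rate * 1e3 / 8


def old_xpu_max_bps(bus_width, memory_clock_rate):
    """The expression removed by this commit, with its fixed factor of 2."""
    return 2 * bus_width * memory_clock_rate * 1e3 / 8


# Made-up inputs, chosen only to keep the arithmetic easy to follow.
bus_width, memory_clock_rate = 256, 1000

ratio_dg2 = xpu_max_bps("dg2", bus_width, memory_clock_rate) / old_xpu_max_bps(bus_width, memory_clock_rate)
ratio_pvc = xpu_max_bps("pvc", bus_width, memory_clock_rate) / old_xpu_max_bps(bus_width, memory_clock_rate)
print(ratio_dg2, ratio_pvc)  # 4.0 1.0
```

For `pvc` the multiplier is still 2, so the result is unchanged; for `dg2` and `bmg` the new expression yields four times the old value for the same inputs.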
