Tuning specs used for convs:
I've started looking at codegen quality across RDNA3 and RDNA4. Both targets are supported to a similar extent: at the time of writing, we can target (dense) WMMA instructions and use virtually identical configuration logic on both.
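As a concrete illustration of what "virtually identical" means in practice, here is a minimal sketch that compiles the same linalg matmul for both architectures through the IREE compiler Python bindings, varying only the target flag. The `gfx1100`/`gfx1201` architecture names and the `--iree-hip-target` flag are my assumptions for the RDNA3/RDNA4 cards here, not something confirmed by the measurements below.

```python
# A minimal sketch, assuming the iree-compiler Python bindings are installed
# and that gfx1100 / gfx1201 are the architectures of the two cards under test.
import iree.compiler as ireec

MATMUL_MLIR = """
func.func @matmul(%lhs: tensor<2048x2048xf16>, %rhs: tensor<2048x2048xf16>)
    -> tensor<2048x2048xf32> {
  %cst = arith.constant 0.0 : f32
  %empty = tensor.empty() : tensor<2048x2048xf32>
  %acc = linalg.fill ins(%cst : f32) outs(%empty : tensor<2048x2048xf32>)
      -> tensor<2048x2048xf32>
  %mm = linalg.matmul ins(%lhs, %rhs : tensor<2048x2048xf16>, tensor<2048x2048xf16>)
      outs(%acc : tensor<2048x2048xf32>) -> tensor<2048x2048xf32>
  return %mm : tensor<2048x2048xf32>
}
"""

# Same module, compiled once per architecture; only the target flag differs.
for gfx in ("gfx1100", "gfx1201"):
    vmfb = ireec.compile_str(
        MATMUL_MLIR,
        target_backends=["rocm"],
        extra_args=[f"--iree-hip-target={gfx}"],
    )
    with open(f"matmul_{gfx}.vmfb", "wb") as f:
        f.write(vmfb)
```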
For the purpose of this comparison, I'm using two workstation cards:
Looking at GEMM/Conv performance w/ f16/bf16 element types:
**RDNA3 vs RDNA4 Matmul Performance Comparison**
Using the following BOO driver commands:
**No tuning**
- `matmul_like_2048x2048x2048_f16xf16xf32`
- `matmul_like_4096x4096x4096_f16xf16xf32`
- `matmul_like_8192x2048x1024_f16xf16xf32`
- `matmul_like_8192x2048x8192_f16xf16xf32`
- `matvec_like_8192x1024_f16xf16xf32`
- `matvec_like_1280x8192_f16xf16xf32`
- `matmul_like_8192x4x1024_f16xf16xf32`
- `matmul_like_1280x4x8192_f16xf16xf32`
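Since the charts compare throughput across these shapes, it helps to make the naming convention explicit: by my reading, `matmul_like_MxNxK_AxBxC` encodes the GEMM dimensions plus the two input and accumulator element types. A hypothetical helper for converting a measured runtime into TFLOP/s (treating `matvec_like` names as MxK with N = 1, which is my assumption):

```python
import re

def matmul_tflops(name: str, runtime_us: float) -> float:
    """Estimate achieved TFLOP/s from a benchmark name and a runtime in microseconds."""
    m = re.match(r"(?:matmul|matvec)_like_(\d+)x(\d+)(?:x(\d+))?_", name)
    if m is None:
        raise ValueError(f"unrecognized benchmark name: {name}")
    dims = [int(d) for d in m.groups() if d is not None]
    # matvec names carry only two dims; treat them as M x K with N = 1 (assumed).
    M, N, K = dims if len(dims) == 3 else (dims[0], 1, dims[1])
    flops = 2 * M * N * K  # one multiply + one add per MAC
    return flops / (runtime_us * 1e-6) / 1e12

# e.g. the 2048^3 GEMM finishing in 450 us would be ~38 TFLOP/s:
print(matmul_tflops("matmul_like_2048x2048x2048_f16xf16xf32", 450.0))
```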
**RDNA3 vs RDNA4 Convolution Performance Comparison**

Using the following BOO driver commands:

**No tuning**
- `matmul_like_3x224x2016_bf16xbf16xf32`
- `matmul_like_4x224x2016_bf16xbf16xf32`
- `matmul_like_4096x576x576_bf16xbf16xf32`
- `conv_16x48x32x2048x3x3x768_bf16xbf16xf32` (forward)
- `conv_16x48x32x768x3x3x2048_bf16xbf16xf32` (backward)
- `conv_16x48x32x576x3x3x576_bf16xbf16xf32` (backward)
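The `conv_*` entries can be read the same way. The sketch below assumes the dimension string is N x H x W x C x KH x KW x F with stride 1 and "same" padding (output spatial size equal to the input's); those are guesses on my part, since the BOO naming isn't spelled out here.

```python
def conv_tflops(name: str, runtime_us: float) -> float:
    """Estimate TFLOP/s for a name like 'conv_16x48x32x2048x3x3x768_...'.

    Assumes dims are N x H x W x C x KH x KW x F, stride 1, 'same' padding --
    all guesses about BOO's naming, not confirmed.
    """
    dims = [int(d) for d in name.split("_")[1].split("x")]
    n, h, w, c, kh, kw, f = dims
    flops = 2 * n * h * w * f * kh * kw * c  # one mul + one add per MAC
    return flops / (runtime_us * 1e-6) / 1e12
```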
**Tuned**

**RDNA3 vs RDNA4 Convolution Performance Comparison (Tuned)**
- `matmul_like_3x224x2016_bf16xbf16xf32`
- `matmul_like_4x224x2016_bf16xbf16xf32`
- `matmul_like_4096x576x576_bf16xbf16xf32`
- `conv_16x48x32x2048x3x3x768_bf16xbf16xf32` (forward)
- `conv_16x48x32x768x3x3x2048_bf16xbf16xf32` (backward)
- `conv_16x48x32x576x3x3x576_bf16xbf16xf32` (backward)

TBC