benchdnn: respect accumulation mode in safe digits calculation#4723

Open
echeresh wants to merge 1 commit into main from echeresh/fix-f16

Conversation

@echeresh
Contributor

Jira: MFDNN-14670

PR addresses two issues:

  1. benchdnn doesn't respect the accumulation mode during data filling (e.g., see the matmul failure below)
  2. GPU convolution/deconvolution is not aligned with the documentation and can use an f16 accumulator even when the accumulation mode doesn't allow it: link
    • It's a FIXME, but at this point it's easier to wait until XeLP/XeLPG are officially deprecated
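To illustrate point 1, here is a minimal sketch (hypothetical, not benchdnn's actual code) of what an accumulator-aware safe-digits bound looks like: the number of mantissa bits that can be trusted after a reduction of length K depends on the accumulator data type, so a filling range computed assuming an f32 accumulator is too wide when acc-mode permits f16.

```python
import math

# IEEE-754 significand digits per type, including the implicit bit.
MANTISSA_DIGITS = {"f16": 11, "bf16": 8, "f32": 24, "f64": 53}

def safe_digits(acc_dt: str, reduction_size: int) -> int:
    """Hypothetical bound: mantissa bits of the accumulator that remain
    safe after absorbing rounding over `reduction_size` summands."""
    acc_bits = MANTISSA_DIGITS[acc_dt]
    # Each doubling of the reduction length can cost roughly one safe bit.
    return max(0, acc_bits - math.ceil(math.log2(max(reduction_size, 2))))

# Reduction size from the failing 128x10240:10240x128 case below.
k = 10240
print(safe_digits("f32", k))  # comfortable headroom with an f32 accumulator
print(safe_digits("f16", k))  # essentially none with f16 -- hence the mismatches
```

With `--attr-acc-mode=any` on hardware that lacks f32 accumulation, the accumulator may legally be f16, so the filling logic has to use the f16 bound rather than the f32 one.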

Matmul failures on XeLP/XeLPG - note --attr-acc-mode=any:

$ ./build/tests/benchdnn/benchdnn -v5 --matmul --engine=gpu --dt=f16:f16:f32 --attr-acc-mode=any -v5 128x10240:10240x128
create: --matmul --engine=gpu --dt=f16:f16:f32 --attr-acc-mode=any 128x10240:10240x128
oneDNN implementation: jit:gemm:any
CPU reference oneDNN implementation: brg_matmul:avx2
run: --matmul --engine=gpu --dt=f16:f16:f32 --attr-acc-mode=any 128x10240:10240x128
run ref: --matmul --engine=cpu --attr-acc-mode=any 128x10240:10240x128
[   0][DST][0:0] exp_f32:       -2613 exp:       -2613 got:       -2612 diff:       1 rdiff:0.000382702
[   4][DST][0:4] exp_f32:        2365 exp:        2365 got:        2364 diff:       1 rdiff:0.000422833
[  10][DST][0:10] exp_f32:       -2169 exp:       -2169 got:       -2168 diff:       1 rdiff:0.000461042
[  16][DST][0:16] exp_f32:       -3379 exp:       -3379 got:       -3380 diff:       1 rdiff:0.000295946
[  25][DST][0:25] exp_f32:        2821 exp:        2821 got:        2822 diff:       1 rdiff:0.000354484
[  29][DST][0:29] exp_f32:        2707 exp:        2707 got:        2708 diff:       1 rdiff:0.000369413
[  33][DST][0:33] exp_f32:        2263 exp:        2263 got:        2264 diff:       1 rdiff:0.000441891
[  38][DST][0:38] exp_f32:       -5186 exp:       -5186 got:       -5184 diff:       2 rdiff:0.000385654
[  44][DST][0:44] exp_f32:       -2287 exp:       -2287 got:       -2288 diff:       1 rdiff:0.000437254
[  54][DST][0:54] exp_f32:       -2293 exp:       -2293 got:       -2292 diff:       1 rdiff:0.00043611
[COMPARE_STATS][DST]: trh=0 err_max_diff:       6 err_max_rdiff:0.00101868 all_max_diff:       6 all_max_rdiff:0.00101868
[PRIM_REF][INFO]: L2_size:524288 bytes; per_core_L3_size:2359296 bytes; nthr:24; impl_name:brg_matmul:avx2
0:FAILED (errors:3689 total:16384) (627 ms) __REPRO: --matmul --engine=gpu --dt=f16:f16:f32 --attr-acc-mode=any 128x10240:10240x128
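The magnitude of these errors is consistent with an f16 intermediate accumulator. A quick sketch (f16 spacing computed by hand from the format's 10 explicit fraction bits, assuming normal-range values): the f16 ulp at the magnitudes seen in the log is 2 to 4, so diffs of 1 to 2 are exactly what one expects from intermediate f16 rounding.

```python
import math

def f16_ulp(x: float) -> float:
    """Spacing between adjacent f16 values at magnitude |x| (normal range).
    f16 has 10 explicit fraction bits, so ulp = 2^(exponent - 10)."""
    e = math.floor(math.log2(abs(x)))
    return 2.0 ** (e - 10)

# Magnitudes from the failing log above.
for v in (2613, 3379, 5186):
    print(v, f16_ulp(v))  # ulp is 2.0 for the first two, 4.0 for the last
```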

@echeresh echeresh requested a review from a team as a code owner February 24, 2026 19:03
@echeresh echeresh added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Feb 24, 2026
@github-actions github-actions bot added component:tests Codeowner: @oneapi-src/onednn-arch and removed platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel labels Feb 24, 2026
@rjoursler
Contributor

GPU convolution/deconvolution is not aligned with documentation and can use f16 accumulator even when acc-mode doesn't allow it: link
It's a FIXME but at this point it's easier to wait until XeLP/XeLPG are officially deprecated

I think we should just fix this (unless it causes functional regressions). It looks like it was just missed when implementing #1540.

@echeresh
Contributor Author

echeresh commented Feb 24, 2026

GPU convolution/deconvolution is not aligned with documentation and can use f16 accumulator even when acc-mode doesn't allow it: link
It's a FIXME but at this point it's easier to wait until XeLP/XeLPG are officially deprecated

I think we should just fix this (unless it causes functional regressions). It looks like it was just missed when implementing #1540.

I agree; it's tracked in MFDNN-13271.
I currently lack the bandwidth for it, which is why I came up with this more targeted benchdnn fix to address the existing Nightly failures.
