
[HLSL] Miscompilation of ReduceMultiDims.* DML shaders directly causes 589 operator test failures #162168

@Icohedron

Description


The following shaders are confirmed to cause 589 DML operator test failures. The failures reproduce consistently on devices with AMD, NVIDIA, and Intel GPUs:

ReduceMultiDimsL2_float16_native_accum32_8_Default
ReduceMultiDimsLogSumExp_float16_native_accum32_8_Default
ReduceMultiDimsL1_float16_native_accum32_8_Default
ReduceMultiDimsLogSum_float16_native_accum32_8_Default
ReduceMultiDimsAverage_float16_native_accum32_8_Default
ReduceMultiDimsSumSquare_float16_native_accum32_8_Default
ReduceMultiDimsMax_float16_native_accum32_8_Default
ReduceMultiDimsSum_float16_native_accum32_8_Default
ReduceMultiDimsMultiply_float16_native_accum32_8_Default
ReduceMultiDimsMin_float16_native_accum32_8_Default
ReduceMultiDimsMin_uint16_native_8_Default
ReduceMultiDimsSumSquare_int64_native_8_Default
ReduceMultiDimsMax_uint16_native_8_Default
ReduceMultiDimsMultiply_int64_native_8_Default
ReduceMultiDimsMin_int16_native_8_Default
ReduceMultiDimsSumSquare_uint64_native_8_Default
ReduceMultiDimsMax_int16_native_8_Default
ReduceMultiDimsMultiply_uint64_native_8_Default

One representative failing test per shader:

OperatorTests::GlobalPooling#134
OperatorTests::ReduceDefault#metadataSet4#22
OperatorTests::ReduceDefault#metadataSet1#22
OperatorTests::ReduceDefault#metadataSet3#22
OperatorTests::GlobalPooling#17
OperatorTests::ReduceDefault#metadataSet9#22
OperatorTests::GlobalPooling#33
OperatorTests::ReduceDefault#metadataSet8#22
OperatorTests::ReduceDefault#metadataSet7#22
OperatorTests::ReduceDefault#metadataSet6#22
OperatorTests::ReduceMinMaxMultiDims#metadataSet1#27
OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet3#9
OperatorTests::ReduceMinMaxMultiDims#metadataSet0#27
OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet0#9
OperatorTests::ReduceMinMaxMultiDims#metadataSet1#47
OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet3#44
OperatorTests::ReduceMinMaxMultiDims#metadataSet0#47
OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet0#100
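To rerun all 18 representative tests in one pass, the TE.exe invocation from the sample reproduction can be wrapped in a small script. This is a sketch, not part of the original report: it assumes TE.exe and DirectML.Test.OperatorTests.dll are in the current directory, and the helper names (`build_command`, `parse_summary`, `run_all`) are hypothetical. It reuses only the flags shown in the reproduction and reads pass/fail from the `Summary:` line that TE prints.

```python
import re
import subprocess

# Test names copied verbatim from the failing-test list above.
FAILING_TESTS = [
    "OperatorTests::GlobalPooling#134",
    "OperatorTests::ReduceDefault#metadataSet4#22",
    "OperatorTests::ReduceDefault#metadataSet1#22",
    "OperatorTests::ReduceDefault#metadataSet3#22",
    "OperatorTests::GlobalPooling#17",
    "OperatorTests::ReduceDefault#metadataSet9#22",
    "OperatorTests::GlobalPooling#33",
    "OperatorTests::ReduceDefault#metadataSet8#22",
    "OperatorTests::ReduceDefault#metadataSet7#22",
    "OperatorTests::ReduceDefault#metadataSet6#22",
    "OperatorTests::ReduceMinMaxMultiDims#metadataSet1#27",
    "OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet3#9",
    "OperatorTests::ReduceMinMaxMultiDims#metadataSet0#27",
    "OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet0#9",
    "OperatorTests::ReduceMinMaxMultiDims#metadataSet1#47",
    "OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet3#44",
    "OperatorTests::ReduceMinMaxMultiDims#metadataSet0#47",
    "OperatorTests::ReduceMultiplySumL1SumSquareMultiDims#metadataSet0#100",
]

# Matches the TE summary line, e.g. "Summary: Total=1, Passed=0, Failed=1, ..."
SUMMARY_RE = re.compile(r"Summary: Total=(\d+), Passed=(\d+), Failed=(\d+)")


def build_command(test_name, te_exe="TE.exe", dll="DirectML.Test.OperatorTests.dll"):
    """Build the same argument list as the sample reproduction, for one test."""
    return [te_exe, dll, f"/name:{test_name}", "/p:DisableMetacommands=1", "/logOutput:low"]


def parse_summary(output):
    """Extract Total/Passed/Failed counts from TE output, or None if absent."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    total, passed, failed = (int(g) for g in match.groups())
    return {"total": total, "passed": passed, "failed": failed}


def run_all(tests=FAILING_TESTS):
    """Run each test separately and map test name -> parsed summary."""
    results = {}
    for name in tests:
        proc = subprocess.run(build_command(name), capture_output=True, text=True)
        results[name] = parse_summary(proc.stdout)
    return results
```

Calling `run_all()` from the directory containing TE.exe should report `failed == 1` for every entry while the miscompilation is present.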

Sample Reproduction

> ./TE.exe DirectML.Test.OperatorTests.dll /name:"OperatorTests::GlobalPooling#134" /p:DisableMetacommands=1 /logOutput:low
Test Authoring and Execution Framework v10.72 for x64

StartGroup: OperatorTests::GlobalPooling#134
Error: Output Tensor #0:
Error: Tensor Sizes: 3,1,1,1,1
Error: Tensor Data Type: float16
Error: Index: 0001 @00000001 [1,0,0,0,0].  Ref: 4.8828125000 (0x44E2).  DML: nan (0x7FFF).  Abs: nan.  Rel: nan%.  Ulp: 15133
Error: Index: 0002 @00000002 [2,0,0,0,0].  Ref: 4.9882812500 (0x44FD).  DML: nan (0x7FFF).  Abs: nan.  Rel: nan%.  Ulp: 15106
Error: 2 / 3 (66.666667%) of elements were found to be above tolerance.
Error: Max ULP delta: 15133.  Allowed tolerance: 1 ULPs (float16).
Error: Verify: Fail [File: C:\workspace\DirectML\SharedToolingLib\External\Test\TaefHelper\TaefHelper.cpp, Function: TaefHelper::Fail, Line: 133]
EndGroup: OperatorTests::GlobalPooling#134 [Failed]

Summary of Non-passing Tests:
    OperatorTests::GlobalPooling#134 [Failed]

Summary: Total=1, Passed=0, Failed=1, Blocked=0, Not Run=0, Skipped=0
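The hex values in the log are raw float16 bit patterns. 0x44E2 and 0x44FD decode to the reference values, and 0x7FFF (exponent all ones, nonzero mantissa) is a NaN. The sketch below decodes them and computes a ULP distance. It assumes, without confirmation from the DML harness source, that "Ulp" is the distance between the two bit patterns on a sign-magnitude-ordered integer line. That reading reproduces both reported deltas (0x7FFF - 0x44E2 = 15133, 0x7FFF - 0x44FD = 15106). The function names are illustrative only.

```python
import math
import struct


def half_from_bits(bits):
    """Decode a raw IEEE 754 binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def ordered(bits):
    """Map sign-magnitude float16 bits onto a monotonic integer line.

    Positive values keep their bit pattern; negative values become the
    negated magnitude, so +0 and -0 both map to 0.
    """
    magnitude = bits & 0x7FFF
    return -magnitude if bits & 0x8000 else magnitude


def ulp_distance(ref_bits, actual_bits):
    """Assumed ULP metric: distance between the ordered bit patterns."""
    return abs(ordered(ref_bits) - ordered(actual_bits))


# (reference bits, DML bits) for the two failing elements in the log above.
for ref, dml in [(0x44E2, 0x7FFF), (0x44FD, 0x7FFF)]:
    print(half_from_bits(ref), math.isnan(half_from_bits(dml)), ulp_distance(ref, dml))
# 4.8828125 True 15133
# 4.98828125 True 15106
```

With an allowed tolerance of 1 ULP, any NaN output from a finite reference is far outside tolerance, which matches the two mismatches reported for indices 1 and 2.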

Metadata

Assignees: (none listed)
Type: No type
Projects: Status: Ready
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests