
[hipDNN] Improve sample failure output with tensor diffs #5591

Open
Bingtagui404 wants to merge 1 commit into ROCm:develop from Bingtagui404:users/Bingtagui404/tensor-diff-utility

@Bingtagui404

Description

When CPU validation fails in the hipDNN samples, the output tells you only which tensor failed, not how it failed:

CPU reference validation:
  y: failed

This PR adds a reusable tensor diff utility and wires it into all sample validation paths so failures automatically print detailed diagnostics:

CPU reference validation:
  y: failed
  Tensor diff for "y":
    Total elements: 65536
    Mismatched:     42 (0.06%)
    Max abs diff:   1.234567e-03 at [0, 2, 14, 7]
    Mean abs diff:  3.456789e-04
    Worst mismatches:
      [0, 2, 14, 7]: ref=0.543210, impl=0.544444, diff=1.234567e-03
      ...
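
The positions in the sample output (for example `[0, 2, 14, 7]`) are multi-dimensional indices. A minimal sketch of how a flat offset could be mapped to such an index is shown below. It assumes a packed row-major layout, which the PR does not state, and `unravelIndex` and `formatIndex` are hypothetical names for illustration, not functions from `TensorDiff.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: convert a flat offset into per-dimension indices.
// Assumes a packed row-major layout (last dimension varies fastest).
inline std::vector<std::size_t> unravelIndex(std::size_t flat,
                                             const std::vector<std::size_t>& dims)
{
    std::vector<std::size_t> idx(dims.size(), 0);
    // Walk from the innermost dimension outwards, peeling off one index each step.
    for (std::size_t d = dims.size(); d-- > 0;)
    {
        assert(dims[d] > 0 && "dimension extents must be non-zero");
        idx[d] = flat % dims[d];
        flat /= dims[d];
    }
    assert(flat == 0 && "flat offset exceeds the tensor's element count");
    return idx;
}

// Hypothetical helper: format indices in the "[0, 2, 14, 7]" style.
inline std::string formatIndex(const std::vector<std::size_t>& idx)
{
    std::ostringstream out;
    out << '[';
    for (std::size_t i = 0; i < idx.size(); ++i)
    {
        if (i != 0)
        {
            out << ", ";
        }
        out << idx[i];
    }
    out << ']';
    return out.str();
}
```

With an illustrative shape of 1 x 16 x 64 x 64 (65536 elements, chosen here only to match the element count in the sample output), flat offset 9095 maps to `[0, 2, 14, 7]`. Tensors with non-packed strides would need a stride-aware variant.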

Changes

  • New: test_sdk/.../utilities/TensorDiff.hpp — header-only utility providing:
    • computeTensorDiff<T>() — element-wise comparison with summary statistics
    • printTensorDiffSummary() — formatted output
    • validateAndReport<T>() — drop-in replacement combining allClose() + status print + diff on failure
  • Modified: All 14 sample .cpp files to use validateAndReport<T>() instead of manual allClose() + cout pattern
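
To make the summary statistics concrete, here is a rough sketch of an element-wise comparison over flat buffers. It is not the code from this PR: `diffSketch`, `DiffSummarySketch`, and `MismatchSketch` are hypothetical names, and the NumPy-style tolerance test (`absTol + relTol * |ref|`), the mean taken over all elements, and the assumption of finite values are choices made only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result types; the PR's real structures may differ.
struct MismatchSketch
{
    std::size_t flatIndex;
    double ref;
    double impl;
    double absDiff;
};

struct DiffSummarySketch
{
    std::size_t total = 0;
    std::size_t mismatched = 0;
    double maxAbsDiff = 0.0;
    std::size_t maxAbsDiffIndex = 0;
    double meanAbsDiff = 0.0;          // averaged over all elements (assumption)
    std::vector<MismatchSketch> worst; // largest differences first
};

template <typename T>
DiffSummarySketch diffSketch(const std::vector<T>& ref, const std::vector<T>& impl,
                             double absTol, double relTol, std::size_t maxMismatches)
{
    assert(ref.size() == impl.size() && "caller must check element counts first");
    DiffSummarySketch s;
    s.total = ref.size();
    double sum = 0.0;
    std::vector<MismatchSketch> all;
    // Plain sequential loop: every accumulator has exactly one writer.
    for (std::size_t i = 0; i < ref.size(); ++i)
    {
        const double r = static_cast<double>(ref[i]);
        const double v = static_cast<double>(impl[i]);
        const double d = std::fabs(r - v);
        sum += d;
        if (d > s.maxAbsDiff)
        {
            s.maxAbsDiff = d;
            s.maxAbsDiffIndex = i;
        }
        if (d > absTol + relTol * std::fabs(r))
        {
            all.push_back({i, r, v, d});
        }
    }
    s.mismatched = all.size();
    if (s.total != 0)
    {
        s.meanAbsDiff = sum / static_cast<double>(s.total);
    }
    // Keep only the largest maxMismatches entries; 0 keeps none.
    const auto keep = static_cast<std::ptrdiff_t>(std::min(maxMismatches, all.size()));
    std::partial_sort(all.begin(), all.begin() + keep, all.end(),
                      [](const MismatchSketch& a, const MismatchSketch& b) {
                          return a.absDiff > b.absDiff;
                      });
    all.erase(all.begin() + keep, all.end());
    s.worst = std::move(all);
    return s;
}
```

Collecting every mismatch before sorting keeps the sketch short; a bounded heap would use less memory on large tensors with many mismatches.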

Safety

  • Shape/element-count mismatches are detected before element-wise comparison to prevent out-of-bounds access
  • computeTensorDiff runs single-threaded to avoid data races in summary accumulation
  • maxMismatches == 0 is handled as "summary only" mode
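
The first and last safety points can be sketched as two small guards. The names `shapeGuardSketch` and `worstToPrintSketch` are hypothetical and the error strings are invented for illustration. The PR's actual checks and messages may look different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical guard: returns a non-empty message when the two tensors must
// not be compared element by element, or an empty string when it is safe.
inline std::string shapeGuardSketch(const std::vector<std::size_t>& refDims,
                                    std::size_t refCount,
                                    const std::vector<std::size_t>& implDims,
                                    std::size_t implCount)
{
    if (refDims != implDims)
    {
        return "shape mismatch: dimensions differ";
    }
    if (refCount != implCount)
    {
        return "element count mismatch: buffers differ in length";
    }
    return std::string();
}

// Hypothetical helper: how many per-element lines to print.
// maxMismatches == 0 yields 0, i.e. "summary only" mode.
inline std::size_t worstToPrintSketch(std::size_t mismatched, std::size_t maxMismatches)
{
    const std::size_t n = std::min(mismatched, maxMismatches);
    assert(n <= mismatched && "never print more entries than were found");
    return n;
}
```

A caller would run the guard first and skip the element-wise loop entirely when it returns a message, so no index is ever computed against a buffer of the wrong size.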

Fixes #5547

Add a reusable TensorDiff utility to test_sdk that computes
element-wise tensor comparisons and prints summary statistics
(mismatch count, max/mean absolute error, worst mismatches)
when CPU validation fails.

Wire it into all 14 sample validation paths via a
validateAndReport<T>() helper so failures automatically
print the diff instead of just "failed".

Shape mismatches are detected separately and reported without
attempting element-wise comparison to avoid out-of-bounds access.

Fixes ROCm#5547
@Bingtagui404 requested a review from a team as a code owner on March 19, 2026, 00:37
@assistant-librarian (bot) added the "external contribution" label (Code contribution from users community) on Mar 19, 2026

Labels

external contribution (Code contribution from users community), project: hipdnn
