Skip to content

[AMDGPU] Numerical error related to load/store vectorization #112941

@qedawkins

Description

@qedawkins

The following input example: https://gist.github.com/qedawkins/c620832f96a5c504295f9694cc8956e2

Which is approximately compiled from

// int8_t a[144]; // [2 x 8 x 3 x 3]
// int8_t b[144]; // [2 x 8 x 3 x 3]
// int c[2];
__global__ void helloworld(int8_t* a, int8_t* b, int* c)
{
  int id = hipThreadIdx_x
  if (id < 2) {
    for (int c = 0; c < 8; ++c) {
      for (int h = 0; h < 3; ++h) {
        for (int w = 0; w < 3; ++w) {
          int x = a[id * 72 + c * 9 + h * 3 + w];
          int y = b[id * 72 + c * 9 + h * 3 + w];
          c[id] += x * y;
        }
      }
    }
  }
}

When compiled with

llc -O3 module_test_dispatch_0_rocm_hsaco_fb.optimized.ll -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100

Produces incorrect numerics. For inputs

a = np.ones((2, 8, 3, 3), dtype=np.int8)
b = np.broadcast_to(np.array([1, 2, 1], dtype=np.int8), (2, 8, 3, 3))

In other words only the values of b vary along the inner most loop, this gives c = [88, 88] when the correct values should be (1 + 2 + 1) * 3 * 8 = [96, 96]

Compiling with

llc -O3 module_test_dispatch_0_rocm_hsaco_fb.optimized.ll -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -amdgpu-load-store-vectorizer=false

(turning off the load/store vectorizer) or just with -O1 instead of -O3 gives the correct values. I am not sure exactly how the generated code is miscompiling and don't know whether load/store vectorization is the root cause or if disabling it is just masking the problem. Further information about the downstream issue here: iree-org/iree#18798

cc @MaheshRavishankar

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions