
Conversation

Contributor

@andfau-amd commented Mar 20, 2025

Resolves #52.

The README is changed to suggest using the pre-release IREE builds (the CI already used them), because the linalg.matmul constructor is broken in the latest stable IREE release but works in the latest pre-release builds.

@andfau-amd requested a review from kuhar March 20, 2025 11:12
Contributor Author

(This is stacked on top of #56, please merge that one first.)

Member

@kuhar left a comment

Looks good overall. Switching to a nightly IREE release should fix the issue with the matmul constructor.

@andfau-amd force-pushed the gemm-mlir-use-bindings-2 branch from 0e0d767 to 7887185 March 20, 2025 17:29
@andfau-amd changed the title from "Use IREE's MLIR builder Python bindings for gemmbench transposed matmul" to "Use IREE's MLIR builder Python bindings for gemmbench IR generation" Mar 20, 2025
@andfau-amd force-pushed the gemm-mlir-use-bindings-2 branch from 7887185 to 4256a8c March 20, 2025 17:33
@andfau-amd force-pushed the gemm-mlir-use-bindings-2 branch from 4256a8c to 33dd27c March 20, 2025 17:50
Member

@kuhar left a comment

Looks good, modulo one minor issue.

@andfau-amd force-pushed the gemm-mlir-use-bindings-2 branch from 33dd27c to ca3b56f March 21, 2025 10:51
Member

Is there some constructor that doesn't produce these cast attributes? I think it's fine as-is, and being explicit doesn't hurt, but it does look odd since nothing else in IREE produces them.
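
For context, the explicitly attached attribute shows up in the printed IR roughly like this (an illustrative sketch with made-up shapes, based on my understanding of linalg's asm format, not output from this PR):

```mlir
// Illustrative only; the constructor attaches the cast attribute
// explicitly, so it appears in the printed op:
%0 = linalg.matmul {cast = #linalg.type_fn<cast_signed>}
       ins(%arg0, %arg1 : tensor<32x64xf16>, tensor<64x16xf16>)
       outs(%init : tensor<32x16xf32>) -> tensor<32x16xf32>
```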

Contributor Author

I'll investigate; I'm curious myself as to what this means.

Contributor Author

I don't think it can be done with the constructor currently in use here: it takes a TypeFn for the cast argument, and that can only be cast_signed or cast_unsigned (I tried None and an empty TypeFn; neither was accepted). I tried doing this instead:

acc = linalg.MatmulTransposeAOp(inputs=[arg0, arg1], outputs=[filled_tensor], result_tensors=[acc_type])

But it seems like that produces an invalid operation; the IR prints weirdly after I do that:

%3 = "linalg.matmul_transpose_a"(%arg0, %arg1, %2) <{operandSegmentSizes = array<i32: 2, 1>}> ({
    }) {linalg.memoized_indexing_maps = [#map, #map1, #map2]} : (tensor<5120x32000xbf16>, tensor<5120x1xbf16>, tensor<32000x1xf32>) -> tensor<32000x1xf32>

And I get this error if I call .verify():

E           iree.compiler._mlir_libs._site_initialize.<locals>.MLIRError: Verification failed:
E           error: "gemm_32000_1_5120_f16_f32_tA": 'linalg.matmul_transpose_a' op expects to have 1 region with 1 block
E            note: "gemm_32000_1_5120_f16_f32_tA": see current operation:
E             %3 = "linalg.matmul_transpose_a"(%arg0, %arg1, %2) <{operandSegmentSizes = array<i32: 2, 1>}> ({
E             }) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2) -> (d2, d0)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>]} : (tensor<5120x32000xf16>, tensor<5120x1xf16>, tensor<32000x1xf32>) -> tensor<32000x1xf32>

There's probably some way to repair it but I'm not sure if it's worth the effort. Maybe I can ask for a second opinion from someone who knows linalg better?
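
One possible repair, sketched here as an untested assumption: upstream MLIR's Python bindings provide a linalg.fill_builtin_region helper for populating the body of named structured ops created via their Op classes, and I have not checked whether IREE's bundled bindings export it.

```python
# Untested sketch: assumes IREE's bundled bindings re-export upstream
# MLIR's linalg.fill_builtin_region helper.
op = linalg.MatmulTransposeAOp(
    inputs=[arg0, arg1],
    outputs=[filled_tensor],
    result_tensors=[acc_type],
)
# Populate the op's (currently empty) region with the canonical
# multiply-accumulate body, satisfying the "1 region with 1 block" check.
linalg.fill_builtin_region(op.operation)
acc = op.results[0]
```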

Member

Yes, ask on the IREE Discord.

Member

Should we file an issue against upstream MLIR?

Contributor Author

I don't feel strongly either way on that, maybe @rkayaith or @makslevental have an opinion there?

Member

I don't think the Python bindings are necessarily doing anything wrong; the C++ side defaults to cast_signed as well:

// LinalgNamedStructuredOps.yamlgen.td

def MatmulTransposeBOp : LinalgStructuredBase_Op<"matmul_transpose_b", !listconcat([AttrSizedOperandSegments],
  /*extraInterfaces=*/[LinalgContractionOpInterface])> {
    ...
    let arguments = (ins
      Variadic<AnyType>:$inputs,
      Variadic<AnyShaped>:$outputs,
      DefaultValuedOptionalAttr<TypeFnAttr, "TypeFn::cast_signed">:$cast
    );
    ...

But unfortunately using DefaultValuedOptionalAttr causes the attribute to only be elided in the asm format when the attribute is missing. IMO using DefaultValuedAttr instead (no Optional) + updating the custom asm format to elide the default would be a good solution here, but I don't know if there'd be any repercussions.
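
Concretely, that suggestion might look something like this (a sketch only, adapted from the snippet above; whether the accompanying asm-printer change is really this simple is untested):

```tablegen
let arguments = (ins
  Variadic<AnyType>:$inputs,
  Variadic<AnyShaped>:$outputs,
  // Non-Optional default: the attribute always holds a value, and the
  // custom asm printer would elide it whenever it equals cast_signed.
  DefaultValuedAttr<TypeFnAttr, "TypeFn::cast_signed">:$cast
);
```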

Contributor Author

I created llvm/llvm-project#132961 now. I am not very confident I have described the issue well, so if anyone thinks they can add something by commenting there, I would appreciate it.

Member

Can you cc the people who made this change? Otherwise they may never see this issue.

@andfau-amd force-pushed the gemm-mlir-use-bindings-2 branch from ca3b56f to baddea3 March 21, 2025 16:25
@andfau-amd merged commit f5a6810 into nod-ai:main Mar 25, 2025
1 check failed
@andfau-amd deleted the gemm-mlir-use-bindings-2 branch April 8, 2025 07:33
Successfully merging this pull request may close these issues.

Use python bindings to emit gemmbench mlir
