Conversation

@WangYuyao (Contributor) commented on Dec 21, 2024

Overview

This is an implementation of the codegen-only compilation pipeline for LA operations (#693), developed as coursework for 'LDE: Project'.

This PR implements lowering passes for linear-algebra operations involving CSR matrices, as well as interactions between dense and CSR matrices, using MLIR. Instead of lowering these operations to pre-compiled C++ kernels, these passes offer an alternative that lowers them directly to MLIR when the codegen pipeline is enabled (--mlir-codegen).

The lowering of EwUnaryOp, and of EwBinaryOp with matrix-scalar operands, makes use of linalg.generic ops to perform the computation on all elements of the matrix. For the other operations, the passes use scf.for and scf.while ops to locate the required elements.
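As a plain-Python sketch of why elementwise ops map well to CSR (a hypothetical illustration, not the DAPHNE code): for a unary op f with f(0) == 0, only the stored values need to be visited, while the column indices and row offsets stay unchanged.

```python
# Hypothetical sketch (not DAPHNE code): an elementwise unary op on a CSR
# matrix only touches the stored nonzero values; the colIdxs and rowOffsets
# arrays are unchanged. This assumes op(0) == 0 (e.g. squaring), so
# implicit zeros remain implicit.
def ew_unary_csr(values, op):
    return [op(v) for v in values]

print(ew_unary_csr([2.0, -3.0, 4.0], lambda v: v * v))  # [4.0, 9.0, 16.0]
```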

Additionally, all of these passes convert the input matrices into memrefs, perform the computations using operations from the arith dialect, and finally convert the memrefs back into a matrix as the output.
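For context, the buffers that the CSR conversion ops expose follow the standard CSR layout: a values array, a column-index array, and a row-offsets array with one entry per row plus one. A short Python sketch with hypothetical data:

```python
# Build the three CSR arrays (values, colIdxs, rowOffsets) from a small
# hypothetical 3x3 dense matrix. For an n-row matrix, rowOffsets has
# n + 1 entries.
dense = [[0.0, 5.0, 0.0],
         [0.0, 0.0, 0.0],
         [7.0, 0.0, 2.0]]

values, col_idxs, row_offsets = [], [], [0]
for row in dense:
    for j, v in enumerate(row):
        if v != 0.0:
            values.append(v)   # nonzero value
            col_idxs.append(j) # its column
    row_offsets.append(len(values))  # end of this row's slice

print(values)       # [5.0, 7.0, 2.0]
print(col_idxs)     # [1, 0, 2]
print(row_offsets)  # [0, 1, 1, 3]
```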

Changes:

Add codegens for

  • SliceOp for warm-up,
  • EwUnaryOp on CSR matrices,
  • EwBinaryOp (Add, Mul) on CSR-Dense/CSR-CSR/CSR-Scalar,
  • MatMulOp on CSR-Dense/CSR-CSR.

Add a new constructor for CSR matrix.

Add convertMemRefToCSRMatrix kernel for converting MemRef to CSR matrix.

Add missing kernels for testing:

  • EwBinaryMat (Add) on CSR-CSR,
  • EwBinaryObjSca on CSR-Scalar,
  • EwUnaryMat on CSR.

Add a script-level test case (GEMM).

Add some necessary instantiations in kernels.json.

Edit related TableGen files.

A Small Example of Lowering a Kernel

%9 = "daphne.matMul"(%7, %8, %3, %3) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>, !daphne.Matrix<5x5xf64:sp[1.000000e+00]>, i1, i1) -> !daphne.Matrix<5x5xf64:sp[0.22621906250000023]:rep[sparse]>

The input CSR matrix is first converted to three memrefs.

    %9 = "daphne.convertCSRMatrixToValuesMemRef"(%7) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>) -> memref<?xf64>
    %10 = "daphne.convertCSRMatrixToColIdxsMemRef"(%7) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>) -> memref<?xindex>
    %11 = "daphne.convertCSRMatrixToRowOffsetsMemRef"(%7) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>) -> memref<6xindex>
    %12 = "daphne.convertDenseMatrixToMemRef"(%8) : (!daphne.Matrix<5x5xf64:sp[1.000000e+00]>) -> memref<5x5xf64>

In this case the result is a dense matrix, so only one result memref is allocated and initialized; if the result were a CSR matrix, three memrefs would be needed.

    %alloc = memref.alloc() : memref<5x5xf64>
    %cst = arith.constant 0.000000e+00 : f64
    linalg.fill ins(%cst : f64) outs(%alloc : memref<5x5xf64>)

A nest of scf.for loops then locates the required elements, computes each result value with arith operations, and stores it in the result memref.

    scf.for %arg0 = %c0 to %c5 step %c1 {
      %15 = arith.addi %arg0, %c1 : index
      scf.for %arg1 = %c0 to %c5_0 step %c1 {
        %16 = memref.load %11[%arg0] : memref<6xindex>
        %17 = memref.load %11[%15] : memref<6xindex>
        %18 = scf.for %arg2 = %16 to %17 step %c1 iter_args(%arg3 = %cst) -> (f64) {
          %19 = memref.load %9[%arg2] : memref<?xf64>
          %20 = memref.load %10[%arg2] : memref<?xindex>
          %21 = memref.load %12[%20, %arg1] : memref<5x5xf64>
          %22 = arith.mulf %19, %21 : f64
          %23 = arith.addf %arg3, %22 : f64
          scf.yield %23 : f64
        }
        memref.store %18, %alloc[%arg0, %arg1] : memref<5x5xf64>
      }
    }
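The loop nest above can be mirrored in Python to make the indexing explicit (a hypothetical sketch, not generated code): the outer two loops iterate over the result coordinates (i, j), and the innermost loop walks row i of the CSR operand via the row offsets.

```python
# Hypothetical Python mirror of the scf.for nest above: CSR matrix A
# (values / col_idxs / row_offsets) times a dense matrix B, stored into a
# dense result C, one sparse dot product per output element (i, j).
def csr_matmul_dense(values, col_idxs, row_offsets, B, n_rows, n_cols):
    C = [[0.0] * n_cols for _ in range(n_rows)]      # memref.alloc + linalg.fill
    for i in range(n_rows):                          # outer scf.for
        lo, hi = row_offsets[i], row_offsets[i + 1]  # %16, %17
        for j in range(n_cols):                      # inner scf.for
            acc = 0.0                                # iter_args(%arg3 = %cst)
            for k in range(lo, hi):                  # innermost scf.for
                acc += values[k] * B[col_idxs[k]][j] # arith.mulf + arith.addf
            C[i][j] = acc                            # memref.store
    return C

# A = [[0, 5], [7, 0]] in CSR form, B dense:
print(csr_matmul_dense([5.0, 7.0], [1, 0], [0, 1, 2],
                       [[1.0, 2.0], [3.0, 4.0]], 2, 2))
# [[15.0, 20.0], [7.0, 14.0]]
```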

Finally, the memref is converted back into a dense matrix.

%13 = "daphne.convertMemRefToDenseMatrix"(%intptr, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1) : (index, index, index, index, index, index) -> !daphne.Matrix<5x5xf64:sp[1.000000e+00]>

Performance

This test case evaluates the performance of the codegen pipeline for the GEMM operation.

Protocol

  • 2 combinations of matrix representations: (CSR, CSR, CSR) and (CSR, CSR, Dense)
  • 6 matrix sizes: (1000, 1000), (10000, 10000), (50000, 50000), (100000, 100000), (250000, 250000), (500000, 500000)
  • 1 sparsity: 1e-8

Results

[Performance result plots attached in the original PR]

Known Limitations:

  • Dimensions for codegen Ops currently need to be known at compile-time.
  • It is not possible to determine the sparsity of the result before computation. Currently, the lowering pass simply specifies the representation when creating a memref to store the result. However, in matrix multiplication, the sparsity of the output is not solely determined by the sparsity of the inputs. For instance, the result of a CSR @ CSR could be either dense or sparse, depending on the input data.
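A tiny NumPy example (with hypothetical data) illustrates why the output sparsity cannot be fixed ahead of time:

```python
# Hypothetical illustration of the limitation: two 50%-sparse inputs can
# yield a fully dense product, while another sparse input stays sparse.
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 0.0]])
B = np.array([[1.0, 1.0],
              [0.0, 0.0]])
print(np.count_nonzero(A @ B))  # 4 -> A @ B is fully dense

C = np.array([[1.0, 0.0],
              [0.0, 0.0]])
print(np.count_nonzero(C @ C))  # 1 -> C @ C stays sparse
```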

@philipportner (Collaborator) left a comment:


Thanks for the PR.
This already looks promising! A few changes are needed and we should add test cases before merging this in.

Collaborator:

Remove these changes.

Collaborator:

Remove these changes.

Collaborator:

Remove this file and add proper test cases. Look at the test cases in test/codegen/ and test/api/cli/codegen; there you will find different kinds of tests for our codegen: FileCheck-based tests that verify the IR after certain pass(es), and end-to-end tests that execute daphne and verify the output. You should add a single file, e.g., test/codegen/sliceop.mlir, for the IR tests and a single .cpp file for the end-to-end tests that compare the non-codegen with the codegen'd execution of daphne.

Collaborator:

Combine these into a single pass which has two rewrite patterns.

Comment on lines +105 to +115
    SmallVector<AffineMap, 2> indexMaps{AffineMap::getMultiDimIdentityMap(2, rewriter.getContext()),
                                        AffineMap::getMultiDimIdentityMap(2, rewriter.getContext())};

    SmallVector<utils::IteratorType, 2> iterTypes{utils::IteratorType::parallel,
                                                  utils::IteratorType::parallel};

    rewriter.create<linalg::GenericOp>(loc, TypeRange{}, ValueRange{selMemref}, ValueRange{resMemref},
                                       indexMaps, iterTypes,
                                       [&](OpBuilder &OpBuilderNested, Location locNested, ValueRange arg) {
                                           OpBuilderNested.create<linalg::YieldOp>(locNested, arg[0]);
                                       });
Collaborator:

This shouldn't be needed; since we want to create a view of a matrix, we don't want to copy the data. The memref::SubViewOp should be enough. You'll need to properly lower it afterwards by adding the following passes to the pipeline: mlir::memref::createExpandStridedMetadataPass and mlir::createFinalizeMemRefToLLVMConversionPass.
