Conversation

@WangYuyao (Contributor) commented on Dec 21, 2024

Overview

This is an implementation of the codegen-only compilation pipeline for LA operations (#693), developed as coursework for 'LDE: Project'.

This PR implements lowering passes for linear-algebra operations involving CSR matrices, as well as interactions between dense and CSR matrices, using MLIR. Instead of lowering these operations to pre-compiled C++ kernels, these passes offer an alternative that lowers them directly to MLIR when the codegen pipeline is enabled (--mlir-codegen).

The lowering of EwUnaryOp, and of EwBinaryOp with matrix-scalar operands, makes use of linalg.generic ops to perform the computation on all elements of the matrix. For the other operations, the passes use scf.for and scf.while ops to locate the required elements.
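As a plain-Python sketch of why elementwise ops map well to CSR (a hypothetical illustration, not the DAPHNE code): for a unary op f with f(0) == 0, only the stored values need to be visited, while the column indices and row offsets stay unchanged.

```python
# Hypothetical sketch (not DAPHNE code): an elementwise unary op on a CSR
# matrix only touches the stored nonzero values; the colIdxs and rowOffsets
# arrays are unchanged. This assumes op(0) == 0 (e.g. squaring), so
# implicit zeros remain implicit.
def ew_unary_csr(values, op):
    return [op(v) for v in values]

print(ew_unary_csr([2.0, -3.0, 4.0], lambda v: v * v))  # [4.0, 9.0, 16.0]
```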

Additionally, all of these passes convert the input matrices into memrefs, perform the computations using operations from the arith dialect, and finally convert the memrefs back into a matrix as the output.
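For context, the buffers that the CSR conversion ops expose follow the standard CSR layout: a values array, a column-index array, and a row-offsets array with one entry per row plus one. A short Python sketch with hypothetical data:

```python
# Build the three CSR arrays (values, colIdxs, rowOffsets) from a small
# hypothetical 3x3 dense matrix. For an n-row matrix, rowOffsets has
# n + 1 entries.
dense = [[0.0, 5.0, 0.0],
         [0.0, 0.0, 0.0],
         [7.0, 0.0, 2.0]]

values, col_idxs, row_offsets = [], [], [0]
for row in dense:
    for j, v in enumerate(row):
        if v != 0.0:
            values.append(v)   # nonzero value
            col_idxs.append(j) # its column
    row_offsets.append(len(values))  # end of this row's slice

print(values)       # [5.0, 7.0, 2.0]
print(col_idxs)     # [1, 0, 2]
print(row_offsets)  # [0, 1, 1, 3]
```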

Changes:

Add codegens for

  • SliceOp for warm-up,
  • EwUnaryOp on CSR matrices,
  • EwBinaryOp (Add, Mul) on CSR-Dense/CSR-CSR/CSR-Scalar,
  • MatMulOp on CSR-Dense/CSR-CSR.

Add a new constructor for CSR matrix.

Add convertMemRefToCSRMatrix kernel for converting MemRef to CSR matrix.

Add missing kernels for testing:

  • EwBinaryMat (Add) on CSR-CSR,
  • EwBinaryObjSca on CSR-Scalar,
  • EwUnaryMat on CSR.

Add a script-level test case (GEMM).

Add some necessary instantiations in kernels.json.

Edit related TableGen files.

A Small Example of Lowering a Kernel

%9 = "daphne.matMul"(%7, %8, %3, %3) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>, !daphne.Matrix<5x5xf64:sp[1.000000e+00]>, i1, i1) -> !daphne.Matrix<5x5xf64:sp[0.22621906250000023]:rep[sparse]>

The input CSR matrix is first converted to three memrefs.

    %9 = "daphne.convertCSRMatrixToValuesMemRef"(%7) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>) -> memref<?xf64>
    %10 = "daphne.convertCSRMatrixToColIdxsMemRef"(%7) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>) -> memref<?xindex>
    %11 = "daphne.convertCSRMatrixToRowOffsetsMemRef"(%7) : (!daphne.Matrix<5x5xf64:sp[5.000000e-02]:rep[sparse]>) -> memref<6xindex>
    %12 = "daphne.convertDenseMatrixToMemRef"(%8) : (!daphne.Matrix<5x5xf64:sp[1.000000e+00]>) -> memref<5x5xf64>

In this case the result is a dense matrix, so only one result memref is allocated and initialized; if the result were a CSR matrix, three memrefs would be needed.

    %alloc = memref.alloc() : memref<5x5xf64>
    %cst = arith.constant 0.000000e+00 : f64
    linalg.fill ins(%cst : f64) outs(%alloc : memref<5x5xf64>)

A nest of scf.for loops then locates the required elements, computes each result value with arith operations, and stores it in the result memref.

    scf.for %arg0 = %c0 to %c5 step %c1 {
      %15 = arith.addi %arg0, %c1 : index
      scf.for %arg1 = %c0 to %c5_0 step %c1 {
        %16 = memref.load %11[%arg0] : memref<6xindex>
        %17 = memref.load %11[%15] : memref<6xindex>
        %18 = scf.for %arg2 = %16 to %17 step %c1 iter_args(%arg3 = %cst) -> (f64) {
          %19 = memref.load %9[%arg2] : memref<?xf64>
          %20 = memref.load %10[%arg2] : memref<?xindex>
          %21 = memref.load %12[%20, %arg1] : memref<5x5xf64>
          %22 = arith.mulf %19, %21 : f64
          %23 = arith.addf %arg3, %22 : f64
          scf.yield %23 : f64
        }
        memref.store %18, %alloc[%arg0, %arg1] : memref<5x5xf64>
      }
    }
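The loop nest above can be mirrored in Python to make the indexing explicit (a hypothetical sketch, not generated code): the outer two loops iterate over the result coordinates (i, j), and the innermost loop walks row i of the CSR operand via the row offsets.

```python
# Hypothetical Python mirror of the scf.for nest above: CSR matrix A
# (values / col_idxs / row_offsets) times a dense matrix B, stored into a
# dense result C, one sparse dot product per output element (i, j).
def csr_matmul_dense(values, col_idxs, row_offsets, B, n_rows, n_cols):
    C = [[0.0] * n_cols for _ in range(n_rows)]      # memref.alloc + linalg.fill
    for i in range(n_rows):                          # outer scf.for
        lo, hi = row_offsets[i], row_offsets[i + 1]  # %16, %17
        for j in range(n_cols):                      # inner scf.for
            acc = 0.0                                # iter_args(%arg3 = %cst)
            for k in range(lo, hi):                  # innermost scf.for
                acc += values[k] * B[col_idxs[k]][j] # arith.mulf + arith.addf
            C[i][j] = acc                            # memref.store
    return C

# A = [[0, 5], [7, 0]] in CSR form, B dense:
print(csr_matmul_dense([5.0, 7.0], [1, 0], [0, 1, 2],
                       [[1.0, 2.0], [3.0, 4.0]], 2, 2))
# [[15.0, 20.0], [7.0, 14.0]]
```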

Finally, the memref is converted back into a dense matrix.

%13 = "daphne.convertMemRefToDenseMatrix"(%intptr, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1) : (index, index, index, index, index, index) -> !daphne.Matrix<5x5xf64:sp[1.000000e+00]>

Performance

This test case evaluates the performance of the codegen pipeline for the GEMM operation.

Protocol

  • 2 combinations of matrix representations: (CSR, CSR, CSR) and (CSR, CSR, Dense)
  • 6 matrix sizes: (1000, 1000), (10000, 10000), (50000, 50000), (100000, 100000), (250000, 250000), (500000, 500000)
  • 1 sparsity: 1e-8

Results

[Performance result plots attached in the original PR]

Known Limitations:

  • Dimensions for codegen Ops currently need to be known at compile-time.
  • It is not possible to determine the sparsity of the result before computation. Currently, the lowering pass simply specifies the representation when creating a memref to store the result. However, in matrix multiplication, the sparsity of the output is not solely determined by the sparsity of the inputs. For instance, the result of a CSR @ CSR could be either dense or sparse, depending on the input data.
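A tiny NumPy example (with hypothetical data) illustrates why the output sparsity cannot be fixed ahead of time:

```python
# Hypothetical illustration of the limitation: two 50%-sparse inputs can
# yield a fully dense product, while another sparse input stays sparse.
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 0.0]])
B = np.array([[1.0, 1.0],
              [0.0, 0.0]])
print(np.count_nonzero(A @ B))  # 4 -> A @ B is fully dense

C = np.array([[1.0, 0.0],
              [0.0, 0.0]])
print(np.count_nonzero(C @ C))  # 1 -> C @ C stays sparse
```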

@philipportner (Collaborator) left a comment:


Thanks for the PR.
This already looks promising! A few changes are needed and we should add test cases before merging this in.

Collaborator:

Remove these changes.

Collaborator:

Remove these changes.

Collaborator:

Remove this file and add proper test cases. Look at the test cases in test/codegen/ and test/api/cli/codegen; there you will find different kinds of tests for our codegen: FileCheck-based tests that verify the IR after certain pass(es), and end-to-end tests that execute daphne and verify the output. You should add a single file, e.g., test/codegen/sliceop.mlir, for the IR tests and a single .cpp file for the end-to-end tests that compare the non-codegen with the codegen'd execution of daphne.

Collaborator:

Combine these into a single pass which has two rewrite patterns.

Comment on lines +105 to +115
    SmallVector<AffineMap, 2> indexMaps{AffineMap::getMultiDimIdentityMap(2, rewriter.getContext()),
                                        AffineMap::getMultiDimIdentityMap(2, rewriter.getContext())};

    SmallVector<utils::IteratorType, 2> iterTypes{utils::IteratorType::parallel,
                                                  utils::IteratorType::parallel};

    rewriter.create<linalg::GenericOp>(loc, TypeRange{}, ValueRange{selMemref}, ValueRange{resMemref},
                                       indexMaps, iterTypes,
                                       [&](OpBuilder &OpBuilderNested, Location locNested, ValueRange arg) {
                                           OpBuilderNested.create<linalg::YieldOp>(locNested, arg[0]);
                                       });
Collaborator:

This shouldn't be needed; since we want to create a view of a matrix, we don't want to copy the data. The memref::SubViewOp should be enough. You'll need to properly lower it afterwards by adding the following passes to the pipeline: mlir::memref::createExpandStridedMetadataPass and mlir::createFinalizeMemRefToLLVMConversionPass.
