Fix: use gemm instead of einsum in BPCG #5827

Cstandardlib · 2025-01-06T13:25:48Z

Linked Issue

try to fix #3437

What's changed?

Use gemm instead of einsum in BPCG.
Introduce a new parameter that takes psi.get_current_ngk() to address leading dimension vs valid dimension.

Unit Tests and/or Case Tests for my changes

BPCG test is modified to take the new parameter.
Integration testing is necessary to verify the validity of the changes. (tests/integrate/102_PW_BPCG)
Important: result.ref needs to be adjusted to new code output!

Cstandardlib · 2025-01-06T13:35:14Z

There are still some strange issues when running tests/integrate/102_PW_BPCG.
Originally, this test does not converge when running in parallel.

OMP_NUM_THREADS=1 mpirun -np 4 abacus

It oscillates around the wrong result and will not converge in 100 iter steps:

 START CHARGE      : atomic
 DONE(0.378312   SEC) : INIT SCF
 ITER       ETOT/eV          EDIFF/eV         DRHO     TIME/s
 BP1     -3.20615851e+03   0.00000000e+00   6.2402e+01   0.95
 BP2     -3.80285738e+03  -5.96698874e+02   2.7990e+01   0.17
 BP3     -4.10701176e+03  -3.04154385e+02   6.5602e+00   0.18
 BP4     -4.04583492e+03   6.11768470e+01   1.1033e+00   0.17
 BP5     -4.04704636e+03  -1.21144075e+00   9.9010e-01   0.18
 BP6     -4.03059998e+03   1.64463831e+01   9.1813e-01   0.17
 BP7     -3.99186247e+03   3.87375006e+01   9.7185e-01   0.17
 BP8     -3.96744993e+03   2.44125466e+01   1.1061e+00   0.18
 BP9     -3.97584003e+03  -8.39010274e+00   9.6145e-01   0.18
 BP10    -3.94152708e+03   3.43129544e+01   9.6275e-01   0.17
 BP11    -3.93178473e+03   9.74234358e+00   7.6134e-01   0.18

While running by single core it can give the right results in 9 steps:

START CHARGE      : atomic
 DONE(0.442483   SEC) : INIT SCF
 ITER       ETOT/eV          EDIFF/eV         DRHO     TIME/s
 BP1     -4.87034216e+03   0.00000000e+00   1.5454e+00  37.38
 BP2     -4.86916145e+03   1.18070831e+00   3.9344e-01   7.28
 BP3     -4.86973378e+03  -5.72328822e-01   1.3878e-02   6.04
 BP4     -4.86974314e+03  -9.36599264e-03   1.2855e-03   5.89
 BP5     -4.86974610e+03  -2.95286226e-03   1.8366e-04   5.72
 BP6     -4.86974692e+03  -8.21680864e-04   2.8456e-05   5.53
 BP7     -4.86974704e+03  -1.19617951e-04   2.7626e-06   5.93
 BP8     -4.86974705e+03  -1.34459124e-05   2.3207e-07   6.37
 BP9     -4.86974705e+03  -9.02173064e-07   2.4438e-08   6.34

Cstandardlib · 2025-01-06T13:57:27Z

When replacing 4 of 5 einsums in BPCG, the serial version will still produce the right result in integrate test in 9 steps:

Replace the first einsum in orth_cholesky: hsub_out = ct::op::einsum("ij,kj->ik", psi_out, psi_out, option);.
The second einsum in orth_projection: hsub_in = ct::op::einsum("ij,kj->ik", grad_out, psi_in, option);.
The third einsum in orth_projection: grad_out = ct::op::einsum("ij,jk->ik", hsub_in, psi_in, option);
The 5th einsum in diag_hsub
It will produce exactly the same result as original ones.
Changing the 4th einsum in rotate_wf will lead to wrong serial result, oscillating around 1e-3:

START CHARGE      : atomic
 DONE(0.434527   SEC) : INIT SCF
 ITER       ETOT/eV          EDIFF/eV         DRHO     TIME/s
 BP1     -4.86923965e+03   0.00000000e+00   1.5121e+00  35.52
 BP2     -4.86283925e+03   6.40040239e+00   4.9555e-01   5.97
 BP3     -4.86611767e+03  -3.27841978e+00   1.2539e-02   5.88
 BP4     -4.86574893e+03   3.68737277e-01   6.6463e-03   6.07
 BP5     -4.86570499e+03   4.39385803e-02   3.7749e-03   7.38
 BP6     -4.86494756e+03   7.57431141e-01   3.1324e-03   6.53
 BP7     -4.86623319e+03  -1.28563301e+00   1.5995e-03   7.43
 BP8     -4.86433126e+03   1.90193658e+00   5.7517e-03   6.98
 BP9     -4.86553200e+03  -1.20073878e+00   1.0246e-03   7.06
 BP10    -4.86593390e+03  -4.01906932e-01   1.6900e-03   5.75

I will go and investigate.

Cstandardlib · 2025-01-06T14:06:34Z

It turns out that the 4th gemm should take n_basis as dimension of workspace_in, not n_dim.

// workspace_in = ct::op::einsum("ij,jk->ik", hsub_in, psi_out, option);

    // gemm: workspace_in(n_basis x n_band) = psi_out(n_basis x n_band) * hsub_in(n_band x n_band)
    gemm_op()(this->ctx,
                'N',
                'N',
                this->n_basis,        //m
                this->n_band,       //n
                this->n_band,       //k
                this->one,          //1.0
                psi_out.data<T>(),
                this->n_basis,      //lda
                hsub_in.data<T>(),
                this->n_band,       //ldb
                this->zero,         //0.0
                workspace_in.data<T>(),
                this->n_basis);     //ldc

Cstandardlib · 2025-01-06T14:58:25Z

tests/integrate/102_PW_BPCG still cannot converge using MPI multicores

Cstandardlib · 2025-01-07T08:01:40Z

There may be some accuracy difference when migrating from einsum to hsolver::gemm_op.
My guess is that this stems from different defaults about memory lay out.
Tensor views data blocks as row-major and uses $C^T=B^TA^T$, calling gemm kernel in reverse argument order. So does einsum.
Direct call to gemm operator with column-major uses normal order $C=AB$. This may lead to minor difference in accuracy.

Cstandardlib · 2025-01-07T09:30:39Z

Inner implementation of Tensor einsum does do a reverse order gemm, while callers in BPCG do an outer reverse order of matrix arguments:

// einsum_op.cpp
// Call the column-major Blas library
kernels::blas_gemm<T, Device>()(
    option.conj_y ? 'C' : trans_y ? 'T' : 'N', 
    option.conj_x ? 'C' : trans_x ? 'T' : 'N', 
    n, m, k, 
    &alpha, 
    y_device_memory_ptrs[0], option.conj_y || trans_y ? k : n, 
    x_device_memory_ptrs[0], option.conj_x || trans_x ? m : k, 
    &beta, 
    z_device_memory_ptrs[0], n);

However, einsum is called as follows because input data is assumed to be column-major:

ct::EinsumOption option(
        /*conj_x=*/false, /*conj_y=*/true, /*alpha=*/1.0, /*beta=*/0.0, /*Tensor out=*/&hsub_in);
    hsub_in = ct::op::einsum("ij,kj->ik", grad_out, psi_in, option);

It actually does this calc job:
gemm: hsub_in(n_band x n_band) = psi_in^T(n_band x n_basis) * grad_out(n_basis x n_band)
So this should not cause the accuracy loss.

Cstandardlib · 2025-01-07T09:32:47Z

The result difference may come from doing gemm with valid elements (n_dim, Incoming parameter for actual, legal dimension) instead of original all elements(n_basis, representing leading dimension).

If this is true, we need to modify the reference data file to adjust to this correction according to new version of BPCG code.
Make sure the code is changed correctly before this change.

This reverts commit a3350b7.

Cstandardlib · 2025-01-07T13:40:35Z

We will change this reference value of totalstressref to the new one.
1: Test accuracy stress: 0.001
[WARNING ] totalstressref cal=37241.09727600 ref=37241.09171000 deviation=-0.00556600

tests/integrate/102_PW_BPCG/result.ref

Cstandardlib · 2025-01-09T04:29:03Z

Integrage test INPUT and REF are updated after PR #5775

Cstandardlib · 2025-01-09T15:00:46Z

New ref will be updated according to results as follows:

1: [ RUN      ] 102_PW_BPCG
1: Info: Local MPI proc number: 1,OpenMP thread number: 2,Total thread number: 2,Local thread limit: 8
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [      OK  ]  totalforceref
1: [WARNING   ]  totalstressref cal=37241.45334600 ref=37241.44843500 deviation=-0.00491100
1: [      OK  ]  pointgroupref
1: [      OK  ]  spacegroupref
1: [      OK  ]  nksibzref

* Add dimension parameter for BPCG method * Add utils for hsovler gemm_op * Change code to fit new bpcg init interface * using gemm instead of einsum in orth_cholesky * using gemm instead of einsum in orth_projection * replace einsum by gemm in orth_projection * replace einsum by gemm in rotate_wf * replace einsum by gemm in diag_hsub * Update 102_PW_BPCG totalstressref reference value

Cstandardlib added 9 commits January 6, 2025 16:36

Add dimension parameter for BPCG method

16a26a3

Add utils for hsovler gemm_op

d7c66b1

Change code to fit new bpcg init interface

bcf492f

using gemm instead of einsum in orth_cholesky

31d78ff

using gemm instead of einsum in orth_projection

487b6b2

replace einsum by gemm in orth_projection

d901d00

replace einsum by gemm in rotate_wf

e2dc4a1

replace einsum by gemm in diag_hsub

3a70e1b

Merge branch 'develop' into fix/bpcg-gemm-insteadof-einsum

14c3693

fix wrong dimension of gemm_op in rotate_wf

fe19e07

Cstandardlib marked this pull request as ready for review January 6, 2025 14:24

Cstandardlib requested a review from Qianruipku January 6, 2025 14:57

Cstandardlib added 3 commits January 7, 2025 10:21

Revert change of einsum in rotate_wf:

87f0d36

Merge branch 'develop' into fix/bpcg-gemm-insteadof-einsum

830188a

Revert gemm substitute for einsum

a3350b7

Revert last commit, substitute gemm for einsum

ab08f46

This reverts commit a3350b7.

Update 102_PW_BPCG totalstressref reference value

1813a56

mohanchen approved these changes Jan 8, 2025

View reviewed changes

mohanchen reviewed Jan 8, 2025

View reviewed changes

tests/integrate/102_PW_BPCG/result.ref Outdated Show resolved Hide resolved

mohanchen added Diago Issues related to diagonalizaiton methods Refactor Refactor ABACUS codes labels Jan 8, 2025

Merge branch 'develop' into fix/bpcg-gemm-insteadof-einsum

570131e

Merge branch 'develop' into fix/bpcg-gemm-insteadof-einsum

d8045b8

Cstandardlib added 2 commits January 9, 2025 23:01

Merge branch 'develop' into fix/bpcg-gemm-insteadof-einsum

15d28ed

Update 102_PW_BPCG totalstressref reference value

0d87baa

Qianruipku approved these changes Jan 10, 2025

View reviewed changes

Qianruipku merged commit c898e52 into deepmodeling:develop Jan 10, 2025
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix: use gemm instead of einsum in BPCG #5827

Fix: use gemm instead of einsum in BPCG #5827

Uh oh!

Cstandardlib commented Jan 6, 2025 •

edited

Loading

Uh oh!

Cstandardlib commented Jan 6, 2025 •

edited

Loading

Uh oh!

Cstandardlib commented Jan 6, 2025 •

edited

Loading

Uh oh!

Cstandardlib commented Jan 6, 2025

Uh oh!

Cstandardlib commented Jan 6, 2025

Uh oh!

Cstandardlib commented Jan 7, 2025

Uh oh!

Cstandardlib commented Jan 7, 2025

Uh oh!

Cstandardlib commented Jan 7, 2025 •

edited

Loading

Uh oh!

Cstandardlib commented Jan 7, 2025 •

edited

Loading

Uh oh!

Uh oh!

Cstandardlib commented Jan 9, 2025

Uh oh!

Cstandardlib commented Jan 9, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix: use gemm instead of einsum in BPCG #5827

Fix: use gemm instead of einsum in BPCG #5827

Uh oh!

Conversation

Cstandardlib commented Jan 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Linked Issue

What's changed?

Unit Tests and/or Case Tests for my changes

Uh oh!

Cstandardlib commented Jan 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cstandardlib commented Jan 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cstandardlib commented Jan 6, 2025

Uh oh!

Cstandardlib commented Jan 6, 2025

Uh oh!

Cstandardlib commented Jan 7, 2025

Uh oh!

Cstandardlib commented Jan 7, 2025

Uh oh!

Cstandardlib commented Jan 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Cstandardlib commented Jan 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Cstandardlib commented Jan 9, 2025

Uh oh!

Cstandardlib commented Jan 9, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Cstandardlib commented Jan 6, 2025 •

edited

Loading

Cstandardlib commented Jan 6, 2025 •

edited

Loading

Cstandardlib commented Jan 6, 2025 •

edited

Loading

Cstandardlib commented Jan 7, 2025 •

edited

Loading

Cstandardlib commented Jan 7, 2025 •

edited

Loading