@Cstandardlib
Collaborator

@Cstandardlib Cstandardlib commented Jan 12, 2025

Linked Issue

Fix #3437

What's changed?

  • Add MPI multi-core plane-wave parallelization support for the BPCG method.
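The core idea of plane-wave parallelization can be sketched without MPI. If each process owns a slice of the plane-wave coefficients, an inner product over G-vectors is a sum of per-process partial sums. In a real MPI code these would be combined with MPI_Allreduce (MPI_SUM). The sketch below is a hypothetical illustration, not the ABACUS implementation. The names split_evenly, local_dot and distributed_dot are invented here, and the "ranks" are simulated as list slices.

```python
# Simulated plane-wave parallelization (no MPI). Assumption for illustration
# only: each "rank" owns a contiguous slice of the plane-wave coefficients,
# and summing the per-rank partial results stands in for MPI_Allreduce(MPI_SUM).

def split_evenly(n, nranks):
    """Return (start, stop) pairs that partition range(n) over nranks ranks."""
    base, extra = divmod(n, nranks)
    bounds, start = [], 0
    for rank in range(nranks):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def local_dot(x, y):
    """Complex inner product <x|y> = sum_i conj(x_i) * y_i over the given slice."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def distributed_dot(x, y, nranks):
    """Each simulated rank computes a partial sum; adding them mimics the allreduce."""
    partials = [local_dot(x[s:e], y[s:e]) for s, e in split_evenly(len(x), nranks)]
    return sum(partials), partials


if __name__ == "__main__":
    x = [complex(i, -i) for i in range(1, 11)]
    y = [complex(1, i) for i in range(10)]
    reduced, partials = distributed_dot(x, y, nranks=4)
    print("serial :", local_dot(x, y))
    print("reduced:", reduced)
    print("rank 0 without reduce:", partials[0])
```

Without the reduction, rank 0 would use only its own partial sum (partials[0]), which is the kind of error a missing reduce produces. With general floating-point data, the reduced sum can also differ from the serial one in the last bits, because the additions happen in a different order.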

@Cstandardlib Cstandardlib changed the title from "Feature: Add parallel support for BPCG method" to "Feature: Add planewave parallization support for BPCG method" Jan 12, 2025
@Cstandardlib
Collaborator Author

Cstandardlib commented Jan 13, 2025

Currently, BPCG converges when run on multiple MPI processes, but the results differ slightly from the single-process run, which requires further investigation.

The integration test integrate/102_PW_BPCG will still run in serial mode, i.e. mpirun -np 1 $abacus.

@Cstandardlib
Collaborator Author

Tests show that the np = 4 and np = 1 eigenvalues already differ after the very first iteration:

  • np = 4
 DONE(0.387236   SEC) : INIT SCF
eigenvalue[0] = -1.59241
eigenvalue[1] = -1.59117
eigenvalue[2] = -1.58078
eigenvalue[3] = -1.57916
eigenvalue[4] = -1.57533
eigenvalue[5] = -0.337989
eigenvalue[6] = -0.202191
eigenvalue[7] = -0.196922
eigenvalue[8] = -0.172316
eigenvalue[9] = -0.166048
eigenvalue[10] = -0.157684
eigenvalue[11] = 0.560756
eigenvalue[12] = 0.619775
eigenvalue[13] = 0.673029
eigenvalue[14] = 0.684267
eigenvalue[15] = 0.967494
eigenvalue[16] = 0.988857
eigenvalue[17] = 1.02778
eigenvalue[18] = 1.19561
eigenvalue[19] = 1.449
eigenvalue[20] = 1.48429
eigenvalue[21] = 1.48959
eigenvalue[22] = 1.55905
eigenvalue[23] = 1.57068
  • np = 1
 DONE(0.385729   SEC) : INIT SCF
eigenvalue[0] = -1.5924
eigenvalue[1] = -1.59117
eigenvalue[2] = -1.58079
eigenvalue[3] = -1.57916
eigenvalue[4] = -1.57533
eigenvalue[5] = -0.337989
eigenvalue[6] = -0.202191
eigenvalue[7] = -0.196922
eigenvalue[8] = -0.172316
eigenvalue[9] = -0.166048
eigenvalue[10] = -0.157684
eigenvalue[11] = 0.560756
eigenvalue[12] = 0.619775
eigenvalue[13] = 0.673029
eigenvalue[14] = 0.684267
eigenvalue[15] = 0.96749
eigenvalue[16] = 0.988901
eigenvalue[17] = 1.02778
eigenvalue[18] = 1.19561
eigenvalue[19] = 1.44897
eigenvalue[20] = 1.48398
eigenvalue[21] = 1.49106
eigenvalue[22] = 1.55956
eigenvalue[23] = 1.5701

and the errors accumulate as the iterations proceed.
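To see where the two runs disagree, the eigenvalues from the two logs can be diffed band by band. The values below are copied from the logs. The 1e-9 cutoff for "differs" is an arbitrary choice that only filters out bit-identical entries.

```python
# Eigenvalues printed after the first iteration, copied from the logs above.
np4 = [-1.59241, -1.59117, -1.58078, -1.57916, -1.57533, -0.337989, -0.202191,
       -0.196922, -0.172316, -0.166048, -0.157684, 0.560756, 0.619775, 0.673029,
       0.684267, 0.967494, 0.988857, 1.02778, 1.19561, 1.449, 1.48429, 1.48959,
       1.55905, 1.57068]
np1 = [-1.5924, -1.59117, -1.58079, -1.57916, -1.57533, -0.337989, -0.202191,
       -0.196922, -0.172316, -0.166048, -0.157684, 0.560756, 0.619775, 0.673029,
       0.684267, 0.96749, 0.988901, 1.02778, 1.19561, 1.44897, 1.48398, 1.49106,
       1.55956, 1.5701]

# Per-band difference between the 4-process and the 1-process run.
diffs = [a - b for a, b in zip(np4, np1)]

# Band with the largest absolute difference.
worst = max(range(len(diffs)), key=lambda i: abs(diffs[i]))

# Bands whose printed values are not identical.
changed = [i for i, d in enumerate(diffs) if abs(d) > 1e-9]

print(f"largest |np4 - np1| = {abs(diffs[worst]):.5f} at eigenvalue[{worst}]")
print("differing bands:", changed)
```

For these logs the largest gap is about 1.5e-3 at eigenvalue[21], and every gap larger than 1e-4 falls in eigenvalue[20] to eigenvalue[23], i.e. the highest bands. eigenvalue[0] and eigenvalue[2] differ only in the last printed digit.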

@Cstandardlib
Collaborator Author

The bug stems from the hPsi function.
See #5854 for details.

@Cstandardlib Cstandardlib marked this pull request as ready for review January 13, 2025 12:24
Collaborator

@Qianruipku Qianruipku left a comment


Could you also change "mpirun -np 1" to "mpirun -np 4" in Autotest.sh and update result.ref?

@Cstandardlib
Collaborator Author

Output of mpirun -np 4 on the test machine:

1: [ RUN      ] 102_PW_BPCG
1: [      OK  ]  etotref
1: [      OK  ]  etotperatomref
1: [WARNING   ]  totalforceref cal=5.19522000 ref=5.19483000 deviation=-0.00039000
1: [WARNING   ]  totalstressref cal=37241.49490600 ref=37241.45334600 deviation=-0.04156000
1: [      OK  ]  pointgroupref
1: [      OK  ]  spacegroupref
1: [      OK  ]  nksibzref

Updated the reference values in result.ref accordingly.
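The deviation column in the test output can be reproduced as ref - cal. This convention is inferred from the two WARNING lines, not taken from the Autotest.sh source. A short sketch that also reports each deviation relative to its reference value (the helper name deviations is hypothetical):

```python
# Values copied from the two WARNING lines of the np = 4 test output.
results = {
    "totalforceref": (5.19522000, 5.19483000),       # (cal, ref)
    "totalstressref": (37241.49490600, 37241.45334600),
}


def deviations(cal, ref):
    """Deviation as printed in the log (ref - cal) and its size relative to ref."""
    dev = ref - cal
    return dev, abs(dev) / abs(ref)


for name, (cal, ref) in results.items():
    dev, rel = deviations(cal, ref)
    print(f"{name}: deviation={dev:.8f} relative={rel:.1e}")
```

For these two lines, the relative deviations come out at roughly 7.5e-5 for the total force and 1.1e-6 for the total stress.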

@Qianruipku Qianruipku self-requested a review January 13, 2025 14:06
@mohanchen mohanchen added Diago Issues related to diagonalizaiton methods Refactor Refactor ABACUS codes labels Jan 13, 2025
@mohanchen mohanchen merged commit e87414c into deepmodeling:develop Jan 13, 2025
14 checks passed
Fisherd99 pushed a commit to Fisherd99/abacus-BSE that referenced this pull request Mar 31, 2025
…eling#5849)

* Substitute gemm for einsum in rotate_wf

* Add plane-wave parallel support for inner-product-like gemm_op in bpcg

* Add reduce for dot ops used in bpcg

* Add reduce for manual inner products (for loops) used in bpcg

* Update docs now that BPCG supports plane wave parallelization.

* Update Autotest.sh to run BPCG test with MPI np=4

* Remove unused code and redundancies

* Update result.ref for BPCG multicore test
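The commit list mentions gemm-based inner products in bpcg. Assume each simulated rank owns the same slice of plane-wave coefficients for every band. Under that assumption, a gemm-like overlap matrix S = Psi^H Psi is also a sum of per-rank partial matrices and needs the same reduction as a dot product. The sketch below uses pure Python instead of a BLAS gemm, and the names overlap, add_matrices and reduced_overlap are hypothetical. It checks that summing the partial matrices reproduces the serial result.

```python
# Assumption for illustration: psi holds nbands rows of plane-wave coefficients,
# and each simulated rank owns the same column slice of every row.

def overlap(psi_rows):
    """S[i][j] = sum_G conj(psi_i(G)) * psi_j(G) over the columns in psi_rows."""
    n = len(psi_rows)
    return [[sum(a.conjugate() * b for a, b in zip(psi_rows[i], psi_rows[j]))
             for j in range(n)] for i in range(n)]


def add_matrices(a, b):
    """Element-wise sum of two equally sized nested-list matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def reduced_overlap(psi, bounds):
    """Sum the per-rank partial overlaps; stands in for MPI_Allreduce(MPI_SUM)."""
    total = None
    for start, stop in bounds:
        local = overlap([row[start:stop] for row in psi])
        total = local if total is None else add_matrices(total, local)
    return total


psi = [
    [1 + 1j, 2, 0, 1j, 3, -1],
    [0, 1, 1 - 1j, 2, 0, 1j],
    [1, 0, 1, 0, 1, 0],
]
bounds = [(0, 2), (2, 4), (4, 6)]  # 6 coefficients over 3 simulated ranks

S_serial = overlap(psi)
S_parallel = reduced_overlap(psi, bounds)
S_rank0_only = overlap([row[0:2] for row in psi])  # missing reduce: wrong result

print(S_parallel == S_serial, S_rank0_only == S_serial)
```

In this layout, a subspace rotation Psi C by a small band-by-band matrix C held by every rank acts on each rank's slice independently, so it needs no reduction. Only contractions over the plane-wave index do.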

Labels

Diago Issues related to diagonalizaiton methods Refactor Refactor ABACUS codes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SCF is not converged by using BPCG method

3 participants