hPsi Bug: Precision Discrepancy in hPsi Function Between Single-Core and Multi-Core Execution #5854

@Cstandardlib

Describe the bug

While investigating #5849, I found that for the same input psi, the output of the hPsi function differs slightly between single-core and multi-core runs, which leads to accumulating errors in the BPCG method.

The following shows the top-left 3x3 block of the first psi and hpsi in the BPCG algorithm.

  • Single core input & output
 DONE(0.379059   SEC) : INIT SCF
MPI rank = 0
psi topleft 3x3
(1.5714165849967657982e-18,1.4756583960389107859e-18) (-1.4440292006116908902e-18,1.9875356842970415933e-18) (-7.3996684640044956504e-18,-4.0680021628763537946e-18) 
(-0.0081718516751464586462,-0.0076738795114850624074) (-0.015018795849101124196,0.020671599073354016141) (0.038480561876740024263,0.021154867911279531811) 
(0.0081718516751464586462,0.0076738795114850632748) (0.01501879584910112593,-0.020671599073354023079) (-0.038480561876740031202,-0.021154867911279535281) 
---
hpsi_out topleft 3x3
(-0.00028491795659081034422,0.0011843603411937388137) (-0.00024449786422374805315,0.00071164769052877070461) (-0.00049446360939629330038,0.00015961401811020794746) 
(-0.039738892162869521307,-0.032085395147622847167) (-0.0017457343776730027685,-0.001957554915282266883) (-0.06547436267479628258,-0.035492998471474412892) 
(0.041807460714903849075,0.031001562348305811145) (0.003692542961325595563,0.0010716678309309890516) (0.066492434985828685612,0.034717498262674922893) 
---
  • Multi core input & output
MPI rank = 3
psi topleft 3x3
(1.5714165849967657982e-18,1.4756583960389107859e-18) (-1.4440292006116908902e-18,1.9875356842970415933e-18) (-7.3996684640044956504e-18,-4.0680021628763537946e-18) 
(-0.0081718516751464586462,-0.0076738795114850624074) (-0.015018795849101124196,0.020671599073354016141) (0.038480561876740024263,0.021154867911279531811) 
(0.0081718516751464586462,0.0076738795114850632748) (0.01501879584910112593,-0.020671599073354023079) (-0.038480561876740031202,-0.021154867911279535281)
---
hpsi_out topleft 3x3
(-0.00028491795659081495208,0.0011843603411937232012) (-0.00024449786422377656767,0.00071164769052877374037) (-0.00049446360939631953807,0.00015961401811021968394) 
(-0.039738892162869493552,-0.032085395147622847167) (-0.0017457343776730582796,-0.0019575549152821558607) (-0.065474362674796060535,-0.035492998471474274114) 
(0.041807460714903849075,0.031001562348305838901) (0.0036925429613257065853,0.0010716678309308780292) (0.06649243498582857459,0.03471749826267475636) 
---

The inputs are identical, but the outputs begin to differ around the 14th to 15th significant digit. These small differences accumulate and are amplified during the subsequent eigenvalue calculation, leading to errors in the final results.
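One common source of last-digit differences like these is that floating-point addition is not associative, so a sum split across processes can round differently from a serial sum. Whether this is what happens inside hPsi is an assumption, not a diagnosis. The sketch below uses hypothetical helper names (`sum_serial`, `sum_chunked`) and models the split with contiguous chunks instead of real MPI ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum the values left to right, as a single process would.
double sum_serial(const std::vector<double>& v)
{
    double s = 0.0;
    for (double x : v)
    {
        s += x;
    }
    return s;
}

// Sum the values in `nchunks` contiguous chunks, then add the partial sums.
// This only mimics a reduction over `nchunks` ranks (an illustrative assumption).
double sum_chunked(const std::vector<double>& v, std::size_t nchunks)
{
    assert(nchunks > 0);
    const std::size_t n = v.size();
    double total = 0.0;
    for (std::size_t c = 0; c < nchunks; ++c)
    {
        const std::size_t begin = c * n / nchunks;
        const std::size_t end = (c + 1) * n / nchunks;
        double partial = 0.0;
        for (std::size_t i = begin; i < end; ++i)
        {
            partial += v[i];
        }
        total += partial;
    }
    return total;
}
```

For the deliberately extreme input `{1e17, 1.0, -1e17, 1.0}`, `sum_serial` returns 1.0 while `sum_chunked` with two chunks returns 0.0 under standard IEEE double arithmetic. The effect on well-scaled data is far smaller and shows up only in the last few digits.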

Expected behavior

hPsi should give the same result under single- and multi-core executions.
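To quantify how far two runs deviate from this expectation, one option is to compute the largest element-wise relative difference between the two hpsi outputs. The helper below is a minimal sketch: `max_rel_diff` is a hypothetical name, and it assumes both outputs have been gathered into flat arrays of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Largest element-wise relative difference |a_i - b_i| / max(|a_i|, |b_i|).
// Pairs in which both elements are exactly zero contribute nothing.
double max_rel_diff(const std::vector<std::complex<double>>& a,
                    const std::vector<std::complex<double>>& b)
{
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double scale = std::max(std::abs(a[i]), std::abs(b[i]));
        if (scale > 0.0)
        {
            worst = std::max(worst, std::abs(a[i] - b[i]) / scale);
        }
    }
    return worst;
}
```

Applied to the first hpsi_out element printed above, this gives a relative difference on the order of 1e-14. A result of exactly 0.0 would mean the two runs agree bit for bit on every compared element.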

To Reproduce

abacus-develop/tests/integrate/102_PW_BPCG

OMP_NUM_THREADS=1 mpirun -np 1 abacus
OMP_NUM_THREADS=1 mpirun -np 4 abacus

Run the test case above with both commands, and print the top-left corner of psi and hpsi in the following function:

template<typename T, typename Device>
void DiagoBPCG<T, Device>::calc_hsub_with_block(
        const HPsiFunc& hpsi_func,
        T *psi_in,
        ct::Tensor& psi_out,
        ct::Tensor& hpsi_out,
        ct::Tensor& hsub_out,
        ct::Tensor& workspace_in,
        ct::Tensor& eigenvalue_out)
{
    // print psi topleft 3x3 here
 
    // Apply the H operator to psi and obtain the hpsi matrix.
    this->calc_hpsi_with_block(hpsi_func, psi_in, hpsi_out);

    // print hpsi_out topleft 3x3 here
...
}
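The two print steps marked in the comments could be implemented with a small formatting helper such as the one below. This is only a sketch: `format_topleft` is a hypothetical name, and the storage layout (element (row, col) at `data[row * ld + col]`) is an assumption that would need to match how psi and hpsi are actually stored.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Format the top-left block (3x3 by default) of a complex matrix.
// Assumed layout: element (row, col) is data[row * ld + col].
template <typename Real>
std::string format_topleft(const std::complex<Real>* data,
                           std::size_t nrow,
                           std::size_t ncol,
                           std::size_t ld,
                           std::size_t block = 3)
{
    std::ostringstream os;
    // max_digits10 is enough digits to distinguish any two distinct values.
    os << std::setprecision(std::numeric_limits<Real>::max_digits10);
    const std::size_t rows = std::min(nrow, block);
    const std::size_t cols = std::min(ncol, block);
    for (std::size_t r = 0; r < rows; ++r)
    {
        for (std::size_t c = 0; c < cols; ++c)
        {
            os << data[r * ld + c] << ' '; // prints as "(re,im) "
        }
        os << '\n';
    }
    return os.str();
}
```

Printing with `max_digits10` significant digits matters here because the discrepancy only appears in the last few digits. The default stream precision of 6 would hide it entirely.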

Environment

  • OS: Ubuntu 22.04.4 LTS
  • Compiler: gcc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).

Metadata

Assignees

No one assigned

Labels

Bugs: Bugs that only solvable with sufficient knowledge of DFT
