
Bug: ks_solver=cusolver not working properly, reporting CUDA error #5240

@AsTonyshment

Describe the bug

When I set ks_solver=cusolver, ABACUS exits with a CUDA error, even for a simple SCF task. Based on my testing, the problem seems to have been introduced by PR #5225. Maybe @Cstandardlib can have a look?

 START CHARGE      : atomic
 DONE(1.21413    SEC) : INIT SCF
 * * * * * *
 << Start SCF iteration.
CUDA error at /home/itztony/Documents/Research/Coding/abacus-develop/source/module_hsolver/kernels/cuda/diag_cusolver.cu:132 code=1(cudaErrorInvalidValue) "cudaMemcpy(d_A2, A, sizeof(cuDoubleComplex) * lda * m, cudaMemcpyHostToDevice)" 
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[48991,1],0]
  Exit code:    1
--------------------------------------------------------------------------
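
The failing call in the log is the host-to-device cudaMemcpy at diag_cusolver.cu:132, which copies sizeof(cuDoubleComplex) * lda * m bytes from A into d_A2. cudaErrorInvalidValue means the runtime rejected one of those arguments. As a possible debugging aid, the sketch below checks the arguments on the host before the copy. It is not ABACUS code: the name describe_copy_problem is hypothetical, and it assumes lda and m are int and that the matrix is m x m with leading dimension lda. It makes no CUDA calls, so it only catches obvious problems such as a null pointer or a negative dimension, which would wrap around to a huge size_t byte count.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical debugging helper (not part of ABACUS). It inspects the arguments
// of a host-to-device copy of lda * m elements, each elem_size bytes, and
// returns a short description of the first obvious problem, or "" if none.
// It makes no CUDA calls, so it cannot tell whether a non-null device pointer
// actually refers to valid device memory.
std::string describe_copy_problem(const void* device_dst, const void* host_src,
                                  int lda, int m, std::size_t elem_size) {
    if (device_dst == nullptr) {
        return "device destination is null (was it allocated?)";
    }
    if (host_src == nullptr) {
        return "host source is null";
    }
    // A negative int converted to std::size_t in sizeof(...) * lda * m
    // wraps around to an enormous byte count.
    if (lda < 0 || m < 0) {
        return "negative dimension: lda=" + std::to_string(lda) +
               ", m=" + std::to_string(m);
    }
    // Assumes an m x m matrix stored with leading dimension lda.
    if (lda < m) {
        return "lda is smaller than m";
    }
    const std::size_t max = std::numeric_limits<std::size_t>::max();
    const std::size_t rows = static_cast<std::size_t>(lda);
    const std::size_t cols = static_cast<std::size_t>(m);
    // Guard both multiplications against std::size_t overflow.
    if (cols != 0 && rows > max / cols) {
        return "lda * m overflows std::size_t";
    }
    const std::size_t count = rows * cols;
    if (elem_size != 0 && count > max / elem_size) {
        return "byte count overflows std::size_t";
    }
    return "";
}
```

For example, calling it with d_A2, A, lda, m and sizeof(cuDoubleComplex) just before line 132 and printing any non-empty result could show whether the pointers or the dimensions are at fault. An empty result would point toward a device pointer that is non-null but invalid, which this host-only check cannot detect.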

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).

Metadata

Assignees

No one assigned

    Labels

    GPU & DCU & HPC (GPU, DCU and HPC related issues)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests