Commit e1946af
feat(gint): add single-precision (fp32) support for grid integration module (useful information for implementing mixed (single/double) precision calculations for grid integral operations) (#7149)
* module_gint: lay Phase A groundwork for fp32 CPU execution
Add the execution-precision scaffolding without changing the default fp64 behavior: introduce GintRealPrecision/GintExecConfig, extend cal_gint_vl and cal_gint_rho with optional config entry points, instantiate BaseMatrix<float>/AtomPair<float>/HContainer<float>, and expose GintInfo::get_hr<float>() for future internal fp32 buffers.
Also add serial HContainer cast helpers in gint_common so later phases can move between float and double containers at the module_gint boundary, and record the stage handoff in progress.md.
Validation: ../build.sh build_phase_a; cmake --build build_phase_a -j14; OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket build_phase_a/abacus_2g in tests/performance/P101_si32_lcao completed successfully. Extracted runtime properties differ from the checked-in result.ref and are documented in progress.md for follow-up.
* module_gint: add staged fp32 internal path for gint_vl
Stage B focuses on the CPU gint_vl path while keeping the public hR interface in double precision. Gint_vl is refactored into a precision-aware dispatcher plus templated Real kernels so the module can execute either fp64 or fp32 internally based on GintExecConfig without changing existing callers.
The fp32 path now casts the local vr_eff buffer once, evaluates phi/phi_vldr3 in float, accumulates into HContainer<float>, then casts the serial result back to HContainer<double> before compose_hr_gint/transfer_hr_gint_to_hR. compose_hr_gint is generalized to work with both float and double containers, and gint_interface now forwards the execution config into the CPU gint_vl entry point.
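The dispatcher-plus-templated-kernel structure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `RealPrecision`, `ExecConfig`, `cast_buffer`, `integrate_impl`, and `integrate` are hypothetical stand-ins, plain `std::vector` buffers replace HContainer, and the toy kernel is not the real phi/phi_vldr3 evaluation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real GintRealPrecision/GintExecConfig may differ.
enum class RealPrecision { fp64, fp32 };
struct ExecConfig { RealPrecision cpu_internal_real = RealPrecision::fp64; };

// Element-wise cast between buffers of different precision.
template <typename To, typename From>
std::vector<To> cast_buffer(const std::vector<From>& src)
{
    std::vector<To> dst;
    dst.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        dst.push_back(static_cast<To>(src[i]));
    }
    return dst;
}

// Toy kernel evaluated entirely in Real: h[i] = v[i] * phi[i] * phi[i].
template <typename Real>
std::vector<Real> integrate_impl(const std::vector<Real>& vr_eff,
                                 const std::vector<Real>& phi)
{
    std::vector<Real> h(vr_eff.size(), Real(0));
    for (std::size_t i = 0; i < vr_eff.size(); ++i)
    {
        h[i] = vr_eff[i] * phi[i] * phi[i];
    }
    return h;
}

// Public entry point keeps a double interface; precision is chosen internally.
std::vector<double> integrate(const std::vector<double>& vr_eff,
                              const std::vector<double>& phi,
                              const ExecConfig& cfg = ExecConfig())
{
    if (cfg.cpu_internal_real == RealPrecision::fp32)
    {
        // Cast inputs once, run the float kernel, cast the result back.
        const std::vector<float> h32
            = integrate_impl<float>(cast_buffer<float>(vr_eff), cast_buffer<float>(phi));
        return cast_buffer<double>(h32);
    }
    return integrate_impl<double>(vr_eff, phi);
}
```

Callers that pass no config stay on the fp64 branch, which matches the statement that default behavior is unchanged.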
During full linking, the new fp32 phi path exposed a missing GintAtom<float> explicit instantiation; this commit adds the required set_phi<float> and set_phi_dphi<float> instantiations so the single-precision gint_vl chain links cleanly end to end.
Validation for this stage: ../build.sh build_phase_b; cmake --build build_phase_b -j14; OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket build_phase_b/abacus_2g in tests/performance/P101_si32_lcao. The build and runtime both succeeded. The extracted P101 outputs remain identical to stage A under the current default fp64 integration path, which is expected because upper-layer callers still pass the default execution config. progress.md is updated with the implementation notes, the transient link issue, and the verification record for stage B.
* module_gint: implement phase C mixed-precision gint_rho pipeline
Phase C of the module_gint fp32 support plan is now in place for the CPU gint_rho path. This keeps the public API and external rho accumulation in double precision while enabling an internal float execution path that mirrors the phase-B gint_vl structure.
Key code changes:
- refactor Gint_rho into a dispatcher plus cal_gint_impl<Real>() so cpu_internal_real can select fp64 or fp32 internally
- move dm_gint allocation into template-local std::vector<HContainer<Real>> buffers instead of storing a fixed double container in the class
- generalize transfer_dm_2d_to_gint to transfer_dm_2d_to_gint<TGint, TDM>() and add the float<-double instantiation by gathering serial double DMR first and then casting locally
- extend PhiOperator::phi_dot_phi to support independent input/output precisions so float phi and phi_dm can accumulate directly into external double rho
- wire GintExecConfig through cal_gint_rho CPU dispatch instead of ignoring the config argument
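As a rough illustration of the independent input/output precision idea for phi_dot_phi, the sketch below lets float phi and phi_dm values accumulate into a double rho buffer. The signature, the flat row-major layout, and the choice to reduce each grid point in the input precision before widening are all assumptions made for this example, not the actual PhiOperator code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: rho[ir] += sum_j phi[ir][j] * phi_dm[ir][j].
// TIn is the precision of phi/phi_dm, TOut the precision of rho.
template <typename TIn, typename TOut>
void phi_dot_phi(const std::vector<TIn>& phi,
                 const std::vector<TIn>& phi_dm,
                 std::size_t n_orb,
                 std::vector<TOut>& rho)
{
    for (std::size_t ir = 0; ir < rho.size(); ++ir)
    {
        // Assumption: the per-point reduction runs in the input precision.
        TIn partial = TIn(0);
        for (std::size_t j = 0; j < n_orb; ++j)
        {
            partial += phi[ir * n_orb + j] * phi_dm[ir * n_orb + j];
        }
        // Widen once and accumulate into the external buffer.
        rho[ir] += static_cast<TOut>(partial);
    }
}
```

With `TIn = float` and `TOut = double`, no intermediate float rho buffer is needed for this step.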
Verification:
- ../build.sh build_phase_c
- cmake --build build_phase_c -j14
- OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket /home/dzc/abacus/abacus-mix/build_phase_c/abacus_2g in tests/performance/P101_si32_lcao (exit code 0)
- catch_properties output remained stable relative to earlier stages: etotref -3403.2017700426458759, totalforceref 2.961504, totalstressref 469.743372; totaltimeref is still not extracted in this environment
* module_gint: wire phase D SCF precision control for LCAO
Phase D connects the mixed-precision gint execution paths from phases B/C to the LCAO SCF control flow. The SCF layer now owns a stateful precision controller, updates it after each iteration, and pushes the current GintExecConfig into the main LCAO gint_vl and gint_rho call sites.
Key code changes:
- add source_estate/module_charge/GintPrecisionController with the phase-D policy: start in fp32, switch to fp64 after two consecutive non-restart iterations with drho <= max(100 * scf_thr, 1e-5), and never switch back within the same SCF
- store the controller inside Charge_Mixing and reset/update it from chgmixing_ks_lcao() and chgmixing_ks()
- propagate current GintExecConfig from ESolver_KS_LCAO::iter_init() into Charge and Potential so upper layers can read precision state without coupling module_gint back to SCF logic
- wire LCAO SCF main-path gint callers to the propagated config: veff_lcao uses Potential::get_gint_exec_config() for non-meta-GGA gint_vl, rho_tau_lcao uses Charge::get_gint_exec_config() for gint_rho, and the ElecStateLCAO PEXSI dm2rho path now follows the same config
- keep non-SCF and post-processing paths on the default fp64 behavior by not pushing a non-default config into those paths
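The phase-D switching policy listed above (start in fp32, switch after two consecutive qualifying iterations, never switch back) might look roughly like the sketch below. The class and member names are hypothetical, and one detail is an assumption: a restart step is treated here as resetting the streak, which the commit message does not spell out.

```cpp
#include <algorithm>

enum class RealPrecision { fp32, fp64 };

// Hypothetical sketch of the phase-D policy; not the actual GintPrecisionController.
class PrecisionController
{
  public:
    explicit PrecisionController(double scf_thr)
        : threshold_(std::max(100.0 * scf_thr, 1e-5))
    {
    }

    void update_after_iteration(double drho, bool is_restart_step)
    {
        if (current_ == RealPrecision::fp64)
        {
            return; // never switch back within the same SCF
        }
        if (is_restart_step || drho > threshold_)
        {
            streak_ = 0; // assumption: restart steps break the streak
            return;
        }
        if (++streak_ >= 2)
        {
            current_ = RealPrecision::fp64;
        }
    }

    RealPrecision current() const { return current_; }

  private:
    double threshold_;
    int streak_ = 0;
    RealPrecision current_ = RealPrecision::fp32;
};
```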
Verification:
- ../build.sh build_phase_d
- cmake --build build_phase_d -j14
- OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket /home/dzc/abacus/abacus-mix/build_phase_d/abacus_2g in tests/performance/P101_si32_lcao (exit code 0)
- catch_properties output remained stable: etotref -3403.2017700426458759, totalforceref 2.961504, totalstressref 469.743372; totaltimeref is still not extracted in this environment
* module_gint: redo phase D with scoped SCF precision context
Replace the first phase-D wiring with a less invasive integration path for LCAO SCF mixed precision.
Key changes:
- keep GintPrecisionController as the SCF policy holder, but move ownership into ESolver_KS_LCAO instead of Charge_Mixing
- add ModuleGint::current_exec_config() and ModuleGint::ScopedExecConfig so module_gint can read the active precision from a scoped context
- scope the active GintExecConfig inside ESolver_KS_LCAO::hamilt2rho_single(), which covers the main updateHk -> veff_lcao -> dm2rho path without threading config through Charge or Potential
- add no-config overloads for cal_gint_vl() and cal_gint_rho() that fall back to the scoped module_gint context
- revert the previous intrusive state propagation from Charge, Potential, Charge_Mixing, veff_lcao, rho_tau_lcao, and the PEXSI dm2rho call site
- keep non-SCF callers on the default fp64 path unless they explicitly create a scoped config
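The scoped-context idea can be sketched as an RAII guard that installs a config on construction and restores the previous one on destruction. Only the names current_exec_config and ScopedExecConfig come from the commit message; the storage (a function-local static, which is not thread-safe) and everything else here are assumptions for illustration.

```cpp
enum class RealPrecision { fp64, fp32 };
struct ExecConfig { RealPrecision cpu_internal_real = RealPrecision::fp64; };

namespace detail
{
// Assumption: a single process-wide slot; the real module may store it differently.
inline ExecConfig& active_config()
{
    static ExecConfig cfg;
    return cfg;
}
} // namespace detail

// Read-only view of whatever config is active right now.
inline const ExecConfig& current_exec_config()
{
    return detail::active_config();
}

// RAII guard: install cfg for the lifetime of the guard, then restore the old value.
class ScopedExecConfig
{
  public:
    explicit ScopedExecConfig(const ExecConfig& cfg) : saved_(detail::active_config())
    {
        detail::active_config() = cfg;
    }
    ~ScopedExecConfig() { detail::active_config() = saved_; }

    ScopedExecConfig(const ScopedExecConfig&) = delete;
    ScopedExecConfig& operator=(const ScopedExecConfig&) = delete;

  private:
    ExecConfig saved_;
};
```

A no-config overload of an integration entry point would then read `current_exec_config()` instead of taking the config as an argument, so callers outside the guarded scope see the fp64 default.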
Validation:
- ../build.sh build_phase_d
- cmake --build build_phase_d -j14
- cd tests/performance/P101_si32_lcao && OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket /home/dzc/abacus/abacus-mix/build_phase_d/abacus_2g
Observed results:
- etotref -3403.2017700426454212
- totalforceref 2.961504
- totalstressref 469.743372
- Gint cal_gint_vl: 1.70 s / 9 calls
- Gint cal_gint_rho: 1.73 s / 9 calls
- LCAO_domain dm2rho: 1.78 s / 9 calls
* module_gint: complete phase E validation and test scaffolding
Phase E adds focused regression coverage around the new CPU single-precision execution path and the stage-D precision scheduler. The module_gint test tree now exercises get_hr<float>(), HContainer casting helpers, transfer_dm_2d_to_gint<float,double>(), and ScopedExecConfig restore semantics. To make those paths testable, GintInfo exposes lightweight test-only factory helpers and get_hr<T>() is moved inline into the header.
The stage also adds runtime override support to GintPrecisionController through ABACUS_GINT_FORCE_CPU_REAL={fp32,fp64,auto}, plus a dedicated estate-side unit test covering reset/update behavior under forced and automatic modes. This keeps the phase-D scoped-context design intact while giving us a low-intrusion switch for performance and numerical comparison runs.
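A minimal sketch of reading such an override from the environment is shown below. The variable name and its three values come from the text above; the enum, the function names, and the choice to treat unset or unrecognized values as automatic are assumptions.

```cpp
#include <cstdlib>
#include <string>

enum class ForcedPrecision { automatic, fp32, fp64 };

// Hypothetical parser for the override value.
inline ForcedPrecision parse_forced_precision(const char* value)
{
    if (value == nullptr)
    {
        return ForcedPrecision::automatic; // variable not set
    }
    const std::string s(value);
    if (s == "fp32")
    {
        return ForcedPrecision::fp32;
    }
    if (s == "fp64")
    {
        return ForcedPrecision::fp64;
    }
    // Assumption: "auto" and anything unrecognized fall back to automatic.
    return ForcedPrecision::automatic;
}

inline ForcedPrecision forced_precision_from_env()
{
    return parse_forced_precision(std::getenv("ABACUS_GINT_FORCE_CPU_REAL"));
}
```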
While building the new tests, phase E uncovered and fixed a real serial-path bug in make_cast_hcontainer(): the old code used the parallel insert_ijrs overload even when paraV was absent. The serial branch now reconstructs the destination HContainer topology from the source atom-pair/R-index layout before allocating and casting values.
Validation performed for this commit:
* ../build.sh build_phase_e
* cmake --build build_phase_e -j14
* cmake --build build_phase_e -j14 --target abacus_2g
* ../build.sh build_phase_e_ut
* cmake -S . -B build_phase_e_ut -DBUILD_TESTING=ON
* cmake --build build_phase_e_ut -j14 --target MODULE_LCAO_gint_common_test MODULE_LCAO_gint_precision_test MODULE_ESTATE_gint_precision_controller
* ctest --output-on-failure -R 'MODULE_LCAO_gint_common_test|MODULE_LCAO_gint_precision_test|MODULE_ESTATE_gint_precision_controller'
* OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket build_phase_e/abacus_2g in tests/performance/P101_si32_lcao
* ABACUS_GINT_FORCE_CPU_REAL=fp32 OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket build_phase_e/abacus_2g
* ABACUS_GINT_FORCE_CPU_REAL=fp64 OMP_NUM_THREADS=7 mpirun -n 2 --bind-to socket build_phase_e/abacus_2g
Observed on P101_si32_lcao: mixed/fp32/fp64 total energies stay within about 1e-12 eV, extracted force/stress totals remain identical at script precision, and no clear speedup appears yet, suggesting other work such as set_phi or conversion overhead still dominates this case.
* module_gint: simplify precision switch logic in GintPrecisionController
- Remove qualified-iteration counting mechanism, switch to fp64
immediately when drho falls below threshold
- Simplify update_after_iteration interface (remove iter, conv_esolver,
is_restart_step parameters)
- Adjust switch threshold from 100*scf_thr to 1000*scf_thr
- Update unit tests to match simplified interface
- Add device cpu to performance test inputs
- Update .gitignore for build/profiling artifacts
* refactor(gint): make gint_rho use fp32-only rho cache
Refactor Gint_rho to follow the same if-constexpr style used by gint_vl.
For double precision runs, write rho contributions directly into the external double buffer and skip any temporary cache allocation or copy-back work.
For single precision runs, convert the external rho buffer into a float cache before phi_dot_phi, use that cache during grid integration, and cast it back to the external double buffer after cal_gint finishes.
* perf(gint): optimize set_phi by grouping orbitals into radial blocks
- Introduce RadialBlock to GintAtom to store contiguous (L, N, m) orbital information.
- Pre-calculate these blocks in the GintAtom constructor.
- Refactor set_phi to iterate over radial blocks, significantly reducing the overhead of indexing and branching in the hot loop.
- Walk the Ylm buffer linearly within each block, avoiding repeated lookups of atom_->iw2_ylm[iw].
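The radial-block grouping might be sketched as follows. The `RadialBlock` fields, the `iw2_radial` index array, and both function signatures are assumptions made for this example (only `iw2_ylm` is named in the commit message), and the real set_phi evaluates radial functions on the grid rather than taking precomputed values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: a run of orbitals sharing one radial function and
// occupying consecutive Ylm slots.
struct RadialBlock
{
    int radial_index; // which radial function the block uses
    int first_iw;     // first orbital index in the block
    int ylm_start;    // first Ylm slot in the block
    int count;        // number of orbitals (m values) in the block
};

// Precompute blocks once, e.g. in a constructor.
std::vector<RadialBlock> build_blocks(const std::vector<int>& iw2_radial,
                                      const std::vector<int>& iw2_ylm)
{
    std::vector<RadialBlock> blocks;
    for (std::size_t iw = 0; iw < iw2_radial.size(); ++iw)
    {
        const bool extend = !blocks.empty()
                            && blocks.back().radial_index == iw2_radial[iw]
                            && blocks.back().ylm_start + blocks.back().count == iw2_ylm[iw];
        if (extend)
        {
            ++blocks.back().count;
        }
        else
        {
            blocks.push_back(RadialBlock{iw2_radial[iw], static_cast<int>(iw), iw2_ylm[iw], 1});
        }
    }
    return blocks;
}

// Hot loop: one radial lookup per block, then a linear walk over Ylm.
std::vector<double> set_phi(const std::vector<RadialBlock>& blocks,
                            const std::vector<double>& radial_value,
                            const std::vector<double>& ylm,
                            std::size_t n_orb)
{
    std::vector<double> phi(n_orb, 0.0);
    for (const RadialBlock& b : blocks)
    {
        const double r = radial_value[b.radial_index];
        const double* y = ylm.data() + b.ylm_start;
        double* out = phi.data() + b.first_iw;
        for (int m = 0; m < b.count; ++m)
        {
            out[m] = r * y[m];
        }
    }
    return phi;
}
```

The inner loop has no per-orbital index lookup or branch, which is the effect the bullets above describe.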
* add "gint_precision" parameter
* simplify precision control
* unify some variable names
* fix: address PR review issues for gint-mix-precision
- Remove progress.md (internal development log)
- Clean .gitignore (remove personal profile/debug entries)
- Remove unused is_restart_step variable in iter_finish
- Enable precision update for all SCF calculations (relax, MD, etc.)
- Restore P103_si128_lcao scf_nmax to 100 (was debug artifact)
- Replace magic numbers in switch threshold with named constants
- Add input validation guard for nspin=4 + gint_precision!=double
* fix: replace C++17 features with C++11-compatible patterns for CI compatibility
The project default CMAKE_CXX_STANDARD is C++11. The 'Build without MPI'
and 'abacuslite' CI checks failed because gint_vl.cpp, gint_rho.cpp, and
gint_common.cpp used C++17 features (if constexpr, std::is_same_v) that
are unavailable at C++11/14.
Replace all if constexpr with tag-dispatch overloading using
std::true_type/std::false_type to select same-type vs cross-type code
paths at compile time. Replace std::is_same_v<A,B> with
std::is_same<A,B>::type for compile-time tag generation.
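For readers unfamiliar with the pattern, a C++11 tag-dispatch replacement for an `if constexpr (std::is_same_v<...>)` branch can be sketched like this. The function names and the vector-based buffers are hypothetical; only the use of std::true_type/std::false_type and std::is_same<A,B>::type follows the description above.

```cpp
#include <type_traits>
#include <vector>

// Same-type path: selected when Real is double; plain copy, no conversion.
template <typename Real>
bool store_impl(const std::vector<Real>& src, std::vector<double>& dst, std::true_type)
{
    dst = src;
    return false; // no cast was needed
}

// Cross-type path: selected otherwise; converts element by element.
template <typename Real>
bool store_impl(const std::vector<Real>& src, std::vector<double>& dst, std::false_type)
{
    dst.assign(src.begin(), src.end());
    return true; // a cast happened
}

// The tag type is computed at compile time, so only the chosen body is instantiated.
template <typename Real>
bool store(const std::vector<Real>& src, std::vector<double>& dst)
{
    return store_impl(src, dst, typename std::is_same<Real, double>::type());
}
```

Because the unselected overload is never instantiated, `dst = src` in the same-type branch does not have to compile for `Real = float`, which is what `if constexpr` provided in C++17.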
* refactor: simplify C++11-compatible type dispatch in gint module
Replace tag-dispatch overloading (dummy pointer tags, std::true_type/
std::false_type) with simpler C++11 patterns:
- gint_vl.cpp / gint_rho.cpp: remove dummy tag parameters; rely on
natural overload resolution (non-template double overload vs template).
- gint_common.cpp: replace 6 tag-dispatch helpers (gather_dm_serial +
gather_dm_nspin4) with 2 enable_if SFINAE overloads (gather_dm),
unifying serial and nspin4 call paths.
Net reduction of ~40 lines with identical semantics.
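The enable_if form mentioned above can be illustrated with a pair of mutually exclusive overloads. `convert_dm` is a hypothetical name and the vector-based signature is an assumption; the actual gather_dm overloads operate on HContainer data and are not reproduced here.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Enabled only when source and destination types match: return a copy as-is.
template <typename TGint, typename TDM>
typename std::enable_if<std::is_same<TGint, TDM>::value, std::vector<TGint>>::type
convert_dm(const std::vector<TDM>& dm)
{
    return dm;
}

// Enabled only when the types differ: cast each element.
template <typename TGint, typename TDM>
typename std::enable_if<!std::is_same<TGint, TDM>::value, std::vector<TGint>>::type
convert_dm(const std::vector<TDM>& dm)
{
    std::vector<TGint> out;
    out.reserve(dm.size());
    for (std::size_t i = 0; i < dm.size(); ++i)
    {
        out.push_back(static_cast<TGint>(dm[i]));
    }
    return out;
}
```

A call such as `convert_dm<float>(dm_double)` picks the casting overload, while `convert_dm<double>(dm_double)` picks the copy. Unlike tag dispatch, no extra tag argument or forwarding helper is needed.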
* feat: add SCF output notice for gint precision mode (mix/single)
- Print notification at SCF start when gint_precision is 'mix' or 'single'
- Print notification when precision switches from fp32 to fp64 in mix mode
- Change update_after_iteration to return bool indicating if switch occurred
- Update unit tests to verify return values
1 parent 6289c6b commit e1946af
39 files changed
Lines changed: 974 additions & 123 deletions
File tree
- docs
- advanced/input_files
- source
- source_esolver
- source_estate
- module_charge
- test
- source_io
- module_parameter
- test_serial
- test
- source_lcao
- module_gint
- test
- module_hcontainer
- tests/performance
- P101_si32_lcao
- P103_si128_lcao