fix(c++): fix NULL type in custom op #4889
Conversation
Replaces usage of lmp_list send/recv arrays with new vectors that map indices using fwd_map and synchronize counts via MPI. Updates tensor construction to use these new vectors, improving correctness and flexibility in distributed communication.
for more information, see https://pre-commit.ci
📝 Walkthrough
Calls a new helper to remap LAMMPS sendlists using a forward map before constructing MPI/send tensors in DeepPotPT, DeepPotPD, and DeepSpinPT, and adds the helper declaration/implementation and a small test. No public API signature changes beyond the new helper.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Compute as DeepPot*::compute
    participant Select as select_real_atoms_sendlist
    participant Map as fwd_map
    participant Tensors as Tensor & comm_dict build
    Compute->>Select: call select_real_atoms_sendlist(lmp_list, fwd_map)
    Select->>Map: map each send index -> forwarded index
    Map-->>Select: remapped sendlist (invalids removed)
    Select-->>Compute: lmp_list updated (sendlist/sendnum/recvnum)
    Compute->>Tensors: build send/recv tensors and comm_dict from updated lmp_list
```
Actionable comments posted: 3
🧹 Nitpick comments (2)
source/api_cc/src/DeepPotPT.cc (2)

251-257: Remove stale commented-out code

Dead commented code obscures the current data path and makes maintenance harder.

Apply this diff:

```diff
-// torch::Tensor firstrecv_tensor =
-//     torch::from_blob(lmp_list.firstrecv, {nswap}, int32_option);
-// torch::Tensor recvnum_tensor =
-//     torch::from_blob(lmp_list.recvnum, {nswap}, int32_option);
-// torch::Tensor sendnum_tensor =
-//     torch::from_blob(lmp_list.sendnum, {nswap}, int32_option);
```
266-269: Remove redundant commented-out legacy code

Same reasoning; the commented legacy path is preserved in git history.

Apply this diff:

```diff
-// int total_send =
-//     std::accumulate(lmp_list.sendnum, lmp_list.sendnum + nswap, 0);
-// torch::Tensor sendlist_tensor =
-//     torch::from_blob(lmp_list.sendlist, {total_send}, int32_option);
```
📒 Files selected for processing (1)
- source/api_cc/src/DeepPotPT.cc (3 hunks)
🔇 Additional comments (2)
source/api_cc/src/DeepPotPT.cc (2)
185-204: Remapping logic LGTM

Correctly rebuilds per-swap send counts and a dense send list using fwd_map, with bounds checks and filtering. Reserving capacity via the accumulated legacy counts is a good optimization.
226-232: firstrecv_new is unused and not required; the original comment is incorrect

Short: deepmd/pt/model/descriptor/repflows.py builds comm_dict and calls torch.ops.deepmd.border_op with send_list, send_proc, recv_proc, send_num, recv_num, and communicator (no first_recv). The computed firstrecv_new/firstrecv_tensor in the PT wrappers is dead code; remove it or document why it is kept.

Files to update:
- source/api_cc/src/DeepPotPT.cc: remove the firstrecv_new prefix-sum computation (around lines 226-231) and the unused firstrecv_tensor creation (around line 238).
- source/api_cc/src/DeepSpinPT.cc: same pattern; firstrecv_tensor is created around lines 187-191 but never used or inserted.
Suggested change (remove unused code); example diff for DeepPotPT.cc:

```diff
 @@
-std::vector<int> firstrecv_new(nswap, 0);
-int acc = 0;
-for (int s = 0; s < nswap; ++s) {
-  firstrecv_new[s] = acc;
-  acc += recvnum_new[s];
-}
+/* firstrecv computation removed; not used by border_op */
 @@
-torch::Tensor firstrecv_tensor =
-    torch::from_blob(firstrecv_new.data(), {nswap}, int32_option).clone();
+/* firstrecv tensor omitted; border_op expects recv_num, not first_recv */
```
If you prefer to keep the computation for clarity, add a short comment explaining it's intentionally unused.
Likely an incorrect or invalid review comment.
Pull Request Overview
This PR fixes a bug in the C++ custom operations for distributed MPI communication by implementing proper handling of NULL types in the send/receive lists. The fix ensures that atom indices are correctly mapped and synchronized across MPI processes by filtering out invalid indices and updating counts accordingly.
Key changes:
- Added a new function to select and remap real atoms in send lists using forward mapping
- Integrated the send list filtering into PyTorch, Paddle, and DeepSpin compute functions
- Added a test case to verify NULL type handling in LAMMPS pair coefficients
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| source/api_cc/src/common.cc | Implements select_real_atoms_sendlist function to filter and remap atom indices in MPI send lists |
| source/api_cc/include/common.h | Adds function declaration for the new send list selection function |
| source/api_cc/src/DeepPotPT.cc | Integrates send list filtering into PyTorch backend compute function |
| source/api_cc/src/DeepPotPD.cc | Integrates send list filtering into Paddle backend compute function |
| source/api_cc/src/DeepSpinPT.cc | Integrates send list filtering into DeepSpin PyTorch compute function |
| source/lmp/tests/test_lammps_dpa_pt.py | Adds test case for NULL type handling in LAMMPS pair coefficients |
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
source/api_cc/src/DeepSpinPT.cc (1)

205-213: Flatten the sendlist pointer-of-pointers into a contiguous buffer before calling from_blob

InputNlist::sendlist is declared as int** (an array of pointers to per-swap arrays) in source/api_c/include/deepmd.hpp, so passing it directly to torch::from_blob(lmp_list.sendlist, {total_send}, int32_option) will interpret the array of pointers as a contiguous block of ints, which is invalid and will lead to incorrect data being read (or a crash). You must first flatten all per-swap subarrays into a single contiguous int buffer.

Please update the following files:
- source/api_cc/src/DeepSpinPT.cc (around line 207)
- source/api_cc/src/DeepPotPT.cc (around line 200)

Proposed refactoring to safely flatten (ensuring the buffer outlives the call to run_method):

```diff
-torch::Tensor sendlist_tensor =
-    torch::from_blob(lmp_list.sendlist, {total_send}, int32_option);
+// Flatten the int** sendlist into a single contiguous vector
+std::vector<int> flat_sendlist;
+flat_sendlist.reserve(total_send);
+for (int s = 0; s < nswap; ++s) {
+  flat_sendlist.insert(
+      flat_sendlist.end(),
+      lmp_list.sendlist[s],
+      lmp_list.sendlist[s] + lmp_list.sendnum[s]);
+}
+// Create a tensor from the contiguous buffer
+torch::Tensor sendlist_tensor =
+    torch::from_blob(flat_sendlist.data(), {total_send}, int32_option);
```
source/api_cc/src/DeepPotPD.cc (1)

421-439: Shape bug: send_list should carry nswap pointer addresses, not total_send elements

You reshape send_list to total_send but copy only nswap pointer addresses. This is a mismatched shape and can corrupt reads downstream.

```diff
-int total_send =
-    std::accumulate(lmp_list.sendnum, lmp_list.sendnum + nswap, 0);
-sendlist_tensor->Reshape({total_send});
+// send_list carries one pointer per swap
+sendlist_tensor->Reshape({nswap});
 /**
  ** NOTE: paddle do not support construct a Tensor with from_blob(T**, ...)
  ** from a double pointer, so we convert int* pointer to indptr_t for each
  ** entry and wrap it into int64 Tensor as a workaround.
  */
 std::vector<std::intptr_t> pointer_addresses;
 pointer_addresses.reserve(nswap);
```
🧹 Nitpick comments (1)
source/api_cc/include/common.h (1)

105-107: API const-correctness: this mutates inlist; avoid const on the parameter

The helper writes through inlist pointers (sendnum/recvnum/sendlist). Marking the parameter const is misleading.

```diff
-void select_real_atoms_sendlist(const deepmd::InputNlist& inlist,
-                                const std::vector<int>& fwd_map);
+// In-place: trims and remaps sendlist; updates sendnum/recvnum/(firstrecv if present)
+void select_real_atoms_sendlist(deepmd::InputNlist& inlist,
+                                const std::vector<int>& fwd_map);
```

If changing the signature ripples too far now, at least document the in-place side effects in a comment.
📒 Files selected for processing (6)
- source/api_cc/include/common.h (1 hunks)
- source/api_cc/src/DeepPotPD.cc (1 hunks)
- source/api_cc/src/DeepPotPT.cc (1 hunks)
- source/api_cc/src/DeepSpinPT.cc (1 hunks)
- source/api_cc/src/common.cc (1 hunks)
- source/lmp/tests/test_lammps_dpa_pt.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- source/api_cc/src/DeepPotPT.cc
🧰 Additional context used
🧬 Code graph analysis (5)
- source/lmp/tests/test_lammps_dpa_pt.py (1)
  - source/lmp/tests/test_lammps.py: lammps_type_map (289-292)
- source/api_cc/src/common.cc (1)
  - source/op/tf/prod_env_mat_multi_device.cc: inlist (1590-1590)
- source/api_cc/include/common.h (2)
  - source/api_cc/src/common.cc: select_real_atoms_sendlist (235-258), select_real_atoms_sendlist (235-236)
  - source/api_c/include/deepmd.hpp: InputNlist (807-814), InputNlist (815-822), InputNlist (823-850), InputNlist (851-851)
- source/api_cc/src/DeepSpinPT.cc (1)
  - source/api_cc/src/common.cc: select_real_atoms_sendlist (235-258), select_real_atoms_sendlist (235-236)
- source/api_cc/src/DeepPotPD.cc (1)
  - source/api_cc/src/common.cc: select_real_atoms_sendlist (235-258), select_real_atoms_sendlist (235-236)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (1)
source/api_cc/src/common.cc (1)

235-258: Recompute firstrecv after trimming and verify symmetry

You need to keep the receive-offset array in sync whenever you shrink recvnum, and also confirm that the assumption recvnum[s] == sendnum[s] holds across all ranks/types.

• After your loop that sets inlist.sendnum[s] = cnt; inlist.recvnum[s] = cnt;, add:

```diff
+// Recompute receive offsets to match updated recvnum[]
+if (inlist.firstrecv) {
+  int offset = 0;
+  for (int s = 0; s < nswap; ++s) {
+    inlist.firstrecv[s] = offset;
+    offset += inlist.recvnum[s];
+  }
+}
```

This keeps the firstrecv prefix sums correct (firstrecv is "where to put 1st recv atom in each swap") (docs.deepmodeling.com).

• Because you're setting recvnum[s] = sendnum[s], you're assuming perfect send/recv symmetry per swap. Please verify this holds under all type-map filters and MPI ranks. Consider adding a debug-only assertion or a collective check (e.g. an MPI_Allreduce on the arrays) to catch any imbalance early.
LGTM for source/api_cc/src/DeepPotPD.cc