feat(pt): Add support for SiLU activation function in gradient calculations #5055
Conversation
- Introduced the SiLU (Sigmoid Linear Unit) activation function with corresponding gradient and second derivative calculations.
- Updated the activation function mapping to include SiLU, enhancing the flexibility of activation functions available in the DPTabulate class.
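For reference, the activation and the two derivatives this PR wires in can be sketched directly in PyTorch; the function names below are illustrative and not the actual names used in deepmd/pt/utils/tabulate.py:

```python
import torch


def silu(x: torch.Tensor) -> torch.Tensor:
    # SiLU / swish: f(x) = x * sigmoid(x)
    return x * torch.sigmoid(x)


def silu_grad(x: torch.Tensor) -> torch.Tensor:
    # f'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
    sig = torch.sigmoid(x)
    return sig + x * sig * (1.0 - sig)


def silu_grad_grad(x: torch.Tensor) -> torch.Tensor:
    # f''(x) = 2 * d_sig + x * d_sig * (1 - 2 * sigmoid(x)), with d_sig = sigmoid'(x)
    sig = torch.sigmoid(x)
    d_sig = sig * (1.0 - sig)
    return 2.0 * d_sig + x * d_sig * (1.0 - 2.0 * sig)
```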
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds SiLU (swish) activation support as functype 7 across the Python tabulation utilities, the C++ unaggregated gradient code, and the PyTorch tests; implements SiLU first- and second-derivative branches, updates the activation mapping, and parameterizes the tests to exercise activations 1–7.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant T as Tests
    participant Py as Python tabulate
    participant C as C++ unaggregated_grad
    Note over T,Py: Iterate functype 1..7
    T->>Py: activation = get_activation_function(functype)
    T->>Py: y = activation(x)
    T->>Py: request derivatives (unaggregated_dy_dx[_s], unaggregated_dy2_dx[_s]) with functype
    Py->>C: dispatch grad/grad_grad using functype
    alt functype == 7
        C-->>Py: compute SiLU grad & grad_grad (sigmoid-based formulas)
    else
        C-->>Py: compute existing activation derivatives
    end
    Py-->>T: return derivative tensors for assertions
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/utils/tabulate.py (2)
48-49: Align activation docs and `ActivationFn` string with new SiLU support

You map `"silu": 7` correctly into `activation_map`, but the class docstring still claims only `{"tanh","gelu"}` are supported, and it doesn't mention SiLU or the other already-supported activations. Also, please double-check that `ActivationFn` elsewhere in the codebase accepts `"silu"` as the activation string (same spelling/casing) so this mapping is reachable.

Consider updating the docstring to defer to `ActivationFn` and include SiLU in the examples, e.g.:

```diff
-    activation_function
-        The activation function in the embedding net. Supported options are {"tanh","gelu"} in common.ActivationFn.
+    activation_function
+        The activation function in the embedding net. See :class:`ActivationFn`
+        for supported options (e.g. "tanh", "gelu", "relu", "silu").
```

Also applies to: 78-88
445-509: Add targeted tests for SiLU gradients and Hessians

Now that `functype == 7` is wired into `grad`/`grad_grad`, it would be good to add tests that:

- Compare `grad(xbar, y, 7)` and `grad_grad(xbar, y, 7)` against PyTorch autograd on `torch.nn.SiLU` over a range of `xbar` values (see the sketch after this list).
- Exercise both CPU and GPU (if supported in your CI).

This will guard against future regressions in the hand-coded formulas.
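A minimal sketch of such a check, assuming `grad` and `grad_grad` can be imported from `deepmd.pt.utils.tabulate` and accept the `(xbar, y, functype)` arguments quoted above (both the import path and the exact signature are assumptions to verify):

```python
import torch

# Assumed import; verify the module path and whether these helpers are public.
from deepmd.pt.utils.tabulate import grad, grad_grad


def check_silu_derivatives(n: int = 101) -> None:
    xbar = torch.linspace(-5.0, 5.0, n, dtype=torch.float64, requires_grad=True)
    y = torch.nn.functional.silu(xbar)

    # Reference first and second derivatives via autograd.
    (dy_dx,) = torch.autograd.grad(y.sum(), xbar, create_graph=True)
    (d2y_dx2,) = torch.autograd.grad(dy_dx.sum(), xbar)

    # Hand-coded formulas for functype == 7 should match the autograd results.
    torch.testing.assert_close(grad(xbar, y, 7), dy_dx)
    torch.testing.assert_close(grad_grad(xbar, y, 7), d2y_dx2)
```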
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/pt/utils/tabulate.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/pt/utils/tabulate.py (1)
deepmd/pt/utils/utils.py (1)
sigmoid (154-155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Agent
- GitHub Check: Test C++ (false)
- GitHub Check: Test C++ (true)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
🔇 Additional comments (2)
deepmd/pt/utils/tabulate.py (2)
472-476: SiLU first derivative implementation looks correct

The `functype == 7` branch implements

```python
sig = torch.sigmoid(xbar)
sig + xbar * sig * (1 - sig)
```

which matches \( \frac{d}{dx}\bigl[x \cdot \sigma(x)\bigr] = \sigma(x) + x\,\sigma(x)\bigl(1 - \sigma(x)\bigr) \). Using a single `torch.sigmoid` call and reusing `sig` is efficient and consistent with the other branches.
504-508: SiLU second derivative implementation is mathematically consistent

Here you compute:

```python
sig = torch.sigmoid(xbar)
d_sig = sig * (1 - sig)
2 * d_sig + xbar * d_sig * (1 - 2 * sig)
```

which matches the analytically derived \( f''(x) = 2\,d_\sigma + x\,d_\sigma\bigl(1 - 2\sigma(x)\bigr) \) for \( f(x) = x\,\sigma(x) \), where \( d_\sigma = \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \). This integrates cleanly with the existing `grad_grad` interface.
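Both quoted expressions follow from the product rule applied to \( f(x) = x\,\sigma(x) \), using \( \sigma'(x) = \sigma(x)(1-\sigma(x)) \) (i.e. the `d_sig` above):

```latex
\begin{aligned}
f(x)   &= x\,\sigma(x) \\
f'(x)  &= \sigma(x) + x\,\sigma'(x) \\
f''(x) &= 2\,\sigma'(x) + x\,\sigma''(x),
          \qquad \sigma''(x) = \sigma'(x)\bigl(1 - 2\sigma(x)\bigr) \\
       &= 2\,\sigma'(x) + x\,\sigma'(x)\bigl(1 - 2\sigma(x)\bigr)
\end{aligned}
```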
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##            devel    #5055      +/-   ##
==========================================
+ Coverage   84.18%   84.25%   +0.07%
==========================================
  Files         709      709
  Lines       70220    70234      +14
  Branches     3619     3620       +1
==========================================
+ Hits        59116    59177      +61
+ Misses       9936     9889      -47
  Partials     1168     1168
```

☔ View full report in Codecov by Sentry.
- Updated the DPTabulate class documentation to include additional activation functions ("relu", "silu") in the embedding net.
- Added a new activation function (case 7: SiLU) and its gradient calculations in the unaggregated gradient functions.
- Implemented comprehensive tests for all activation functions, ensuring correct behavior across various scenarios in the test suite.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/utils/tabulate.py (1)
49-50: Docstring now lists supported activations but parameter name is slightly inconsistent

The updated docstring correctly points to `ActivationFn` and lists `"silu"` among supported options. For clarity, consider aligning the documented parameter name (`activation_function`) with the actual `__init__` argument name (`activation_fn`).

source/tests/pt/test_tabulate.py (1)

22-44: Activation helper and name map cover all functypes, including SiLU

`get_activation_function` and `ACTIVATION_NAMES` cleanly cover functypes 1–7, with the SiLU case implemented as `x / (1 + exp(-x))`, which matches the intended `x * sigmoid(x)` behavior used elsewhere. This keeps the tests aligned with the activation map in `DPTabulate` and the new SiLU gradients. The `ValueError` on unknown `functype` is also fine here; the Ruff TRY003 hint is purely stylistic and can be ignored or suppressed if it's noisy.

Also applies to: 46-54
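A rough sketch of what such a helper pair might look like; only `"silu": 7` is confirmed by this PR, so the other functype assignments and the exact helper bodies below are assumptions, not the repository code:

```python
import torch

# Assumed mapping: only functype 7 ("silu") is confirmed by this PR.
ACTIVATION_NAMES = {
    1: "tanh",
    2: "gelu",
    3: "relu",
    4: "relu6",
    5: "softplus",
    6: "sigmoid",
    7: "silu",
}


def get_activation_function(functype: int):
    """Return a callable activation for an integer functype (1-7)."""
    if functype == 7:
        # SiLU written explicitly as x * sigmoid(x) == x / (1 + exp(-x)).
        return lambda x: x * torch.sigmoid(x)
    if functype in ACTIVATION_NAMES:
        return getattr(torch.nn.functional, ACTIVATION_NAMES[functype])
    raise ValueError(f"Unknown functype: {functype}")
```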
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
deepmd/pt/utils/tabulate.py (4 hunks)
source/op/tf/unaggregated_grad.cc (2 hunks)
source/tests/pt/test_tabulate.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_tabulate.py (1)
deepmd/pt/utils/tabulate.py (4)
unaggregated_dy_dx_s (515-534)
unaggregated_dy2_dx_s (537-563)
unaggregated_dy_dx (566-600)
unaggregated_dy2_dx (603-645)
🪛 Ruff (0.14.4)
source/tests/pt/test_tabulate.py
43-43: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Agent
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Test C++ (false)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test C++ (true)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
🔇 Additional comments (3)
source/op/tf/unaggregated_grad.cc (1)
78-81: SiLU gradient and second derivative implementation look correct and consistent

The new `case 7` branches in `grad` and `grad_grad` implement the first and second derivatives of SiLU `f(x) = x * sigmoid(x)` as `σ + x σ (1 - σ)` and `2 σ' + x σ' (1 - 2 σ)`, respectively, which are mathematically correct and consistent with the Python tabulate implementations and the new tests.

Also applies to: 112-116
deepmd/pt/utils/tabulate.py (1)
80-89: SiLU activation mapping and derivatives are coherent across the stack

Adding `"silu": 7` to `activation_map` and the corresponding `functype == 7` branches in `grad` and `grad_grad` provides a consistent SiLU implementation: `grad` returns `σ(x) + x σ(x) (1 - σ(x))` and `grad_grad` returns `2 σ'(x) + x σ'(x) (1 - 2 σ(x))`, matching the analytic derivatives of `f(x) = x * sigmoid(x)` and the C++ `UnaggregatedDyDx*` implementations and tests.

Also applies to: 473-477, 505-509
source/tests/pt/test_tabulate.py (1)
82-89: Parameterized TF/PT parity tests across all activations are well structured

The refactored `test_ops` with `_test_single_activation` systematically exercises all four unaggregated ops (`*_dy_dx_s`, `*_dy2_dx_s`, `*_dy_dx`, `*_dy2_dx`) for functypes 1–7, comparing TF kernels against the PT utilities with clear per-activation subTest labels. Argument ordering and shapes match the C++ op signatures and Python wrappers, so this should give good coverage for the new SiLU path without altering existing behavior.

Also applies to: 90-203
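The subTest structure described above can be illustrated with a self-contained (and heavily simplified) sketch; the class and helper names are hypothetical, and the real test drives the TF kernels and PT utilities rather than the placeholder assertion shown here:

```python
import unittest

import torch

# Illustrative subset of the functype-to-activation mapping; only 7 ("silu")
# is confirmed by this PR.
ACTIVATIONS = {
    1: torch.tanh,
    7: lambda x: x * torch.sigmoid(x),  # SiLU
}


class TestTabulateActivations(unittest.TestCase):
    def test_ops(self) -> None:
        for functype, activation in ACTIVATIONS.items():
            with self.subTest(functype=functype):
                self._test_single_activation(functype, activation)

    def _test_single_activation(self, functype, activation) -> None:
        x = torch.linspace(-3.0, 3.0, 17, dtype=torch.float64)
        y = activation(x)
        # Placeholder assertion; the real test compares the four TF unaggregated
        # ops against the PT utilities for each functype.
        self.assertEqual(y.shape, x.shape)


if __name__ == "__main__":
    unittest.main()
```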
… but wrapped to use num index instead of names
Actionable comments posted: 0
🧹 Nitpick comments (1)
source/tests/pt/test_tabulate.py (1)
35-40: LGTM! Clean helper function.

The `get_activation_function` helper provides a clear interface for looking up activation functions by functype with appropriate error handling.

Optional: The static analysis hint (TRY003) suggests defining the error message within a custom exception class for better exception handling practices, though this is a minor style preference:

```diff
- raise ValueError(f"Unknown functype: {functype}")
+ raise ValueError(f"Unknown functype: {functype}")  # noqa: TRY003
```

or define a custom exception if this pattern is common across the codebase.
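If the project prefers a dedicated exception over a `noqa`, a minimal version could look like the sketch below; `UnknownFunctypeError` is a hypothetical name, not an existing class in the codebase:

```python
class UnknownFunctypeError(ValueError):
    """Raised when an unsupported activation functype is requested."""

    def __init__(self, functype: int) -> None:
        super().__init__(f"Unknown functype: {functype}")


# Hypothetical usage inside get_activation_function:
#     raise UnknownFunctypeError(functype)
```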
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
source/tests/pt/test_tabulate.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_tabulate.py (2)
deepmd/dpmodel/utils/network.py (1)
get_activation_fn (300-399)
deepmd/pt/utils/tabulate.py (4)
unaggregated_dy_dx_s (515-534)
unaggregated_dy2_dx_s (537-563)
unaggregated_dy_dx (566-600)
unaggregated_dy2_dx (603-645)
🪛 Ruff (0.14.4)
source/tests/pt/test_tabulate.py
38-38: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: Build C library (2.14, >=2.5.0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Analyze (python)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (5, 3.12)
- GitHub Check: Test Python (5, 3.9)
- GitHub Check: Test Python (6, 3.12)
- GitHub Check: Test Python (6, 3.9)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (4, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Test C++ (true)
- GitHub Check: Test C++ (false)
🔇 Additional comments (4)
source/tests/pt/test_tabulate.py (4)
7-9: LGTM! Good reuse of existing implementation.

The import of `get_activation_fn` correctly leverages the existing activation function implementation from the codebase, addressing the previous review concern about using the existing implementation.
24-32: LGTM! Clear activation mapping.

The `ACTIVATION_NAMES` constant provides a clear, maintainable mapping of functypes to activation names, including the newly added SiLU (functype 7).
67-74: LGTM! Excellent parameterized testing approach.

The refactoring to use `subTest` for each activation function is a best practice that provides:
- Clear test isolation per activation
- Easy identification of failures by activation name and functype
- Maintainable test structure
76-189: Test structure is sound. TensorFlow ops support all functypes 1-7.

Verification confirms that `source/op/tf/unaggregated_grad.cc` implements all required functype cases (1-7) in both gradient computation functions, including the newly added functype 7 (SiLU). The test logic is well-structured with proper device handling and will execute correctly.
Summary by CodeRabbit