
Conversation

@OutisLi (Collaborator) commented Nov 15, 2025

  • Introduced the SiLU (Sigmoid Linear Unit) activation function with corresponding gradient and second derivative calculations.
  • Updated the activation function mapping to include SiLU, enhancing the flexibility of activation functions available in the DPTabulate class.

Summary by CodeRabbit

  • New Features
    • Added support for the SiLU (Swish) activation across runtime computations and gradients.
  • Tests
    • Expanded test coverage to validate all supported activations (tanh, gelu, relu, relu6, softplus, sigmoid, silu) and their first/second derivatives across execution paths.
  • Documentation
    • Updated activation references and lists to include SiLU among supported options.

Copilot AI review requested due to automatic review settings November 15, 2025 08:18
@OutisLi changed the title from "feat: Add support for SiLU activation function in gradient calculations" to "feat(pt): Add support for SiLU activation function in gradient calculations" Nov 15, 2025
Copilot finished reviewing on behalf of OutisLi November 15, 2025 08:20
@coderabbitai bot (Contributor) commented Nov 15, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds SiLU (swish) activation support as functype 7 across Python tabulation utilities, C++ unaggregated gradient code, and PyTorch tests; implements SiLU first- and second-derivative branches, updates activation mapping, and parameterizes tests to exercise activations 1–7.

Changes

Cohort / File(s) Change Summary
Tabulation Python utilities
deepmd/pt/utils/tabulate.py
Added "silu" to the activation map as functype = 7; implemented grad and grad_grad branches for SiLU (first derivative: sigmoid(x) * (1 + x * (1 - sigmoid(x))); second derivative: expression using d_sig and sigmoid); updated docstring.
C++ gradient implementation
source/op/tf/unaggregated_grad.cc
Added case 7 in both grad and grad_grad to compute SiLU-related gradient expressions using sigmoid and its derivative; no changes to other cases or public signatures.
Tests (PyTorch)
source/tests/pt/test_tabulate.py
Added ACTIVATION_NAMES mapping and get_activation_function(functype); restructured tests to parameterize over activations 1–7 (including SiLU), compute per-activation y, and validate unaggregated derivative functions for each functype with clearer messages.
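
For reference, a minimal sketch of what the two new derivative branches compute, per the change summary above (standalone functions with illustrative names, not the actual tabulate.py code, which dispatches on functype inside grad and grad_grad):

import torch

def silu_grad(xbar: torch.Tensor) -> torch.Tensor:
    # First derivative of f(x) = x * sigmoid(x):
    # f'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
    sig = torch.sigmoid(xbar)
    return sig + xbar * sig * (1 - sig)

def silu_grad_grad(xbar: torch.Tensor) -> torch.Tensor:
    # Second derivative: f''(x) = 2 * sig' + x * sig' * (1 - 2 * sigmoid(x)),
    # where sig' = sigmoid(x) * (1 - sigmoid(x)) is d_sig in the PR.
    sig = torch.sigmoid(xbar)
    d_sig = sig * (1 - sig)
    return 2 * d_sig + xbar * d_sig * (1 - 2 * sig)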

Sequence Diagram(s)

sequenceDiagram
  participant T as Tests
  participant Py as Python tabulate
  participant C as C++ unaggregated_grad

  Note over T,Py: Iterate functype 1..7
  T->>Py: activation = get_activation_function(functype)
  T->>Py: y = activation(x)
  T->>Py: request derivatives (unaggregated_dy_dx[_s], unaggregated_dy2_dx[_s]) with functype
  Py->>C: dispatch grad/grad_grad using functype
  alt functype == 7
    C-->>Py: compute SiLU grad & grad_grad (sigmoid-based formulas)
  else
    C-->>Py: compute existing activation derivatives
  end
  Py-->>T: return derivative tensors for assertions

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify mathematical correctness and consistency of SiLU first- and second-derivative formulas in both Python and C++.
  • Confirm functype encoding and dispatch alignment between tests, Python, and C++.
  • Check tests for numerical edge cases (large/small x) and device/dtype consistency.

Suggested reviewers

  • njzjz
  • caic99
  • wanghan-iapcm

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 30.00%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the pull request title clearly and specifically describes the main change, adding SiLU activation support to gradient calculations, and directly corresponds to the primary objective of the changeset.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
deepmd/pt/utils/tabulate.py (2)

48-49: Align activation docs and ActivationFn string with new SiLU support

You map "silu": 7 correctly into activation_map, but the class docstring still claims only {"tanh","gelu"} are supported, and it doesn’t mention SiLU or the other already-supported activations. Also, please double‑check that ActivationFn elsewhere in the codebase accepts "silu" as the activation string (same spelling/casing) so this mapping is reachable.

Consider updating the docstring to defer to ActivationFn and include SiLU in the examples, e.g.:

-    activation_function
-            The activation function in the embedding net. Supported options are {"tanh","gelu"} in common.ActivationFn.
+    activation_function
+            The activation function in the embedding net. See :class:`ActivationFn`
+            for supported options (e.g. "tanh", "gelu", "relu", "silu").

Also applies to: 78-88


445-509: Add targeted tests for SiLU gradients and Hessians

Now that functype == 7 is wired into grad/grad_grad, it would be good to add tests that:

  • Compare grad(xbar, y, 7) and grad_grad(xbar, y, 7) against PyTorch autograd on torch.nn.SiLU over a range of xbar values.
  • Exercise both CPU and GPU (if supported in your CI).

This will guard against future regressions in the hand‑coded formulas.
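
A minimal sketch of such a cross-check, assuming grad and grad_grad are importable from deepmd.pt.utils.tabulate with the grad(xbar, y, functype) calling convention used above (import path and tolerances are assumptions, not confirmed code):

import torch

from deepmd.pt.utils.tabulate import grad, grad_grad  # assumed import path

def test_silu_grads_match_autograd():
    xbar = torch.linspace(-10.0, 10.0, 201, dtype=torch.float64, requires_grad=True)
    y = torch.nn.functional.silu(xbar)
    # First derivative via autograd; keep the graph for the second pass.
    (dy_ref,) = torch.autograd.grad(y.sum(), xbar, create_graph=True)
    # Second derivative via a second backward pass.
    (d2y_ref,) = torch.autograd.grad(dy_ref.sum(), xbar)
    torch.testing.assert_close(grad(xbar.detach(), y.detach(), 7), dy_ref.detach())
    torch.testing.assert_close(grad_grad(xbar.detach(), y.detach(), 7), d2y_ref.detach())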

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c346332 and dde7188.

📒 Files selected for processing (1)
  • deepmd/pt/utils/tabulate.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/pt/utils/tabulate.py (1)
deepmd/pt/utils/utils.py (1)
  • sigmoid (154-155)
🔇 Additional comments (2)
deepmd/pt/utils/tabulate.py (2)

472-476: SiLU first derivative implementation looks correct

The functype == 7 branch implements

  • sig = torch.sigmoid(xbar)
  • sig + xbar * sig * (1 - sig)

which matches $\frac{d}{dx}\bigl[x\,\sigma(x)\bigr] = \sigma(x) + x\,\sigma(x)\bigl(1 - \sigma(x)\bigr)$. Using a single torch.sigmoid call and reusing sig is efficient and consistent with the other branches.


504-508: SiLU second derivative implementation is mathematically consistent

Here you compute:

  • sig = torch.sigmoid(xbar)
  • d_sig = sig * (1 - sig)
  • 2 * d_sig + xbar * d_sig * (1 - 2 * sig)

which matches the analytically derived $f''(x) = 2\,\sigma'(x) + x\,\sigma'(x)\bigl(1 - 2\sigma(x)\bigr)$ for $f(x) = x\,\sigma(x)$, where $\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)$ is d_sig. This integrates cleanly with the existing grad_grad interface.
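
For completeness, both formulas follow directly from differentiating $f(x) = x\,\sigma(x)$, using $\sigma' = \sigma(1 - \sigma)$:

\begin{align*}
f'(x)  &= \sigma(x) + x\,\sigma'(x) = \sigma(x) + x\,\sigma(x)\bigl(1 - \sigma(x)\bigr) \\
f''(x) &= 2\,\sigma'(x) + x\,\sigma''(x), \qquad
\sigma''(x) = \frac{d}{dx}\Bigl[\sigma(x)\bigl(1 - \sigma(x)\bigr)\Bigr] = \sigma'(x)\bigl(1 - 2\sigma(x)\bigr) \\
\Rightarrow\ f''(x) &= 2\,\sigma'(x) + x\,\sigma'(x)\bigl(1 - 2\sigma(x)\bigr)
\end{align*}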

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@OutisLi OutisLi requested a review from Copilot November 15, 2025 09:07
Copilot finished reviewing on behalf of OutisLi November 15, 2025 09:10
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



@codecov bot commented Nov 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.25%. Comparing base (c346332) to head (61c3b9e).
⚠️ Report is 1 commit behind head on devel.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #5055      +/-   ##
==========================================
+ Coverage   84.18%   84.25%   +0.07%     
==========================================
  Files         709      709              
  Lines       70220    70234      +14     
  Branches     3619     3620       +1     
==========================================
+ Hits        59116    59177      +61     
+ Misses       9936     9889      -47     
  Partials     1168     1168              


@OutisLi OutisLi requested review from iProzd and njzjz November 15, 2025 09:37
- Updated the DPTabulate class documentation to include additional activation functions ("relu", "silu") in the embedding net.
- Added a new activation function (case 7: SiLU) and its gradient calculations in the unaggregated gradient functions.
- Implemented comprehensive tests for all activation functions, ensuring correct behavior across various scenarios in the test suite.
Copilot finished reviewing on behalf of OutisLi November 15, 2025 10:10
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
deepmd/pt/utils/tabulate.py (1)

49-50: Docstring now lists supported activations but parameter name is slightly inconsistent

The updated docstring correctly points to ActivationFn and lists "silu" among supported options. For clarity, consider aligning the documented parameter name (activation_function) with the actual __init__ argument name (activation_fn).

source/tests/pt/test_tabulate.py (1)

22-44: Activation helper and name map cover all functypes, including SiLU

get_activation_function and ACTIVATION_NAMES cleanly cover functypes 1–7, with the SiLU case implemented as x / (1 + exp(-x)), which matches the intended x * sigmoid(x) behavior used elsewhere. This keeps the tests aligned with the activation map in DPTabulate and the new SiLU gradients. The ValueError on unknown functype is also fine here; the Ruff TRY003 hint is purely stylistic and can be ignored or suppressed if it’s noisy.

Also applies to: 46-54
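
A quick numerical check of that equivalence (purely illustrative):

import torch

x = torch.linspace(-20.0, 20.0, 101, dtype=torch.float64)
assert torch.allclose(x / (1 + torch.exp(-x)),  # form used in the test helper
                      x * torch.sigmoid(x))     # form used by the gradient code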

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dde7188 and 620d15d.

📒 Files selected for processing (3)
  • deepmd/pt/utils/tabulate.py (4 hunks)
  • source/op/tf/unaggregated_grad.cc (2 hunks)
  • source/tests/pt/test_tabulate.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_tabulate.py (1)
deepmd/pt/utils/tabulate.py (4)
  • unaggregated_dy_dx_s (515-534)
  • unaggregated_dy2_dx_s (537-563)
  • unaggregated_dy_dx (566-600)
  • unaggregated_dy2_dx (603-645)
🪛 Ruff (0.14.4)
source/tests/pt/test_tabulate.py

43-43: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
source/op/tf/unaggregated_grad.cc (1)

78-81: SiLU gradient and second derivative implementation look correct and consistent

The new case 7 branches in grad and grad_grad implement the first and second derivatives of SiLU f(x) = x * sigmoid(x) as σ + x σ (1-σ) and 2 σ' + x σ' (1-2 σ), respectively, which are mathematically correct and consistent with the Python tabulate implementations and the new tests.

Also applies to: 112-116

deepmd/pt/utils/tabulate.py (1)

80-89: SiLU activation mapping and derivatives are coherent across the stack

Adding "silu": 7 to activation_map and the corresponding functype == 7 branches in grad and grad_grad provides a consistent SiLU implementation: grad returns σ(x) + x σ(x) (1 - σ(x)) and grad_grad returns 2 σ'(x) + x σ'(x) (1 - 2 σ(x)), matching the analytic derivatives of f(x)=x*sigmoid(x) and the C++ UnaggregatedDyDx* implementations and tests.

Also applies to: 473-477, 505-509

source/tests/pt/test_tabulate.py (1)

82-89: Parameterized TF/PT parity tests across all activations are well structured

The refactored test_ops with _test_single_activation systematically exercises all four unaggregated ops (*_dy_dx_s, *_dy2_dx_s, *_dy_dx, *_dy2_dx) for functypes 1–7, comparing TF kernels against the PT utilities with clear per-activation subTest labels. Argument ordering and shapes match the C++ op signatures and Python wrappers, so this should give good coverage for the new SiLU path without altering existing behavior.

Also applies to: 90-203

… but wrapped to use num index instead of names
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
source/tests/pt/test_tabulate.py (1)

35-40: LGTM! Clean helper function.

The get_activation_function helper provides a clear interface for looking up activation functions by functype with appropriate error handling.

Optional: The static analysis hint (TRY003) suggests defining the error message within a custom exception class for better exception handling practices, though this is a minor style preference:

-        raise ValueError(f"Unknown functype: {functype}")
+        raise ValueError(f"Unknown functype: {functype}")  # noqa: TRY003

or define a custom exception if this pattern is common across the codebase.
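
As described, the helper amounts to a thin lookup over the functype-to-name map (a sketch assembled from the names in the walkthrough; the actual code reuses get_activation_fn from deepmd.dpmodel.utils.network, and the exact mapping order is assumed from the supported-activation list):

ACTIVATION_NAMES = {
    1: "tanh",
    2: "gelu",
    3: "relu",
    4: "relu6",
    5: "softplus",
    6: "sigmoid",
    7: "silu",
}

def get_activation_function(functype: int):
    # Map the integer functype used by the ops to a callable activation.
    if functype not in ACTIVATION_NAMES:
        raise ValueError(f"Unknown functype: {functype}")
    return get_activation_fn(ACTIVATION_NAMES[functype])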

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 620d15d and 61c3b9e.

📒 Files selected for processing (1)
  • source/tests/pt/test_tabulate.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_tabulate.py (2)
deepmd/dpmodel/utils/network.py (1)
  • get_activation_fn (300-399)
deepmd/pt/utils/tabulate.py (4)
  • unaggregated_dy_dx_s (515-534)
  • unaggregated_dy2_dx_s (537-563)
  • unaggregated_dy_dx (566-600)
  • unaggregated_dy2_dx (603-645)
🪛 Ruff (0.14.4)
source/tests/pt/test_tabulate.py

38-38: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)
source/tests/pt/test_tabulate.py (4)

7-9: LGTM! Good reuse of existing implementation.

The import of get_activation_fn correctly leverages the existing activation function implementation from the codebase, addressing the previous review concern about using the existing implementation.


24-32: LGTM! Clear activation mapping.

The ACTIVATION_NAMES constant provides a clear, maintainable mapping of functypes to activation names, including the newly added SiLU (functype 7).


67-74: LGTM! Excellent parameterized testing approach.

The refactoring to use subTest for each activation function is a best practice that provides:

  • Clear test isolation per activation
  • Easy identification of failures by activation name and functype
  • Maintainable test structure
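
The pattern being praised looks roughly like this (a sketch based on the names reported in the walkthrough, reusing the ACTIVATION_NAMES map sketched above; not the verbatim test code):

def test_ops(self):
    # Run the same TF-vs-PT parity checks once per activation, so any
    # failure is reported with its activation name and functype.
    for functype, name in ACTIVATION_NAMES.items():
        with self.subTest(activation=name, functype=functype):
            self._test_single_activation(functype)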

76-189: Test structure is sound. TensorFlow ops support all functypes 1-7.

Verification confirms that source/op/tf/unaggregated_grad.cc implements all required functype cases (1-7) in both gradient computation functions, including the newly added functype 7 (SiLU). The test logic is well-structured with proper device handling and will execute correctly.

@OutisLi OutisLi requested a review from njzjz November 16, 2025 09:27
@njzjz njzjz added this pull request to the merge queue Nov 17, 2025
Merged via the queue into deepmodeling:devel with commit d4e9ffc Nov 17, 2025
60 checks passed