
Conversation

@OutisLi OutisLi (Collaborator) commented Jan 11, 2026

…ot just cuda)

Summary by CodeRabbit

  • New Features
    • Automatic GPU device detection added for PyTorch, TensorFlow, and Paddle; detected GPU names are now shown in system diagnostics when available.
    • Detection is conditional and safe: if no GPU is present or detection fails, diagnostics simply omit the device name without causing errors.


Copilot AI review requested due to automatic review settings January 11, 2026 08:54
@dosubot dosubot bot added the new feature label Jan 11, 2026
@coderabbitai coderabbitai bot (Contributor) commented Jan 11, 2026

📝 Walkthrough

Walkthrough

Adds an abstract SummaryPrinter.get_device_name API in deepmd/utils/summary.py; __call__ now invokes it and adds "Device Name" to build_info if present. Backend-specific implementations for PyTorch, TensorFlow, and Paddle provide GPU names when available.
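The pattern the walkthrough describes can be sketched as follows. This is a minimal illustration, not the actual deepmd code: the simplified `build_info` assembly and the `TorchSummaryPrinter` subclass with its fixed device name are hypothetical stand-ins for the backend-specific overrides.

```python
class SummaryPrinter:
    """Minimal sketch of the base-class pattern described above (simplified)."""

    def get_device_name(self):
        # Default: no device name; backends override this.
        return None

    def __call__(self):
        build_info = {"Backend": "unknown"}  # simplified; the real dict has more keys
        device_name = self.get_device_name()
        if device_name:
            build_info["Device Name"] = device_name
        return build_info


class TorchSummaryPrinter(SummaryPrinter):
    """Hypothetical backend override returning a fixed GPU name for illustration."""

    def get_device_name(self):
        return "NVIDIA A100"
```

With this shape, a backend that returns `None` simply omits the "Device Name" entry, while one that returns a name gets it added to the summary.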

Changes

  • deepmd/utils/summary.py (summary utility): `__call__` now calls `get_device_name()` and conditionally inserts "Device Name" into build_info. Added abstract `get_device_name(self) -> str | None`.
  • deepmd/pt/entrypoints/main.py (PyTorch entrypoint): Added `get_device_name(self) -> str | None`.
  • deepmd/pd/entrypoints/main.py (Paddle entrypoint): Added `get_device_name(self) -> str | None`.
  • deepmd/tf/train/run_options.py (TensorFlow run options): Added `get_device_name(self) -> str | None`.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant SummaryPrinter
    participant BackendImpl
    participant RuntimeAPI

    User->>SummaryPrinter: request summary
    SummaryPrinter->>BackendImpl: get_device_name()
    BackendImpl->>RuntimeAPI: query GPU device info (CUDA / TF / Paddle API)
    RuntimeAPI-->>BackendImpl: device name or no-device
    BackendImpl-->>SummaryPrinter: device name or None
    SummaryPrinter->>SummaryPrinter: add "Device Name" to build_info if present
    SummaryPrinter-->>User: return assembled summary
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: adding device name display capability (e.g., A100) instead of generic device labels, which matches the implementation across all modified files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 81.82%, which meets the required threshold of 80.00%.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @deepmd/utils/summary.py:
- Around lines 77-78: The JAX backend lacks an override of get_backend_info(), so build_info lacks the "Backend" key and device detection is skipped. Implement get_backend_info() in the JAX backend (the class/module that currently inherits the base implementation) to return a dict including at least "Backend": "JAX" plus the same keys used by other backends (e.g., device name/count info), so that `backend = build_info.get("Backend")` in summary.py finds "JAX" and device-name detection proceeds normally. Ensure the new get_backend_info() signature matches the base class and populates the same keys as the TensorFlow/PyTorch/Paddle implementations.
- Around lines 77-101: The device detection block that populates build_info["Device Name"] returns device identifiers instead of GPU model names and hard-codes PyTorch to device 0. Update the logic in the try block handling "PyTorch", "TensorFlow", and "Paddle" to retrieve actual GPU model names:
  • PyTorch: call torch.cuda.current_device() to get the active device index, then torch.cuda.get_device_name(device_index).
  • TensorFlow: use tf.config.list_physical_devices("GPU"), then tf.config.experimental.get_device_details(gpus[0]).get("device_name"), falling back to gpus[0].name if unavailable.
  • Paddle: use paddle.device.cuda.get_device_name(), falling back to paddle.get_device() if needed.
  Keep the outer try/except behavior and only set build_info["Device Name"] when a non-empty model name is obtained.
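A best-effort helper following the comment's suggestions might look like the sketch below. The backend APIs are the ones named in the comment; the helper name and the all-in-one dispatch shape are illustrative, and the function deliberately returns None when a framework is absent or no GPU is visible.

```python
def detect_gpu_name(backend):
    """Best-effort GPU model-name lookup; returns None on any failure (sketch)."""
    try:
        if backend == "PyTorch":
            import torch
            if torch.cuda.is_available():
                # Query the active device rather than hard-coding index 0.
                return torch.cuda.get_device_name(torch.cuda.current_device())
        elif backend == "TensorFlow":
            import tensorflow as tf
            gpus = tf.config.list_physical_devices("GPU")
            if gpus:
                # device_details exposes the human-readable model name.
                details = tf.config.experimental.get_device_details(gpus[0])
                return details.get("device_name")
        elif backend == "Paddle":
            import paddle
            if paddle.device.is_compiled_with_cuda():
                return paddle.device.cuda.get_device_name()
    except Exception:
        # Best-effort: never let device probing break summary printing.
        return None
    return None
```

On a machine without the framework installed (or without a GPU), the helper simply yields None and the caller omits the "Device Name" entry.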
🧹 Nitpick comments (2)
deepmd/utils/summary.py (2)

77-101: Avoid except Exception: pass; at least debug-log and narrow exceptions.
This currently hides import/config/runtime issues completely and triggers ruff BLE001/S110.

Proposed change (narrow + debug)

```diff
-        except Exception:
-            # Best-effort device name detection; ignore failures silently
-            pass
+        except (ImportError, AttributeError, RuntimeError, ValueError) as e:
+            # Best-effort device name detection; ignore failures, but leave breadcrumbs for debugging.
+            log.debug("Device name detection failed (backend=%r): %s", backend, e)
```

100-101: Key naming consistency: “Device Name” vs existing lowercase keys.
If consumers parse this output, consider using a consistent key style (e.g., device name) or reusing/augmenting computing device.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82a5f32 and de2a37f.

📒 Files selected for processing (1)
  • deepmd/utils/summary.py
🧰 Additional context used
🪛 Ruff (0.14.10)
deepmd/utils/summary.py

97-99: try-except-pass detected, consider logging the exception

(S110)


97-97: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Test Python (1, 3.10)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test C++ (false, false, false, true)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Test C++ (false, true, true, false)
  • GitHub Check: Test C++ (true, false, false, true)
  • GitHub Check: Test C++ (true, true, true, false)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Analyze (python)

Copilot AI (Contributor) left a comment

Pull request overview

This pull request adds functionality to display the actual GPU hardware device name (e.g., "A100") in addition to the generic device type (e.g., "cuda") in the system summary output. The implementation adds device name detection logic for PyTorch, TensorFlow, and Paddle backends.

Changes:

  • Added device name detection logic for three ML framework backends (PyTorch, TensorFlow, Paddle)
  • Integrated device name into the build_info dictionary for display in system summary
  • Added error handling to silently ignore failures in device name detection


@codecov codecov bot commented Jan 11, 2026

Codecov Report

❌ Patch coverage is 54.16667% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.94%. Comparing base (82a5f32) to head (887ab90).
⚠️ Report is 7 commits behind head on master.

Files with missing lines:
  • deepmd/pd/entrypoints/main.py: 16.66% patch coverage, 5 lines missing ⚠️
  • deepmd/tf/train/run_options.py: 55.55% patch coverage, 4 lines missing ⚠️
  • deepmd/pt/entrypoints/main.py: 75.00% patch coverage, 1 line missing ⚠️
  • deepmd/utils/summary.py: 80.00% patch coverage, 1 line missing ⚠️
Additional details and impacted files
```text
@@            Coverage Diff             @@
##           master    #5146      +/-   ##
==========================================
- Coverage   81.96%   81.94%   -0.02%     
==========================================
  Files         713      713              
  Lines       73029    73008      -21     
  Branches     3617     3616       -1     
==========================================
- Hits        59855    59826      -29     
- Misses      12012    12021       +9     
+ Partials     1162     1161       -1     
```

☔ View full report in Codecov by Sentry.

@OutisLi OutisLi requested review from iProzd and njzjz January 11, 2026 10:10
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @deepmd/utils/summary.py:
- Around lines 77-80: Wrap the call to self.get_device_name() in a try/except (catching Exception) in the base printer so a backend error doesn't abort summary printing. On exception, omit build_info["Device Name"] or set a safe placeholder, and emit a non-fatal warning (e.g., via self.logger.warning or warnings.warn) so execution continues and subsequent calls like self.is_built_with_cuda() still run.
🧹 Nitpick comments (2)
deepmd/pt/entrypoints/main.py (1)

255-265: Make device-name detection best-effort (avoid crashing training on CUDA init/driver edge cases).

Since SummaryPrinter()() runs early (and can run in environments with partial CUDA availability), wrap the CUDA queries in try/except and return None on failure.

Proposed diff

```diff
 def get_device_name(self) -> str | None:
@@
-        if torch.cuda.is_available():
-            return torch.cuda.get_device_name(torch.cuda.current_device())
+        if torch.cuda.is_available():
+            try:
+                return torch.cuda.get_device_name(torch.cuda.current_device())
+            except Exception:
+                return None
         return None
```
deepmd/pd/entrypoints/main.py (1)

227-242: Avoid hard-coding GPU 0; make Paddle device-name lookup best-effort.

At least derive the device index from DEVICE (or Paddle’s “current device” API, if available in your supported Paddle versions) and wrap in try/except.

Proposed diff

```diff
 def get_device_name(self) -> str | None:
@@
-        if paddle.device.is_compiled_with_cuda():
-            cuda_mod = getattr(paddle.device, "cuda", None)
-            if cuda_mod is not None and cuda_mod.device_count() > 0:
-                get_props = getattr(cuda_mod, "get_device_properties", None)
-                if callable(get_props):
-                    props = get_props(0)
-                    return getattr(props, "name", None)
+        if paddle.device.is_compiled_with_cuda():
+            try:
+                cuda_mod = getattr(paddle.device, "cuda", None)
+                if cuda_mod is not None and cuda_mod.device_count() > 0:
+                    get_props = getattr(cuda_mod, "get_device_properties", None)
+                    if callable(get_props):
+                        dev = str(DEVICE)
+                        dev_id = int(dev.split(":", 1)[1]) if dev.startswith("gpu:") else 0
+                        props = get_props(dev_id)
+                        return getattr(props, "name", None)
+            except Exception:
+                return None
         return None
```
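The device-index parsing suggested in the diff above can be isolated into a small helper. The function name is illustrative; it mirrors the `dev.split(":", 1)` logic and assumes Paddle-style device strings such as "gpu:1", falling back to index 0 for anything else.

```python
def device_index(device_str):
    """Parse a Paddle-style device string (e.g. 'gpu:1') into a GPU index.

    Returns 0 for non-GPU strings or malformed indices (best-effort sketch).
    """
    dev = str(device_str)
    if dev.startswith("gpu:"):
        try:
            # Split only on the first colon, then parse the index.
            return int(dev.split(":", 1)[1])
        except ValueError:
            return 0
    return 0
```

This keeps the lookup tied to the configured DEVICE instead of hard-coding GPU 0, while still degrading safely on unexpected input.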
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de2a37f and 5c63acc.

📒 Files selected for processing (4)
  • deepmd/pd/entrypoints/main.py
  • deepmd/pt/entrypoints/main.py
  • deepmd/tf/train/run_options.py
  • deepmd/utils/summary.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-01-11T09:14:18.878Z
Learnt from: OutisLi
Repo: deepmodeling/deepmd-kit PR: 5146
File: deepmd/utils/summary.py:77-78
Timestamp: 2026-01-11T09:14:18.878Z
Learning: JAX backend does not currently support training in the DeepMD-kit project, so features like device name display can be deferred for JAX.

Applied to files:

  • deepmd/utils/summary.py
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: JAX backend is experimental and may have limitations

Applied to files:

  • deepmd/utils/summary.py
📚 Learning: 2024-10-30T20:08:12.531Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4284
File: deepmd/jax/__init__.py:8-8
Timestamp: 2024-10-30T20:08:12.531Z
Learning: In the DeepMD project, entry points like `deepmd.jax` may be registered in external projects, so their absence in the local configuration files is acceptable.

Applied to files:

  • deepmd/utils/summary.py
🧬 Code graph analysis (3)
deepmd/pd/entrypoints/main.py (3)
deepmd/pt/entrypoints/main.py (1)
  • get_device_name (255-265)
deepmd/tf/train/run_options.py (1)
  • get_device_name (76-89)
deepmd/utils/summary.py (1)
  • get_device_name (133-141)
deepmd/pt/entrypoints/main.py (3)
deepmd/pd/entrypoints/main.py (1)
  • get_device_name (227-242)
deepmd/tf/train/run_options.py (1)
  • get_device_name (76-89)
deepmd/utils/summary.py (1)
  • get_device_name (133-141)
deepmd/tf/train/run_options.py (3)
deepmd/pd/entrypoints/main.py (1)
  • get_device_name (227-242)
deepmd/pt/entrypoints/main.py (1)
  • get_device_name (255-265)
deepmd/utils/summary.py (1)
  • get_device_name (133-141)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
🔇 Additional comments (1)
deepmd/utils/summary.py (1)

133-141: Nice addition: default get_device_name() makes the feature opt-in per backend.

No concerns with the default implementation returning None.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @deepmd/tf/train/run_options.py:
- Around lines 76-93: Update get_device_name to return only the human-readable device name, not a device-identifier fallback. When tf.config.get_visible_devices("GPU") returns a non-empty list, call tf.config.experimental.get_device_details(gpus[0]) and return details.get("device_name") with no "or gpus[0].name" fallback. Keep the existing exception handling and the final return None so that the method returns None when no human-readable name is available.
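Following that instruction, the TensorFlow lookup would reduce to something like the sketch below. The function name is illustrative; the TF calls are the ones named in the comment, and the helper returns None both when TensorFlow is unavailable and when no human-readable name can be obtained.

```python
def tf_device_name():
    """Return the human-readable GPU name via TF device details, else None (sketch)."""
    try:
        import tensorflow as tf
        gpus = tf.config.get_visible_devices("GPU")
        if gpus:
            details = tf.config.experimental.get_device_details(gpus[0])
            # No fallback to gpus[0].name: identifiers like
            # '/physical_device:GPU:0' are not model names.
            return details.get("device_name")
    except Exception:
        return None
    return None
```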
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c63acc and 17a353f.

📒 Files selected for processing (1)
  • deepmd/tf/train/run_options.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)

@OutisLi OutisLi requested a review from njzjz January 13, 2026 06:05
@OutisLi OutisLi requested a review from iProzd January 13, 2026 07:16
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/utils/summary.py (1)

77-79: Minor naming inconsistency: consider using lowercase key.

The new key "Device Name" uses Title Case, while other keys in build_info use lowercase (e.g., "running on", "computing device", "world size"). For consistency, consider changing to "device name".

Suggested fix

```diff
         device_name = self.get_device_name()
         if device_name:
-            build_info["Device Name"] = device_name
+            build_info["device name"] = device_name
```
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7dea21 and 887ab90.

📒 Files selected for processing (1)
  • deepmd/utils/summary.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-01-11T09:14:21.341Z
Learnt from: OutisLi
Repo: deepmodeling/deepmd-kit PR: 5146
File: deepmd/utils/summary.py:77-78
Timestamp: 2026-01-11T09:14:21.341Z
Learning: JAX backend does not currently support training in the DeepMD-kit project, so features like device name display can be deferred for JAX.

Applied to files:

  • deepmd/utils/summary.py
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: JAX backend is experimental and may have limitations

Applied to files:

  • deepmd/utils/summary.py
📚 Learning: 2024-10-30T20:08:12.531Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4284
File: deepmd/jax/__init__.py:8-8
Timestamp: 2024-10-30T20:08:12.531Z
Learning: In the DeepMD project, entry points like `deepmd.jax` may be registered in external projects, so their absence in the local configuration files is acceptable.

Applied to files:

  • deepmd/utils/summary.py
🧬 Code graph analysis (1)
deepmd/utils/summary.py (3)
deepmd/tf/train/run_options.py (1)
  • get_device_name (76-92)
deepmd/pt/entrypoints/main.py (1)
  • get_device_name (255-265)
deepmd/pd/entrypoints/main.py (1)
  • get_device_name (227-239)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
🔇 Additional comments (1)
deepmd/utils/summary.py (1)

129-131: LGTM!

The abstract method is well-defined with appropriate return type annotation. The docstring follows the existing brief style of other abstract methods in this class. Based on learnings, JAX backend support can be deferred since it doesn't currently support training.

@OutisLi OutisLi requested a review from njzjz January 13, 2026 15:08
@njzjz njzjz added this pull request to the merge queue Jan 13, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 13, 2026
@njzjz njzjz added this pull request to the merge queue Jan 14, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 14, 2026
@njzjz njzjz added this pull request to the merge queue Jan 14, 2026
Merged via the queue into deepmodeling:master with commit e5baf69 Jan 14, 2026
70 checks passed
@OutisLi OutisLi deleted the pr/gpuname branch January 14, 2026 06:40

3 participants