feat: add device name display (for example: A100 not just cuda) #5146
Conversation
📝 Walkthrough
Adds an abstract `SummaryPrinter.get_device_name` API in `deepmd/utils/summary.py`, with backend-specific overrides used to display the GPU device name in the summary.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant SummaryPrinter
    participant BackendImpl
    participant RuntimeAPI
    User->>SummaryPrinter: request summary
    SummaryPrinter->>BackendImpl: get_device_name()
    BackendImpl->>RuntimeAPI: query GPU device info (CUDA / TF / Paddle API)
    RuntimeAPI-->>BackendImpl: device name or no-device
    BackendImpl-->>SummaryPrinter: device name or None
    SummaryPrinter->>SummaryPrinter: add "Device Name" to build_info if present
    SummaryPrinter-->>User: return assembled summary
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @deepmd/utils/summary.py:
- Around lines 77-78: The JAX backend lacks an override of get_backend_info(), so build_info lacks the "Backend" key and device detection is skipped. Implement get_backend_info() in the JAX backend (the class/module that currently inherits the base implementation) to return a dict including at least "Backend": "JAX", plus the same keys used by other backends (e.g., device name/count info), so that the call `backend = build_info.get("Backend")` in summary.py finds "JAX" and device-name detection proceeds normally. Ensure the new get_backend_info() signature matches the base class and populates the same keys as the TensorFlow/PyTorch/Paddle implementations.
- Around lines 77-101: The device detection block that populates build_info["Device Name"] returns device identifiers instead of GPU model names, and hard-codes PyTorch to device 0. Update the logic in the try block that inspects backend and device_name (the code handling "PyTorch", "TensorFlow", "Paddle") to retrieve actual GPU model names: for PyTorch, call torch.cuda.current_device() to get the active device index, then torch.cuda.get_device_name(device_index); for TensorFlow, use tf.config.list_physical_devices("GPU") and then tf.config.experimental.get_device_details(gpus[0]).get("device_name") (falling back to gpus[0].name if not available); for Paddle, use paddle.device.cuda.get_device_name() (falling back to paddle.get_device() if needed). Keep the outer try/except behavior and only set build_info["Device Name"] when a non-empty model name is obtained.
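The suggestion above can be sketched as a single best-effort helper. This is an illustrative sketch, not the actual deepmd-kit code: the function name `detect_device_name` is hypothetical, and it assumes the three framework APIs named in the review comment.

```python
from __future__ import annotations


def detect_device_name(backend: str) -> str | None:
    """Best-effort lookup of the GPU model name for a given backend (sketch)."""
    try:
        if backend == "PyTorch":
            import torch

            if torch.cuda.is_available():
                # Use the active device index rather than hard-coding 0.
                return torch.cuda.get_device_name(torch.cuda.current_device())
        elif backend == "TensorFlow":
            import tensorflow as tf

            gpus = tf.config.list_physical_devices("GPU")
            if gpus:
                details = tf.config.experimental.get_device_details(gpus[0])
                # Prefer the human-readable model name, e.g. "NVIDIA A100".
                return details.get("device_name") or gpus[0].name
        elif backend == "Paddle":
            import paddle

            if paddle.device.is_compiled_with_cuda():
                return paddle.device.cuda.get_device_name()
    except (ImportError, AttributeError, RuntimeError, ValueError):
        # Best-effort: any detection failure simply yields no device name.
        return None
    return None
```

On a machine without any of these frameworks (or without a GPU), every branch falls through to `None`, so the caller only sets `build_info["Device Name"]` when a real model name is available.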
🧹 Nitpick comments (2)
deepmd/utils/summary.py (2)
77-101: Avoid `except Exception: pass`; at least debug-log and narrow exceptions.
This currently hides import/config/runtime issues completely and triggers ruff BLE001/S110.

Proposed change (narrow + debug):

```diff
-        except Exception:
-            # Best-effort device name detection; ignore failures silently
-            pass
+        except (ImportError, AttributeError, RuntimeError, ValueError) as e:
+            # Best-effort device name detection; ignore failures, but leave breadcrumbs for debugging.
+            log.debug("Device name detection failed (backend=%r): %s", backend, e)
```
100-101: Key naming consistency: "Device Name" vs existing lowercase keys.
If consumers parse this output, consider a consistent key style (e.g., `device name`) or reusing/augmenting `computing device`.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/utils/summary.py
🧰 Additional context used
🪛 Ruff (0.14.10)
deepmd/utils/summary.py
97-99: try-except-pass detected, consider logging the exception
(S110)
97-97: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test C++ (false, false, false, true)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Analyze (python)
Pull request overview
This pull request adds functionality to display the actual GPU hardware device name (e.g., "A100") in addition to the generic device type (e.g., "cuda") in the system summary output. The implementation adds device name detection logic for PyTorch, TensorFlow, and Paddle backends.
Changes:
- Added device name detection logic for three ML framework backends (PyTorch, TensorFlow, Paddle)
- Integrated device name into the build_info dictionary for display in system summary
- Added error handling to silently ignore failures in device name detection
Codecov Report
❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           master    #5146    +/-  ##
==========================================
- Coverage   81.96%   81.94%   -0.02%
==========================================
  Files         713      713
  Lines       73029    73008      -21
  Branches     3617     3616       -1
==========================================
- Hits        59855    59826      -29
- Misses      12012    12021       +9
+ Partials     1162     1161       -1
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @deepmd/utils/summary.py:
- Around line 77-80: Wrap the call to self.get_device_name() in a try/except
(catch Exception) in the base printer so a backend error doesn't abort summary
printing; on exception, omit or set a safe placeholder for build_info["Device
Name"] and emit a non-fatal warning (e.g., via self.logger.warning or
warnings.warn) so execution continues and subsequent calls like
self.is_built_with_cuda() still run.
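The fix requested above can be sketched as follows. Class and method names here are illustrative placeholders (the real deepmd-kit `SummaryPrinter` differs); the point is only the control flow: a raising backend must not abort summary printing.

```python
import warnings


class SummaryPrinter:
    """Minimal sketch of the base printer; names are illustrative."""

    def get_device_name(self):
        return None  # backends override this

    def build_summary(self) -> dict:
        build_info = {}
        try:
            device_name = self.get_device_name()
        except Exception as e:
            # A backend error must not abort summary printing:
            # omit the key and emit a non-fatal warning instead.
            device_name = None
            warnings.warn(f"Device name detection failed: {e}")
        if device_name:
            build_info["Device Name"] = device_name
        # ...subsequent steps such as the CUDA-build check still run...
        return build_info


class BrokenBackend(SummaryPrinter):
    def get_device_name(self):
        raise RuntimeError("driver not initialized")
```

With this wrapper, `BrokenBackend().build_summary()` warns and returns a summary without the "Device Name" entry rather than raising.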
🧹 Nitpick comments (2)
deepmd/pt/entrypoints/main.py (1)
255-265: Make device-name detection best-effort (avoid crashing training on CUDA init/driver edge cases).

Since `SummaryPrinter()()` runs early (and can run in environments with partial CUDA availability), wrap the CUDA queries in `try/except` and return `None` on failure.

Proposed diff:

```diff
 def get_device_name(self) -> str | None:
@@
-    if torch.cuda.is_available():
-        return torch.cuda.get_device_name(torch.cuda.current_device())
+    if torch.cuda.is_available():
+        try:
+            return torch.cuda.get_device_name(torch.cuda.current_device())
+        except Exception:
+            return None
     return None
```

deepmd/pd/entrypoints/main.py (1)

227-242: Avoid hard-coding GPU 0; make Paddle device-name lookup best-effort.

At least derive the device index from `DEVICE` (or Paddle's "current device" API, if available in your supported Paddle versions) and wrap in `try/except`.

Proposed diff:

```diff
 def get_device_name(self) -> str | None:
@@
-    if paddle.device.is_compiled_with_cuda():
-        cuda_mod = getattr(paddle.device, "cuda", None)
-        if cuda_mod is not None and cuda_mod.device_count() > 0:
-            get_props = getattr(cuda_mod, "get_device_properties", None)
-            if callable(get_props):
-                props = get_props(0)
-                return getattr(props, "name", None)
+    if paddle.device.is_compiled_with_cuda():
+        try:
+            cuda_mod = getattr(paddle.device, "cuda", None)
+            if cuda_mod is not None and cuda_mod.device_count() > 0:
+                get_props = getattr(cuda_mod, "get_device_properties", None)
+                if callable(get_props):
+                    dev = str(DEVICE)
+                    dev_id = int(dev.split(":", 1)[1]) if dev.startswith("gpu:") else 0
+                    props = get_props(dev_id)
+                    return getattr(props, "name", None)
+        except Exception:
+            return None
     return None
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- deepmd/pd/entrypoints/main.py
- deepmd/pt/entrypoints/main.py
- deepmd/tf/train/run_options.py
- deepmd/utils/summary.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-01-11T09:14:18.878Z
Learnt from: OutisLi
Repo: deepmodeling/deepmd-kit PR: 5146
File: deepmd/utils/summary.py:77-78
Timestamp: 2026-01-11T09:14:18.878Z
Learning: JAX backend does not currently support training in the DeepMD-kit project, so features like device name display can be deferred for JAX.
Applied to files:
deepmd/utils/summary.py
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: JAX backend is experimental and may have limitations
Applied to files:
deepmd/utils/summary.py
📚 Learning: 2024-10-30T20:08:12.531Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4284
File: deepmd/jax/__init__.py:8-8
Timestamp: 2024-10-30T20:08:12.531Z
Learning: In the DeepMD project, entry points like `deepmd.jax` may be registered in external projects, so their absence in the local configuration files is acceptable.
Applied to files:
deepmd/utils/summary.py
🧬 Code graph analysis (3)
deepmd/pd/entrypoints/main.py (3)
- deepmd/pt/entrypoints/main.py (1): get_device_name (255-265)
- deepmd/tf/train/run_options.py (1): get_device_name (76-89)
- deepmd/utils/summary.py (1): get_device_name (133-141)
deepmd/pt/entrypoints/main.py (3)
- deepmd/pd/entrypoints/main.py (1): get_device_name (227-242)
- deepmd/tf/train/run_options.py (1): get_device_name (76-89)
- deepmd/utils/summary.py (1): get_device_name (133-141)
deepmd/tf/train/run_options.py (3)
- deepmd/pd/entrypoints/main.py (1): get_device_name (227-242)
- deepmd/pt/entrypoints/main.py (1): get_device_name (255-265)
- deepmd/utils/summary.py (1): get_device_name (133-141)
🔇 Additional comments (1)
deepmd/utils/summary.py (1)
133-141: Nice addition: the default `get_device_name()` makes the feature opt-in per backend.
No concerns with the default implementation returning `None`.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @deepmd/tf/train/run_options.py:
- Around lines 76-93: Update get_device_name to return only the human-readable device name, without falling back to the device identifier. Inside get_device_name(), when tf.config.get_visible_devices("GPU") returns a non-empty list, call tf.config.experimental.get_device_details(gpus[0]) and return details.get("device_name") (no `or gpus[0].name` fallback). Keep the existing exception handling and the final `return None`, so that the method returns None when no human-readable name is available.
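A minimal sketch of the requested shape, assuming TensorFlow is available at runtime (this is illustrative, not the actual `run_options.py` code):

```python
def get_device_name():
    """Return the human-readable GPU name, or None if unavailable (sketch)."""
    try:
        import tensorflow as tf

        gpus = tf.config.get_visible_devices("GPU")
        if gpus:
            details = tf.config.experimental.get_device_details(gpus[0])
            # No identifier fallback: return either a readable model name
            # (e.g. "NVIDIA A100") or None, never "/physical_device:GPU:0".
            return details.get("device_name")
    except Exception:
        # Keep the existing best-effort behavior.
        pass
    return None
```

Without TensorFlow installed (or without a visible GPU), the function simply returns `None`, which matches the reviewer's requested contract.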
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/tf/train/run_options.py
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/utils/summary.py (1)
77-79: Minor naming inconsistency: consider using a lowercase key.
The new key `"Device Name"` uses Title Case, while other keys in `build_info` use lowercase (e.g., `"running on"`, `"computing device"`, `"world size"`). For consistency, consider changing to `"device name"`.

Suggested fix:

```diff
 device_name = self.get_device_name()
 if device_name:
-    build_info["Device Name"] = device_name
+    build_info["device name"] = device_name
```
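In isolation, the lowercase-key convention looks like this. The dictionary values are placeholders for illustration only:

```python
# Existing build_info keys use lowercase, e.g. "running on", "computing device".
build_info = {"running on": "node-1", "computing device": "cuda:0"}

device_name = "NVIDIA A100"  # placeholder value, not a real query
if device_name:
    build_info["device name"] = device_name  # lowercase, matching existing style
```

Consumers that parse the summary then only need to handle one key style.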
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepmd/utils/summary.py
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/utils/summary.py (3)
- deepmd/tf/train/run_options.py (1): get_device_name (76-92)
- deepmd/pt/entrypoints/main.py (1): get_device_name (255-265)
- deepmd/pd/entrypoints/main.py (1): get_device_name (227-239)
🔇 Additional comments (1)
deepmd/utils/summary.py (1)
129-131: LGTM! The abstract method is well-defined with an appropriate return type annotation, and the docstring follows the brief style of the other abstract methods in this class. Based on learnings, JAX backend support can be deferred since JAX doesn't currently support training.
Summary by CodeRabbit