ofirfarjun7 (Contributor) commented Nov 10, 2025

What?

Move nvidia peermem driver check outside the gpu loop.

Why?

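Improve code: peermem support does not depend on which GPU is being processed, so the check only needs to run once per query rather than once per GPU.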

Summary by CodeRabbit

  • Bug Fixes

    • Removed duplicate diagnostic messages during device initialization, reducing noisy logs and improving error clarity.
    • Added a clear early-fail behavior when peer-memory support is absent, preventing repeated failed device attempts.
  • Performance

    • Cached device state during queries to avoid redundant checks and speed up initialization.

coderabbitai bot commented Nov 10, 2025

Walkthrough

Added a one-time, cached peermem_loaded check to uct_gdaki_query_tl_devices. The diagnostic is emitted once, and the function returns early with UCS_ERR_NO_DEVICE if peermem is unsupported. The duplicate per-GPU peermem checks were removed, and the function now exits through a single out label.
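The pattern described in this walkthrough can be sketched in a few lines of C. This is a minimal illustration, not the code from gdaki.c: `status_t`, `probe_peermem`, `fake_peermem_present`, `query_devices`, and the counters are hypothetical names, and the `-1` "not probed yet" sentinel is an assumption about how a first call might be detected.

```c
#include <stdio.h>

/* Hypothetical stand-ins for UCX status codes; not the real ucs_status_t. */
typedef enum {
    STATUS_OK            = 0,
    STATUS_ERR_NO_DEVICE = -1
} status_t;

/* Hypothetical probe input. In the PR the value is derived from the memory
 * domain's registered memory types; here it is a plain flag. */
static int fake_peermem_present = 0;
static int probe_calls          = 0; /* how many times the probe really ran */
static int diag_count           = 0; /* how many diagnostics were emitted */

static int probe_peermem(void)
{
    ++probe_calls;
    return fake_peermem_present;
}

/* Counts one device per GPU, but only if peermem support is present. */
status_t query_devices(int num_gpus, int *num_devices_p)
{
    /* -1: not probed yet, 0: absent, 1: present (sentinel is an assumption) */
    static int peermem_loaded = -1;
    status_t status           = STATUS_OK;
    int gpu;

    *num_devices_p = 0;

    /* Probe once, on the first call, and emit the diagnostic at most once. */
    if (peermem_loaded == -1) {
        peermem_loaded = probe_peermem();
        if (!peermem_loaded) {
            ++diag_count;
            fprintf(stderr, "peermem driver is not loaded\n");
        }
    }

    /* Early failure: skip GPU enumeration when support is absent. */
    if (!peermem_loaded) {
        status = STATUS_ERR_NO_DEVICE;
        goto out;
    }

    for (gpu = 0; gpu < num_gpus; ++gpu) {
        ++(*num_devices_p); /* placeholder for per-GPU processing */
    }

out:
    return status;
}
```

With `fake_peermem_present` left at 0, calling `query_devices` repeatedly returns `STATUS_ERR_NO_DEVICE` every time, while the probe and the diagnostic each run only on the first call.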

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GDAKI device query optimization: src/uct/ib/mlx5/gdaki/gdaki.c | Introduced a static global peermem_loaded cache computed on first call; emit diagnostic once if NVIDIA peermem absent; early-return with UCS_ERR_NO_DEVICE when unset; removed redundant per-GPU peermem checks; unified cleanup/return via out label. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant uct_gdaki_query_tl_devices as GDAKI
    opt First-call: compute static peermem_loaded
        Caller->>GDAKI: invoke
        GDAKI-->>GDAKI: derive peermem_loaded from md->super.reg_mem_types & CUDA type
        GDAKI-->>Caller: (diagnostic if unsupported)
    end
    Caller->>GDAKI: invoke
    alt peermem_loaded == 0
        GDAKI--xCaller: return UCS_ERR_NO_DEVICE
    else peermem_loaded == 1
        GDAKI->>GDAKI: iterate GPUs / MDs (per-GPU processing)
        GDAKI-->>Caller: return status (success or other)
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Review focus:
    • Ensure static peermem_loaded initializes exactly once and is thread-safe if applicable.
    • Confirm UCS_ERR_NO_DEVICE early return is acceptable for all callers.
    • Verify no required per-GPU nuance was removed with the duplicated check deletion.

Possibly related PRs

Suggested reviewers

  • Artemy-Mellanox
  • yosefe
  • brminich

Poem

🐰 A cached check hops in once and stays,
No echoes of warnings through looping days.
One tidy exit, one single chime—
Less noisy code, more time to thyme. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and concisely describes the main change: moving the peermem check outside the loop to avoid redundant checks per GPU. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c70512 and 9a3058d.

📒 Files selected for processing (1)
  • src/uct/ib/mlx5/gdaki/gdaki.c (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/uct/ib/mlx5/gdaki/gdaki.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: UCX PR (Static_check Static checks)
  • GitHub Check: UCX PR (Codestyle commit title)
  • GitHub Check: UCX PR (Codestyle ctags check)
  • GitHub Check: UCX PR (Codestyle AUTHORS file update check)
  • GitHub Check: UCX PR (Codestyle format code)
  • GitHub Check: UCX PR (Codestyle codespell check)
  • GitHub Check: UCX release DRP (Prepare CheckRelease)
  • GitHub Check: UCX release (Prepare CheckRelease)
  • GitHub Check: UCX snapshot (Prepare Check)

ofirfarjun7 force-pushed the topic/check-nvidia-peermem-v2 branch from 4c70512 to 9a3058d on November 10, 2025 14:20
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/uct/ib/mlx5/gdaki/gdaki.c (1)

700-700: Optional: Remove trailing whitespace.

Minor formatting nit: Line 700 appears to have trailing whitespace after the out: label.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14219b1 and 4c70512.

📒 Files selected for processing (1)
  • src/uct/ib/mlx5/gdaki/gdaki.c (2 hunks)
🔇 Additional comments (2)
src/uct/ib/mlx5/gdaki/gdaki.c (2)

618-630: LGTM! Peermem check correctly moved outside the loop.

The lazy initialization pattern using a static variable is correct and achieves the stated goal of checking peermem support once rather than repeatedly inside the GPU loop. The diagnostic message will now only be printed once on first invocation when peermem is not loaded.

Note: The static variable lacks explicit synchronization, which could result in multiple concurrent threads initializing peermem_loaded simultaneously. However, this race is benign (multiple assignments of the same value, potentially duplicate diagnostics), and the pattern is consistent with the existing uar_supported variable.
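For comparison, here is one way the initialization could be made race-free with C11 atomics. This is only a sketch of an alternative: the PR does not do this, the note above treats the race as benign, and every name in the block (`peermem_state`, `probe_peermem`, `peermem_is_loaded`, `diag_count`) is hypothetical. Several threads may still run the probe concurrently, but only the thread that wins the compare-and-exchange publishes the result and emits the diagnostic.

```c
#include <stdatomic.h>
#include <stdio.h>

/* -1: not probed yet, 0: absent, 1: present */
static atomic_int peermem_state = -1;
static int diag_count           = 0; /* written only by the winning thread */

/* Hypothetical probe; always reports "absent" in this sketch. */
static int probe_peermem(void)
{
    return 0;
}

int peermem_is_loaded(void)
{
    int state = atomic_load(&peermem_state);

    if (state == -1) {
        int probed   = probe_peermem() ? 1 : 0;
        int expected = -1;

        if (atomic_compare_exchange_strong(&peermem_state, &expected, probed)) {
            /* This thread published the result, so it alone reports it. */
            if (!probed) {
                ++diag_count;
                fprintf(stderr, "peermem driver is not loaded\n");
            }
            state = probed;
        } else {
            /* Another thread published first; expected now holds its value. */
            state = expected;
        }
    }

    return state;
}
```

The trade-off is extra machinery for a value that every thread would compute identically, which is presumably why the simpler unsynchronized pattern, consistent with uar_supported, was accepted here.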


632-635: Good optimization: early return prevents unnecessary work.

The early return when peermem is not loaded is correct and efficient. Since peermem is required for any GPU to be usable with this transport, returning immediately avoids the overhead of GPU enumeration and device allocation when support is unavailable.

ofirfarjun7 requested a review from brminich on November 10, 2025 14:21
brminich merged commit b13cf9b into openucx:master on Nov 14, 2025
148 checks passed
zzhang37 pushed a commit to intel-staging/ucx that referenced this pull request Nov 14, 2025