UCT/GDA: Move peermem check outside the loop. #11001

ofirfarjun7 · 2025-11-10T14:18:22Z

What?

Move the NVIDIA peermem driver check outside the GPU loop.

Why?

Improve code: avoid repeating the same check for every GPU.

Summary by CodeRabbit

Bug Fixes
- Removed duplicate diagnostic messages during device initialization, reducing noisy logs and improving error clarity.
- Added a clear early-fail behavior when peer-memory support is absent, preventing repeated failed device attempts.

Performance
- Cached device state during queries to avoid redundant checks and speed up initialization.

coderabbitai · 2025-11-10T14:18:37Z

Walkthrough

Added a one-time global cached peermem_loaded check in uct_gdaki_query_tl_devices, emitting the diagnostic once and returning early with UCS_ERR_NO_DEVICE if unsupported; removed duplicate per-GPU peermem checks and unified function exit via an out label.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GDAKI device query optimization `src/uct/ib/mlx5/gdaki/gdaki.c` | Introduced a static global `peermem_loaded` cache computed on first call; emit diagnostic once if NVIDIA peermem absent; early-return with `UCS_ERR_NO_DEVICE` when unset; removed redundant per-GPU peermem checks; unified cleanup/return via `out` label. |
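The shape of the change summarized above can be sketched roughly as follows. This is a simplified stand-alone illustration, not the code from `gdaki.c`: `query_devices`, `probe_peermem`, `status_t`, the counters, and the tri-state encoding of the cache are all hypothetical names and assumptions. Only the overall pattern (a static cached flag, a one-time diagnostic, an early no-device return, and a single `out` label) follows the summary.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes standing in for UCS_OK / UCS_ERR_NO_DEVICE. */
typedef enum { ST_OK = 0, ST_ERR_NO_DEVICE = -1 } status_t;

static int probe_calls = 0; /* how many times the probe actually ran */
static int diag_count  = 0; /* how many diagnostics were emitted */

/* Hypothetical probe. Per the walkthrough, the real code derives this from
 * the memory domain's registered memory types; here it echoes its argument. */
static int probe_peermem(int supported)
{
    ++probe_calls;
    return supported ? 1 : 0;
}

static status_t query_devices(int peermem_supported, int num_gpus,
                              int *num_devices_p)
{
    /* Assumed encoding: -1 = not probed yet, 0 = absent, 1 = loaded. */
    static int peermem_loaded = -1;
    status_t status           = ST_OK;
    int gpu;

    assert(num_devices_p != NULL);
    *num_devices_p = 0;

    /* The probe and the diagnostic happen on the first call only. */
    if (peermem_loaded == -1) {
        peermem_loaded = probe_peermem(peermem_supported);
        if (!peermem_loaded) {
            ++diag_count;
            fprintf(stderr, "peermem not available (illustrative message)\n");
        }
    }

    /* Early exit before any per-GPU work is attempted. */
    if (!peermem_loaded) {
        status = ST_ERR_NO_DEVICE;
        goto out;
    }

    /* Per-GPU processing no longer contains a peermem check. */
    for (gpu = 0; gpu < num_gpus; ++gpu) {
        ++(*num_devices_p);
    }

out:
    return status;
}
```

In this sketch, calling `query_devices` repeatedly on a system without peermem runs the probe once and prints one message, whereas a check inside the loop would run once per GPU on every call.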

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant uct_gdaki_query_tl_devices as GDAKI
    opt First-call: compute static peermem_loaded
        Caller->>GDAKI: invoke
        GDAKI-->>GDAKI: derive peermem_loaded from md->super.reg_mem_types & CUDA type
        GDAKI-->>Caller: (diagnostic if unsupported)
    end
    Caller->>GDAKI: invoke
    alt peermem_loaded == 0
        GDAKI--xCaller: return UCS_ERR_NO_DEVICE
    else peermem_loaded == 1
        GDAKI->>GDAKI: iterate GPUs / MDs (per-GPU processing)
        GDAKI-->>Caller: return status (success or other)
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Review focus:
- Ensure static peermem_loaded initializes exactly once and is thread-safe if applicable.
- Confirm UCS_ERR_NO_DEVICE early return is acceptable for all callers.
- Verify no required per-GPU nuance was removed with the duplicated check deletion.

Possibly related PRs

UCT/GDA: Check that nvidia peermem driver is loaded. #10987: Directly related—also adds a static cached peermem_loaded check, one-time diagnostic, and early-skip behavior in uct_gdaki_query_tl_devices.

Suggested reviewers

Artemy-Mellanox
yosefe
brminich

Poem

🐰 A cached check hops in once and stays,
No echoes of warnings through looping days.
One tidy exit, one single chime—
Less noisy code, more time to thyme. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and concisely describes the main change: moving the peermem check outside the loop to avoid redundant checks per GPU. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c70512 and 9a3058d.

📒 Files selected for processing (1)

src/uct/ib/mlx5/gdaki/gdaki.c (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/uct/ib/mlx5/gdaki/gdaki.c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: UCX PR (Static_check Static checks)
GitHub Check: UCX PR (Codestyle commit title)
GitHub Check: UCX PR (Codestyle ctags check)
GitHub Check: UCX PR (Codestyle AUTHORS file update check)
GitHub Check: UCX PR (Codestyle format code)
GitHub Check: UCX PR (Codestyle codespell check)
GitHub Check: UCX release DRP (Prepare CheckRelease)
GitHub Check: UCX release (Prepare CheckRelease)
GitHub Check: UCX snapshot (Prepare Check)

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/uct/ib/mlx5/gdaki/gdaki.c (1)

700-700: Optional: Remove trailing whitespace.

Minor formatting nit: Line 700 appears to have trailing whitespace after the `out:` label.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14219b1 and 4c70512.

📒 Files selected for processing (1)

src/uct/ib/mlx5/gdaki/gdaki.c (2 hunks)

🔇 Additional comments (2)

src/uct/ib/mlx5/gdaki/gdaki.c (2)

618-630: LGTM! Peermem check correctly moved outside the loop.

The lazy initialization pattern using a static variable is correct and achieves the stated goal of checking peermem support once rather than repeatedly inside the GPU loop. The diagnostic message will now only be printed once on first invocation when peermem is not loaded.

Note: The static variable lacks explicit synchronization, which could result in multiple concurrent threads initializing peermem_loaded simultaneously. However, this race is benign (multiple assignments of the same value, potentially duplicate diagnostics), and the pattern is consistent with the existing uar_supported variable.
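If the duplicate diagnostic under concurrent first calls ever became a concern, one possible approach is to publish the cached state with a C11 compare-and-swap so that only the winning thread logs. The following is a minimal sketch under that assumption, not what this PR does (the PR keeps the unsynchronized static, consistent with `uar_supported`). The names `peermem_state`, `probe_peermem_stub`, `peermem_is_loaded`, and `diag_count` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

enum { PEERMEM_UNKNOWN = -1, PEERMEM_ABSENT = 0, PEERMEM_LOADED = 1 };

static atomic_int peermem_state = PEERMEM_UNKNOWN;
static atomic_int diag_count    = 0; /* number of diagnostics emitted */

/* Hypothetical probe; always reports "absent" so the diagnostic path runs. */
static int probe_peermem_stub(void)
{
    return PEERMEM_ABSENT;
}

static int peermem_is_loaded(void)
{
    int state    = atomic_load(&peermem_state);
    int expected = PEERMEM_UNKNOWN;
    int probed;

    if (state != PEERMEM_UNKNOWN) {
        return state; /* fast path: result already published */
    }

    /* Under contention several threads may probe, but the probe is assumed
     * idempotent, so every thread computes the same value. */
    probed = probe_peermem_stub();
    assert((probed == PEERMEM_ABSENT) || (probed == PEERMEM_LOADED));

    /* Only the thread whose compare-and-swap succeeds publishes the value
     * and emits the diagnostic, so the message appears at most once. */
    if (atomic_compare_exchange_strong(&peermem_state, &expected, probed)) {
        if (probed == PEERMEM_ABSENT) {
            atomic_fetch_add(&diag_count, 1);
            fprintf(stderr, "peermem not available (illustrative message)\n");
        }
    }

    return atomic_load(&peermem_state);
}
```

The trade-off in this sketch is that the probe itself may still run more than once during a race; only the publication and the log message are serialized.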

632-635: Good optimization: early return prevents unnecessary work.

The early return when peermem is not loaded is correct and efficient. Since peermem is required for any GPU to be usable with this transport, returning immediately avoids the overhead of GPU enumeration and device allocation when support is unavailable.

UCT/GDA: Move outside the loop.

9a3058d

ofirfarjun7 force-pushed the topic/check-nvidia-peermem-v2 branch from 4c70512 to 9a3058d Compare November 10, 2025 14:20

coderabbitai bot reviewed Nov 10, 2025

ofirfarjun7 requested a review from brminich November 10, 2025 14:21

brminich approved these changes Nov 14, 2025

brminich merged commit b13cf9b into openucx:master Nov 14, 2025
148 checks passed

zzhang37 pushed a commit to intel-staging/ucx that referenced this pull request Nov 14, 2025

UCT/GDA: Move peermem check outside the loop. (openucx#11001)

aa6d13f

UCT/GDA: Move outside the loop.
