Skip to content

Conversation

@roiedanino
Copy link
Contributor

@roiedanino roiedanino commented Nov 5, 2025

What?

Warns about UCX_IB_MLX5_DEVX_OBJECTS being set to empty for Grace CPUs in ucx.conf

Why?

Disabling DEVX on Grace CPUs can cause serious slowdowns for small messages.

@roiedanino roiedanino self-assigned this Nov 5, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

These changes introduce CPU vendor/model detection for NVIDIA Grace processors via a new utility function and add a runtime warning path when ODP v2 is unavailable due to disabled DevX objects on CPUs that prefer ODP.

Changes

Cohort / File(s) Summary
CPU Architecture Utilities
src/ucs/arch/cpu.h
Added static inline function ucs_cpu_prefer_odp() to detect NVIDIA Grace CPUs and return ODP preference state.
MLX5 Device Driver
src/uct/ib/mlx5/dv/ib_mlx5dv_md.c
Added conditional runtime warning when ODP v2 is unsupported due to disabled DevX and the CPU prefers ODP.

Sequence Diagram

sequenceDiagram
    participant Driver as MLX5 Driver Init
    participant CPU as CPU Detection
    participant ODP as ODP Setup

    Driver->>CPU: Check CPU preference via ucs_cpu_prefer_odp()
    CPU-->>Driver: Returns true if NVIDIA Grace
    
    Driver->>ODP: Attempt ODP v2 initialization
    ODP-->>Driver: v2 fails (DevX disabled)
    
    alt CPU prefers ODP
        Driver->>Driver: Emit runtime warning
    else CPU doesn't prefer ODP
        Driver->>Driver: Continue silently
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • New utility function is a straightforward conditional check with no complex logic
  • Warning path adds a simple conditional branch with minimal code density
  • Both changes are localized and self-contained
  • No modifications to existing control flow beyond the new additions

Poem

🐰 A whisper in the Grace CPU dreams,
ODP warnings flow like crystal streams,
New detection paths so clean and true,
NVIDIA's heartbeat shines right through! ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ⚠️ Warning The title describes a warning about UCX_IB_MLX5_DEVX_OBJECTS for Grace CPUs, but the actual changes add a CPU preference function and emit a warning when ODP v2 isn't supported. The title doesn't accurately reflect what the PR accomplishes. Update the title to better reflect the primary change, such as: 'Add CPU preference check and warning for ODP v2 support on Grace CPUs' or 'Add ucs_cpu_prefer_odp() and runtime warning for ODP v2 on Grace CPUs'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@brminich
Copy link
Contributor

brminich commented Nov 5, 2025

that setting was required to support ODP v1 on GH. Are we certain that ODPv2 is already supported in latest FW?
Also aren't the needed objects created with dv API instead?

@roiedanino
Copy link
Contributor Author

roiedanino commented Nov 10, 2025

that setting was required to support ODP v1 on GH. Are we certain that ODPv2 is already supported in latest FW? Also aren't the needed objects created with dv API instead?

I changed the PR to add a warning regarding ucx.conf disabling DevX, to give users an indication regarding the performance penalty caused by this.

It't not perfect either, as ucx.conf disables DevX for all Grace Cpus whether ODPv2 is supported or not, and we can't really know if the DEVX_OBJ env came from the user, ucx.conf or from default configuration.

@brminich What if we introduce a new env that forces enabling devx even with ODP enabled?
UCX_IB_MLX5_FORCE_DEVX_FOR_ODP_V1=n

That way, we can at least we can enable DevX OOB in a Grace + ODPv2 situation,
while leaving the user an option to override it

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/ucs/arch/cpu.h (1)

172-179: LGTM! Implementation follows existing patterns correctly.

The function correctly identifies NVIDIA Grace CPUs by checking both vendor and model, and follows the same pattern as ucs_cpu_prefer_relaxed_order(). The logic aligns with the PR objective to enable optimal DEVX/ODP behavior on Grace CPUs.

Consider adding a brief documentation comment explaining that this function identifies CPUs that benefit from ODP (On-Demand Paging) support, particularly for the context of DEVX object configuration:

+/**
+ * Check if the CPU prefers On-Demand Paging (ODP) support.
+ * Returns true for NVIDIA Grace CPUs which benefit from ODP v2 via DEVX.
+ */
 static inline int ucs_cpu_prefer_odp()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea2a7b5 and 64b3b2a.

📒 Files selected for processing (2)
  • src/ucs/arch/cpu.h (1 hunks)
  • src/uct/ib/mlx5/dv/ib_mlx5dv_md.c (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/uct/ib/mlx5/dv/ib_mlx5dv_md.c
🧰 Additional context used
🧬 Code graph analysis (1)
src/ucs/arch/cpu.h (2)
src/ucs/arch/x86_64/cpu.c (2)
  • ucs_arch_get_cpu_vendor (562-579)
  • ucs_arch_get_cpu_model (368-499)
src/ucs/arch/aarch64/cpu.h (2)
  • ucs_arch_get_cpu_vendor (126-140)
  • ucs_arch_get_cpu_model (142-153)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: UCX PR (Static_check Static checks)
  • GitHub Check: UCX PR (Codestyle AUTHORS file update check)
  • GitHub Check: UCX PR (Codestyle format code)
  • GitHub Check: UCX PR (Codestyle ctags check)
  • GitHub Check: UCX PR (Codestyle commit title)
  • GitHub Check: UCX PR (Codestyle codespell check)
  • GitHub Check: UCX release DRP (Prepare CheckRelease)
  • GitHub Check: UCX release (Prepare CheckRelease)
  • GitHub Check: UCX snapshot (Prepare Check)

@roiedanino roiedanino changed the title CONFIG: Don't set UCX_IB_MLX5_DEVX_OBJECTS empty for Grace CPUs UCT/IB/MLX5: Warn about UCX_IB_MLX5_DEVX_OBJECTS being set empty for Grace CPUs in ucx.conf Nov 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants