yihuaxu commented Nov 13, 2025

What?

Currently, UCX cannot use GPUDirect RDMA (GDR) with the ZE (Level Zero) backend.

Why?

In GDR mode, the IB device accesses GPU memory directly, which improves performance by removing the extra memcpy otherwise needed to move the data through host memory.

How?

This PR implements it using the ZE backend's dmabuf feature together with IB's GDR support.
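The approach above depends on two capabilities being present at once: the ZE driver exporting a device allocation as a dmabuf, and the IB stack registering that dmabuf for RDMA. The sketch below models only that decision and the copy it saves. The names `xfer_path_t`, `choose_xfer_path`, and `extra_host_copies` are hypothetical, not UCX APIs, and "one extra copy per side" is a simplified assumption about the non-GDR path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transfer paths; these are illustrative, not UCX types. */
typedef enum {
    XFER_PATH_GDR_DIRECT,  /* NIC accesses ZE device memory via a dmabuf MR */
    XFER_PATH_HOST_STAGED  /* data is copied through host memory first */
} xfer_path_t;

/* GDR is only possible when both halves of the dmabuf path exist:
 * the ZE driver can export the allocation as a dmabuf fd, and the
 * IB device can register a dmabuf-backed memory region. */
static xfer_path_t choose_xfer_path(bool ze_dmabuf_export, bool ib_dmabuf_reg)
{
    return (ze_dmabuf_export && ib_dmabuf_reg) ? XFER_PATH_GDR_DIRECT
                                               : XFER_PATH_HOST_STAGED;
}

/* Simplified model (assumption): the staged path adds one device<->host
 * memcpy per transfer side, the direct GDR path adds none. */
static int extra_host_copies(xfer_path_t path)
{
    assert(path == XFER_PATH_GDR_DIRECT || path == XFER_PATH_HOST_STAGED);
    return (path == XFER_PATH_GDR_DIRECT) ? 0 : 1;
}
```

In a real build, the dmabuf fd would presumably come from Level Zero's external-memory (dmabuf) export and be handed to rdma-core's `ibv_reg_dmabuf_mr()`; that part needs the hardware and is deliberately left out of this sketch.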

Summary by CodeRabbit

  • New Features

    • GPUDirect RDMA detection now recognizes ZE devices when ZE support is enabled, expanding device discovery.
    • Accelerator copy operations now report host memory support in addition to ZE host/device/managed types, improving interoperability.
  • Documentation

    • AUTHORS file updated with an additional contributor entry.

coderabbitai bot commented Nov 13, 2025

Walkthrough

Adds a ZE (Level Zero) GPUDirect RDMA driver presence check (guarded by #if HAVE_ZE) into IB MD initialization and includes UCS_MEMORY_TYPE_HOST in ZE copy MD's reported access_mem_types. Also appends an author entry to AUTHORS.

Changes

  • IB MD: GPUDirect detection (src/uct/ib/base/ib_md.c): Inserts a ZE GPUDirect driver presence check into uct_ib_md_open_common() (placed after the ROCm KFD probe and before the DMA-BUF/module checks), wrapped in #if HAVE_ZE, which can mark GPUDirect RDMA as available.
  • ZE Copy MD: Memory type config (src/uct/ze/copy/ze_copy_md.c): Updates uct_ze_copy_md_query() to include UCS_MEMORY_TYPE_HOST in md_attr->access_mem_types alongside UCS_MEMORY_TYPE_ZE_HOST, UCS_MEMORY_TYPE_ZE_DEVICE, and UCS_MEMORY_TYPE_ZE_MANAGED.
  • Metadata (AUTHORS): Adds author entry Yihua Xu <[email protected]>.

Sequence Diagram(s)

sequenceDiagram
  participant Init as uct_ib_md_open_common()
  participant KFD as ROCm KFD probe
  participant ZE as ZE GPUDirect probe (new)
  participant DMB as DMA-BUF/module checks
  Note over Init: IB MD init sequence (high-level)
  Init->>KFD: probe ROCm KFD
  KFD-->>Init: result
  alt HAVE_ZE
    Init->>ZE: probe ZE GPUDirect driver
    ZE-->>Init: result (may set GPUDIRECT_RDMA)
  end
  Init->>DMB: continue DMA-BUF/module checks
  DMB-->>Init: continue initialization
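The probe order in the diagram above can be sketched as a minimal model. This is not the code from ib_md.c: the marker-file paths are passed in as parameters because the thread does not say which path the ZE probe checks, the result is modeled as a hypothetical per-memory-type bitmask, and `check_gdr_driver`, `probe_gdr`, and `MEM_BIT` are illustrative names. A missing marker file is treated as "GDR unavailable" rather than as an error, which is one reasonable answer to the review's question about probe error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical memory-type bits; not the real ucs_memory_type_t values. */
enum { MEM_ROCM, MEM_ZE_DEVICE };
#define MEM_BIT(t) (UINT64_C(1) << (t))

/* In a real build HAVE_ZE comes from configure; default it on here. */
#ifndef HAVE_ZE
#define HAVE_ZE 1
#endif

/* Set the bit for mem_type when the driver's marker file exists.
 * A missing file is not an error: GDR for that type is just unavailable. */
static uint64_t check_gdr_driver(uint64_t mask, const char *path, int mem_type)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0) {
        mask |= MEM_BIT(mem_type);
    }
    return mask;
}

/* Probe order mirrors the diagram: ROCm KFD first, then ZE (only when
 * built with ZE support), then the DMA-BUF/module checks. */
static uint64_t probe_gdr(const char *kfd_path, const char *ze_path)
{
    uint64_t mask = 0;

    mask = check_gdr_driver(mask, kfd_path, MEM_ROCM);
#if HAVE_ZE
    mask = check_gdr_driver(mask, ze_path, MEM_ZE_DEVICE);
#endif
    /* DMA-BUF and peer-memory module checks would follow here. */
    return mask;
}
```

Keeping the ZE probe inside `#if HAVE_ZE` means builds without Level Zero never reference the ZE bit at all, which matches the guard placement the review asks to verify.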
sequenceDiagram
  participant Query as uct_ze_copy_md_query()
  participant Attr as md_attr
  Note over Query: build access_mem_types mask
  Query->>Attr: set = ZE_HOST | ZE_DEVICE | ZE_MANAGED | HOST
  Attr-->>Query: return md attributes
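The mask built in the diagram above amounts to OR-ing one bit per memory type. In this sketch the enum values and the `MEM_BIT` macro are hypothetical stand-ins for UCX's `ucs_memory_type_t` and its bit helper; the real enum order may differ, so only the membership checks, not the numeric mask value, carry over.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ucs_memory_type_t values. */
typedef enum {
    MEM_TYPE_HOST,
    MEM_TYPE_ZE_HOST,
    MEM_TYPE_ZE_DEVICE,
    MEM_TYPE_ZE_MANAGED,
    MEM_TYPE_CUDA,
    MEM_TYPE_LAST
} mem_type_t;

#define MEM_BIT(t) (UINT64_C(1) << (t))

/* Access mask the ZE copy MD reports after this PR: the three ZE types
 * plus plain host memory. */
static uint64_t ze_copy_access_mem_types(void)
{
    return MEM_BIT(MEM_TYPE_ZE_HOST) |
           MEM_BIT(MEM_TYPE_ZE_DEVICE) |
           MEM_BIT(MEM_TYPE_ZE_MANAGED) |
           MEM_BIT(MEM_TYPE_HOST);
}

/* Non-zero if the MD's mask includes the given memory type. */
static int md_can_access(uint64_t mask, mem_type_t t)
{
    assert(t < MEM_TYPE_LAST);
    return (mask & MEM_BIT(t)) != 0;
}
```

With HOST in the mask, a check like `md_can_access(mask, MEM_TYPE_HOST)` now succeeds for this MD, which is the semantic change the review asks to confirm.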

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Localized, small edits across three files.
  • Review focus:
    • Correct #if HAVE_ZE placement and probe error handling in ib_md.c.
    • Confirm memory-type semantics after adding UCS_MEMORY_TYPE_HOST in ze_copy_md.c.
    • Verify AUTHORS formatting/location.

Poem

🐇 I sniffed the kernel, peeped where ZE might hide,
A guarded check hopped in, just tucked inside.
Host joined device and managed in a cheerful throng,
New name in the list — a brief, glad song. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'support gdr for intel xpu' directly addresses the main objective of the PR, which implements GDR support for Intel XPU using the ZE backend.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 613fbb8 and ca2a77e.

📒 Files selected for processing (3)
  • AUTHORS (1 hunks)
  • src/uct/ib/base/ib_md.c (2 hunks)
  • src/uct/ze/copy/ze_copy_md.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/uct/ze/copy/ze_copy_md.c
  • src/uct/ib/base/ib_md.c
🔇 Additional comments (1)
AUTHORS (1)

112-112: LGTM! Proper author attribution.

The author entry is correctly positioned in alphabetical order and follows the established format.



yihuaxu added a commit to yihuaxu/ucx that referenced this pull request Nov 13, 2025
yihuaxu force-pushed the yihuaxu/xpu-gdr-support branch from 3b374b7 to b145428 November 13, 2025 08:51

coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b374b7 and b145428.

📒 Files selected for processing (2)
  • src/uct/ib/base/ib_md.c (1 hunks)
  • src/uct/ze/copy/ze_copy_md.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/uct/ze/copy/ze_copy_md.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: UCX PR (Codestyle commit title)
  • GitHub Check: UCX PR (Codestyle ctags check)
  • GitHub Check: UCX PR (Codestyle codespell check)
  • GitHub Check: UCX PR (Codestyle AUTHORS file update check)
  • GitHub Check: UCX PR (Codestyle format code)
  • GitHub Check: UCX release DRP (Prepare CheckRelease)
  • GitHub Check: UCX release (Prepare CheckRelease)
  • GitHub Check: UCX snapshot (Prepare Check)

yihuaxu added a commit to yihuaxu/ucx that referenced this pull request Nov 13, 2025
* UCT/ZE: Add host memory type
* UCT/IB: Add GDR support for intel XPU
yihuaxu force-pushed the yihuaxu/xpu-gdr-support branch from b145428 to 98fce7d November 13, 2025 09:02
yihuaxu added a commit to yihuaxu/ucx that referenced this pull request Nov 13, 2025
* UCT/ZE: Add host memory type
* UCT/IB: Add GDR support for intel XPU
* AUTHORS: Add author
yihuaxu force-pushed the yihuaxu/xpu-gdr-support branch 2 times, most recently from 613fbb8 to ca2a77e November 13, 2025 09:16
* UCT/ZE: Add host memory type
* UCT/IB: Add GDR support for intel XPU
* AUTHORS: Add author
yihuaxu closed this Nov 14, 2025
yihuaxu deleted the yihuaxu/xpu-gdr-support branch November 14, 2025 09:19