UCT/ZE: Fix reset path, DMA-BUF ownership, and descriptor init #11223
Merged
yosefe merged 15 commits into openucx:master on Mar 7, 2026
Implement Level Zero device enumeration and topology registration to properly integrate Intel GPUs with UCX's topology subsystem.
Key changes:
- Enumerate Level Zero devices and sub-devices during initialization
- Register each physical device once with topology using PCI bus ID
- All sub-devices on same device share parent's sys_dev for IB affinity
- Device naming: "GPU0" for single sub-device, "GPU0.0"/"GPU0.1" for multi
- Use zeDevicePciGetPropertiesExt() for PCI properties (Level Zero 1.0+ compat)
- Enable auxiliary paths for multi-path routing
Architecture:
- Static sub-device array populated at init, read-only after
- Query functions return empty list on init failure (not error)
- One MD resource, one TL device per sub-device
API cleanup:
- Removed unused functions from public header
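The naming rule in the commit message above ("GPU0" for a single sub-device, "GPU0.0"/"GPU0.1" for several) can be sketched as a small formatting helper. This is only an illustration: `sketch_ze_device_name` and its parameters are hypothetical names and are not taken from the UCX sources.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from the UCX sources): formats a device name
 * following the convention described in the commit message. */
static int sketch_ze_device_name(char *buf, size_t max, unsigned dev_index,
                                 unsigned subdev_index, unsigned subdev_count)
{
    if (subdev_count <= 1) {
        /* A device exposing a single sub-device gets a plain name */
        return snprintf(buf, max, "GPU%u", dev_index);
    }

    /* Several sub-devices: append the sub-device index after a dot */
    return snprintf(buf, max, "GPU%u.%u", dev_index, subdev_index);
}
```

Calling it with `dev_index = 0`, `subdev_index = 1`, `subdev_count = 2` writes "GPU0.1" into the buffer, while `subdev_count = 1` yields "GPU0".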
Fix device enumeration on systems where Level Zero reports tiles as separate root devices (e.g., Ponte Vecchio Data Center Max) rather than hierarchical sub-devices.
Changes:
- Detect duplicate PCI addresses (BDF) to identify tiles on same GPU
- Share sys_dev across root devices with identical PCI address
- Support both hierarchical (zeDeviceGetSubDevices) and flat models
- Preserve all 8 device handles (GPU0-GPU7) with correct 4-sys_dev mapping
Fixes incorrect NUMA/IB affinity when flat hierarchy causes separate topology registration for tiles on same physical device.
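The BDF-based sharing described above might look roughly like the following. It is a sketch under stated assumptions rather than the PR's code: the `sketch_pci_addr_t` type, its field names, and `sketch_assign_sys_dev` are invented for illustration, and the quadratic scan is simply the shortest way to show the idea for a handful of devices.

```c
#include <stddef.h>

/* Stand-in for a PCI address; type and field names are assumptions */
typedef struct {
    unsigned domain;
    unsigned bus;
    unsigned device;
    unsigned function;
} sketch_pci_addr_t;

static int sketch_pci_addr_equal(const sketch_pci_addr_t *a,
                                 const sketch_pci_addr_t *b)
{
    return (a->domain == b->domain) && (a->bus == b->bus) &&
           (a->device == b->device) && (a->function == b->function);
}

/* Assign a system-device index to each root device. Devices that report the
 * same PCI address reuse the index of the first device seen at that address.
 * Returns the number of distinct system devices. */
static size_t sketch_assign_sys_dev(const sketch_pci_addr_t *addrs,
                                    size_t count, unsigned *sys_dev)
{
    size_t num_sys_devs = 0;
    size_t i, j;

    for (i = 0; i < count; ++i) {
        /* Look for an earlier device with the same address */
        for (j = 0; j < i; ++j) {
            if (sketch_pci_addr_equal(&addrs[i], &addrs[j])) {
                break;
            }
        }

        if (j < i) {
            sys_dev[i] = sys_dev[j];               /* tile of a known GPU */
        } else {
            sys_dev[i] = (unsigned)num_sys_devs++; /* first tile of a GPU */
        }
    }

    return num_sys_devs;
}
```

With eight root devices spread over four distinct addresses, all eight entries are kept and the function reports four system devices, which matches the "8 device handles, 4-sys_dev mapping" case in the commit message.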
zeMemGetAllocProperties returns an exported dmabuf fd that must be closed by UCX after duplicating it for the caller. Previously, each mem_query leaked one fd. Add a centralized cleanup path to always close the original fd and handle dup() failure.
Set mandatory stype in ze_host_mem_alloc_desc_t and ze_device_mem_alloc_desc_t used by mem_alloc. Although the descriptors were zero-initialized, explicit stype is required by Level Zero and improves compatibility with stricter runtime validation and future extension chaining.
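To illustrate why zero-initialization alone may not be enough, here is a minimal sketch that uses stand-in types so it builds without the Level Zero headers. Everything prefixed with `sketch_` is hypothetical, the enum values are invented, and the validation function only imitates what a stricter runtime could check; the real descriptors are `ze_host_mem_alloc_desc_t` and `ze_device_mem_alloc_desc_t`.

```c
#include <stddef.h>

/* Stand-ins for Level Zero types so the sketch builds without ze_api.h.
 * The enum values are invented for illustration. */
typedef enum {
    SKETCH_STYPE_HOST_MEM_ALLOC_DESC   = 1,
    SKETCH_STYPE_DEVICE_MEM_ALLOC_DESC = 2
} sketch_stype_t;

typedef struct {
    sketch_stype_t stype;  /* identifies the structure to the runtime */
    const void     *pNext; /* optional extension chain */
    unsigned       flags;
} sketch_host_mem_alloc_desc_t;

/* Hypothetical strict validation: reject a descriptor with a wrong type tag */
static int sketch_validate_host_desc(const sketch_host_mem_alloc_desc_t *desc)
{
    return (desc != NULL) &&
           (desc->stype == SKETCH_STYPE_HOST_MEM_ALLOC_DESC);
}

/* Zero-initialized only: stype stays 0, which is not a valid tag here */
static sketch_host_mem_alloc_desc_t sketch_make_zeroed_desc(void)
{
    sketch_host_mem_alloc_desc_t desc = {0};
    return desc;
}

/* Zero-initialized, then tagged explicitly */
static sketch_host_mem_alloc_desc_t sketch_make_typed_desc(void)
{
    sketch_host_mem_alloc_desc_t desc = {0};
    desc.stype = SKETCH_STYPE_HOST_MEM_ALLOC_DESC;
    return desc;
}
```

In this model the zeroed descriptor fails validation and the tagged one passes. The same tag is what lets a consumer walk a `pNext` chain and recognize each structure in it.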
Contributor
Author
The check failures are unrelated to this PR!
yosefe approved these changes on Mar 7, 2026
jeynmann pushed a commit to jeynmann/ucx that referenced this pull request on Mar 17, 2026
…cx#11223)
* UCT/ZE: Add device topology registration
* UCT/ZE: Fix code style in ze_base files
* UCT/ZE: Fix topology registration for flat device hierarchies
* UCX/ZE: Refactor base initialization into helper functions
* UCT/ZE/COPY: always reset command list and propagate reset failures
* UCT/ZE/COPY: Close exported dmabuf fd after dup in mem_query
* UCT/ZE/COPY: initialize stype in Level Zero alloc descriptors
* UCT/ZE/COPY: remove redundant ep_create/ep_destroy ops entries
* UCT/ZE: style and whitespace cleanup
* UCT/ZE/COPY: preserve Level Zero DMA-BUF export fd ownership in mem_query
* UCT/ZE/COPY: clang-format cleanup in ZE copy files
* UCT/ZE/COPY: simplify dmabuf fd setup in mem_query
What
This PR fixes critical robustness issues in the ZE copy transport related to command-list lifecycle, DMA-BUF fd handling, Level Zero descriptor initialization, and interface ops table cleanup.
Changes included:
- Reset the command list in uct_ze_copy_ep_zcopy, including failure paths
- Initialize Level Zero descriptor stype fields
- Remove redundant .ep_create/.ep_destroy assignments in ZE copy iface ops
Why
These issues can cause runtime instability and hard-to-debug failures:
Command list state corruption
On copy-path failures, returning before zeCommandListReset() can leave the command list closed, so later operations fail when appending commands.
DMA-BUF fd ownership/lifecycle issues
The original code did not fully handle DMA-BUF export-fd ownership/lifecycle semantics. During development of this PR, the exported DMA-BUF fd was initially closed locally to avoid leaks. Validation then showed that Level Zero may cache exported fds per allocation, so this PR leaves the original export fd untouched (Level Zero-owned) and returns only a duplicated fd to the caller (UCX/caller-owned).
Level Zero API contract compliance
Allocation descriptors require correct stype initialization; omitting it is undefined behavior and can fail on stricter drivers.
Code hygiene / maintainability
Duplicate interface-op assignments are redundant and can confuse future maintenance.
How
1) Command-list reset hardening (ze_copy_ep.c)
Refactor uct_ze_copy_ep_zcopy to a centralized cleanup path (goto out_reset) so zeCommandListReset() is attempted on all paths. Reset failures are propagated as errors.
2) DMA-BUF export handling (ze_copy_md.c)
- Request DMA-BUF export only when UCT_MD_MEM_ATTR_FIELD_DMABUF_FD is present
- Initialize export_fd.fd = UCT_DMABUF_FD_INVALID
- Pass NULL for DMA-BUF export when fd is not requested
- Initialize mem_attr_p->dmabuf_fd to UCT_DMABUF_FD_INVALID when requested
- Use dup(export_fd.fd) to transfer fd ownership to the caller
- Return UCS_ERR_UNSUPPORTED if DMA-BUF fd is requested but export is unavailable
3) Descriptor initialization (ze_copy_md.c)
Initialize:
- ze_host_mem_alloc_desc_t.stype = ZE_STRUCTURE_TYPE_HOST_MEM_ALLOC_DESC
- ze_device_mem_alloc_desc_t.stype = ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC
4) Interface ops cleanup (ze_copy_iface.c)
Remove redundant first .ep_create/.ep_destroy assignments and keep only the class-based entries.
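The centralized cleanup path from step 1 can be sketched as follows. The command list here is a small mock so the example builds without Level Zero; `sketch_cmd_list_t`, the `sketch_cl_*` functions, and the `fail_*` knobs are all hypothetical. Returning the first error when both the copy and the reset fail is a choice made for this sketch, not something the PR text specifies.

```c
/* Mock command list; this is not the Level Zero API */
typedef struct {
    int closed;     /* set once the list has been closed for execution */
    int fail_copy;  /* test knob: make the execute step fail */
    int fail_reset; /* test knob: make the reset step fail */
} sketch_cmd_list_t;

typedef enum {
    SKETCH_CL_OK     = 0,
    SKETCH_CL_ERR_IO = -1
} sketch_cl_status_t;

static sketch_cl_status_t sketch_cl_append_copy(sketch_cmd_list_t *cl)
{
    /* Appending to a list that is still closed fails */
    return cl->closed ? SKETCH_CL_ERR_IO : SKETCH_CL_OK;
}

static sketch_cl_status_t sketch_cl_close_and_execute(sketch_cmd_list_t *cl)
{
    cl->closed = 1;
    return cl->fail_copy ? SKETCH_CL_ERR_IO : SKETCH_CL_OK;
}

static sketch_cl_status_t sketch_cl_reset(sketch_cmd_list_t *cl)
{
    if (cl->fail_reset) {
        return SKETCH_CL_ERR_IO;
    }

    cl->closed = 0; /* list can accept new commands again */
    return SKETCH_CL_OK;
}

static sketch_cl_status_t sketch_zcopy(sketch_cmd_list_t *cl)
{
    sketch_cl_status_t status, reset_status;

    status = sketch_cl_append_copy(cl);
    if (status != SKETCH_CL_OK) {
        goto out_reset;
    }

    /* Success and failure both continue into the reset below */
    status = sketch_cl_close_and_execute(cl);

out_reset:
    reset_status = sketch_cl_reset(cl);
    if (status == SKETCH_CL_OK) {
        status = reset_status; /* a reset failure becomes the result */
    }

    return status;
}
```

With a single exit label, an early `return` that skips the reset cannot be added by accident, and a failed copy still leaves the mock list reopened for the next operation.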