TOOLS/PERF/ZE: fix allocator thread safety and device routing#11320

Open
yafshar wants to merge 1 commit into openucx:master from intel-staging:topic/ze-perf-alloc-init-tls-fix

yafshar (Contributor) commented Apr 2, 2026

What?

  • Replace shared global command list and device index with per-thread TLS to eliminate multi-threaded races in ZE perftest paths.
  • Fix DEVICE and MANAGED allocations to route through the thread-selected ZE device instead of hardcoded device 0.
  • Add thread-safe one-time initialization via pthread_once and defensive checks for uninitialized TLS command lists and invalid memory types.
  • Improve lifecycle handling by cleaning up per-thread command lists on device switch and module unload.
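
To illustrate the pattern the list above describes, here is a minimal sketch of per-thread state combined with `pthread_once` initialization. It is not the code from this PR: every `sketch_*` name, `SKETCH_MAX_DEVICES`, and the stand-in command list type are hypothetical, and no Level Zero calls are made. Only `pthread_once`, `__thread`, and the name `tls_gpu_index` (used here as `tls_device_index`'s analogue) relate to what the description mentions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_DEVICES 4 /* hypothetical limit, not the UCX value */

/* Hypothetical stand-in for a ZE command list handle */
typedef struct {
    int device;
} sketch_cmdlist_t;

static pthread_once_t sketch_init_once = PTHREAD_ONCE_INIT;
static int sketch_num_devices; /* written once, read-only afterwards */
static int sketch_init_calls;  /* counts how often global init ran */

/* Per-thread state: no other thread can observe or modify these */
static __thread int tls_device_index = -1;
static __thread sketch_cmdlist_t tls_cmdlist_storage;
static __thread sketch_cmdlist_t *tls_cmdlist = NULL;

static void sketch_global_init(void)
{
    /* Real code would enumerate drivers and devices here */
    sketch_num_devices = SKETCH_MAX_DEVICES;
    sketch_init_calls++;
}

/* Select a device for the calling thread; returns 0 on success, -1 on error */
static int sketch_select_device(int index)
{
    pthread_once(&sketch_init_once, sketch_global_init);

    if ((index < 0) || (index >= sketch_num_devices)) {
        return -1;
    }

    if (tls_cmdlist != NULL) {
        /* Device switch: real code would destroy the old command list here */
        tls_cmdlist = NULL;
    }

    tls_cmdlist_storage.device = index;
    tls_cmdlist                = &tls_cmdlist_storage;
    tls_device_index           = index;
    return 0;
}

/* Defensive check: refuse to operate if this thread has no command list */
static int sketch_memset(void *dst, int value, size_t length)
{
    if (tls_cmdlist == NULL) {
        return -1;
    }

    memset(dst, value, length); /* stand-in for a device fill command */
    return 0;
}

/* Thread entry point: selects a device and reports the index it ended up with */
static void *sketch_worker(void *arg)
{
    int index = *(int *)arg;

    if (sketch_select_device(index) != 0) {
        return (void *)(intptr_t)-1;
    }

    return (void *)(intptr_t)tls_device_index;
}
```

Two threads running `sketch_worker` with different indices each see their own `tls_device_index`, while `sketch_global_init` runs exactly once regardless of which thread arrives first. A thread that never selected a device gets an error from `sketch_memset` instead of dereferencing a NULL command list.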

Why?

  • Proactive correctness hardening of the ZE perftest allocator: remove race-prone shared state, fix per-thread device routing,
    and improve stability in multi-threaded perf scenarios.
  • No formal issue was filed, but these defects can surface as data races and incorrect device selection on multi-device systems.

- Replace global gpu_cmdlists with per-thread __thread command
  lists to eliminate data races in multi-threaded perf scenarios
- Replace global gpu_index with per-thread tls_gpu_index to enable
  correct device routing based on each thread’s selected device
- Use pthread_once for thread-safe one-time allocator initialization
- Add NULL command list checks before allocation and memcpy/memset
  operations
- Fix DEVICE and MANAGED allocations to use selected device index
  instead of hardcoded device 0
- Replace hardcoded gpu_page_size with dynamic ucs_get_page_size()
- Add explicit error handling for invalid memory types instead of
  silently treating them as MANAGED
- Add proper cleanup of per-thread command lists on module unload
- Rename UCX_PERF_ZE_MAX_DEVICES to ZE_PERF_MAX_DEVICES for
  consistency
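
The routing and validation items in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the `sketch_*` names and the memory type enum are hypothetical, the existence of a HOST type that needs no device is assumed, and POSIX `sysconf(_SC_PAGESIZE)` stands in for `ucs_get_page_size()` so the example compiles without UCX headers.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical memory types; the real perftest uses UCX's own enum */
typedef enum {
    SKETCH_MEM_HOST,
    SKETCH_MEM_DEVICE,
    SKETCH_MEM_MANAGED,
    SKETCH_MEM_LAST
} sketch_mem_type_t;

/* What an allocation would use: target device and page-aligned length */
typedef struct {
    int    device_index; /* -1 when no device is involved */
    size_t length;       /* requested length rounded up to the page size */
} sketch_alloc_plan_t;

/* Query the page size at run time instead of hardcoding it */
static size_t sketch_page_size(void)
{
    long size = sysconf(_SC_PAGESIZE);

    return (size > 0) ? (size_t)size : 4096;
}

static size_t sketch_align_up(size_t length, size_t alignment)
{
    return ((length + alignment - 1) / alignment) * alignment;
}

/*
 * Fill in the plan for an allocation. DEVICE and MANAGED follow the device
 * the calling thread selected rather than device 0. Unknown memory types
 * are rejected instead of silently falling through to MANAGED.
 * Returns 0 on success, -1 on error.
 */
static int sketch_plan_alloc(int mem_type, int selected_device, size_t length,
                             sketch_alloc_plan_t *plan)
{
    switch (mem_type) {
    case SKETCH_MEM_HOST:
        plan->device_index = -1; /* assumed: host memory needs no device */
        break;
    case SKETCH_MEM_DEVICE:
    case SKETCH_MEM_MANAGED:
        if (selected_device < 0) {
            return -1; /* this thread has not selected a device yet */
        }
        plan->device_index = selected_device;
        break;
    default:
        return -1; /* explicit error for invalid memory types */
    }

    plan->length = sketch_align_up(length, sketch_page_size());
    return 0;
}
```

Passing the thread's selected index as an argument keeps the routing decision visible at the call site; in the PR that value would come from the per-thread `tls_gpu_index`.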
yafshar force-pushed the topic/ze-perf-alloc-init-tls-fix branch from bca1f08 to 77b74a4 on April 2, 2026 20:59
yafshar changed the title from "UCT/ZE/PERF: fix allocator thread safety and device routing" to "TOOLS/PERF/ZE: fix allocator thread safety and device routing" on Apr 2, 2026
yafshar marked this pull request as ready for review on April 2, 2026 21:01