
TOOLS/PERF: Split .cu file to make build from 6min down to around 4min #11279

Open
tvegas1 wants to merge 2 commits into openucx:master from tvegas1:cu_file_split


tvegas1 (Contributor) commented Mar 20, 2026

What?

Proposal to split the .cu file code into separate files.

Why?

Splitting the file lets the nvcc invocations run in parallel, which reduces the build time from 5 min 57 sec down to 4 min 15 sec.
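For reference, the two reported wall-clock times convert to the following saving and speedup (plain arithmetic on the numbers stated above, no additional measurements):

```latex
t_{\text{before}} = 5\,\text{min}\;57\,\text{s} = 357\,\text{s}, \qquad
t_{\text{after}}  = 4\,\text{min}\;15\,\text{s} = 255\,\text{s}

\Delta t = 357 - 255 = 102\,\text{s}, \qquad
\frac{\Delta t}{t_{\text{before}}} = \frac{102}{357} \approx 28.6\,\%, \qquad
\frac{t_{\text{before}}}{t_{\text{after}}} = \frac{357}{255} = 1.4
```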

How?

Keep the host, __global__, and __device__ functions of a given template instantiation in the same file, and dispatch to that instantiation only through host code. Each kernel variant is then compiled by its own nvcc invocation:

  CC       libucx_perftest_cuda_la-cuda_alloc.lo
  CC       libucx_perftest_cuda_la-ucp_cuda_impl.lo
  NVCC     ucp_cuda_kernel_bw.lo
  NVCC     ucp_cuda_kernel_bw_thread_nofc.lo
  NVCC     ucp_cuda_kernel_bw_thread_fc.lo
  NVCC     ucp_cuda_kernel_bw_warp_fc.lo
  NVCC     ucp_cuda_kernel_bw_warp_nofc.lo
  NVCC     ucp_cuda_kernel_latency.lo
  NVCC     ucp_cuda_kernel_latency_thread.lo
  NVCC     ucp_cuda_kernel_latency_warp.lo
  NVCC     ucp_cuda_kernel_wait.lo
  NVCC     ucp_cuda_host.lo
  CCLD     libucx_perftest_cuda.la
  LN       libucx_perftest_cuda.la
  LN       .libs/libucx_perftest_cuda.so
  LN       .libs/libucx_perftest_cuda.so.0

tvegas1 (Contributor, Author) commented Mar 20, 2026

How should I make sure that everything still works, in terms of both functionality and performance?

tvegas1 (Contributor, Author) commented Mar 20, 2026

If deemed useful, we could also try to split this AI-based concept PR into multiple PRs.

