Conversation

@iyastreb (Contributor) commented Nov 10, 2025

What?

This is a continuation of the request-handling optimization effort started in #982
This PR optimizes two things:

  • Release all pending requests (except one) right after posting
  • Remove the nixlUcxIntReq class and move connection management to nixlUcxBackendH

Performance results

In nixlbench, post time decreases roughly 10x for RDMA with a batch of 64k messages of 512 B each.
PR1: #982
PR2: this PR

NIXLBENCH
nixlbench --initiator_seg_type=VRAM --target_seg_type=VRAM --start_block_size=512 --max_block_size=512 --start_batch_size=64000 --max_batch_size=64000 --warmup_iter=10 --num_iter=100 --progress_threads=8 &
 
# Num_threads=8 512:64k cuda_ipc
Branch  Block Size (B)      Batch Size     B/W (GB/Sec)   Avg Lat. (us)  Avg Prep (us)  P99 Prep (us)  Avg Post (us)
main    512                 65000          0.122929       4.2            6022.0         6022.0         121784.0
PR1     512                 65000          0.135798       3.8            6433.0         6433.0         103406.5
PR2     512                 64000          0.124752       4.1            8171.0         8171.0         114922.7
 
# Num_threads=8 512:64k rdma
Branch  Block Size (B)      Batch Size     B/W (GB/Sec)   Avg Lat. (us)  Avg Prep (us)  P99 Prep (us)  Avg Post (us)
main    512                 65000          1.932672       0.3            5540.0         5540.0         13065.2
PR1     512                 65000          2.787826       0.2            5583.0         5583.0         8764.7
PR2     512                 64000          6.710469       0.1            5871.0         5871.0         895.1

SGLANG TTFT
Size  MC     main   PR1    PR2 
1     28     25     22     20
2     47     54     45     45
4     125    129    115    111 
8     243    378    346    332
16    442    699    647    522
32    831    1433   1001   904 
64    1364   2032   1853   1804
128   2583   3869   3611   3060 
256   5232   6618   6083   5501
512   10469  12805  12440  10170
1024  22521  24990  22800  20392

@github-actions

👋 Hi iyastreb! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@iyastreb (Author)

/build

@rakhmets (Contributor)

/build

if (__builtin_expect(result.req != nullptr, 1)) {
    ucp_request_free(result.req);  // release the previously stored pending request
}
result.req = req;                  // keep only the most recent pending request
Contributor

How do we use this request? It can be returned to the memory pool by UCX at any moment after the free.

Contributor Author

As you can see, we don't use the freed request at all.
Instead, the idea is to keep the LAST pending (or incomplete) request. When we detect that the current request is still pending, we free the previously stored pending request (since we now have a more recent one) and remember the recent one.

Later we use this last pending request in the "waiting for completion" stage (checkXfer/status) in order to:

  • detect whether the request has completed
  • handle errors

In both cases the request is returned back to UCX, either in status() -> worker->reqRelease() or in release() -> worker->reqRelease()
