
perf: use async H2D copy for contiguous shm forward input on npu. #1101

Merged
JimHsiung merged 1 commit into jd-opensource:main from JimHsiung:perf/async_h2d
Mar 25, 2026

Conversation

@JimHsiung
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an asynchronous Host-to-Device copy for forward inputs on NPU to improve performance. The changes correctly use a stream and non-blocking transfers. My review identifies a potential issue where the underlying shared memory may not be pinned, which could cause the asynchronous copy to behave synchronously and negate the performance benefit. I've provided a suggestion to ensure the data is in pinned memory before the transfer.
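The pinned-memory concern above can be sketched as a small copy policy. This is a minimal illustration, not the code merged in this PR: `CopyPlan`, `plan_h2d_copy`, and `copy_to_device` are hypothetical names, and the rule "async only for contiguous input, pin first if the buffer is not already page-locked" is an assumption drawn from the PR title and the review summary. The tensor methods used (`is_contiguous()`, `is_pinned()`, `pin_memory()`, `to(..., non_blocking=True)`) are standard PyTorch tensor methods; whether a given NPU backend honors them the same way is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPlan:
    """Hypothetical description of how one host tensor should be copied."""
    pin_first: bool      # stage through page-locked host memory before copying
    non_blocking: bool   # request an asynchronous host-to-device copy


def plan_h2d_copy(is_contiguous: bool, is_pinned: bool) -> CopyPlan:
    """Assumed policy: go async only for contiguous input, pinning if needed."""
    if not is_contiguous:
        # Non-contiguous input keeps the blocking path in this sketch.
        return CopyPlan(pin_first=False, non_blocking=False)
    # A non_blocking copy from pageable memory may silently run synchronously,
    # so request pinning whenever the source is not already pinned.
    return CopyPlan(pin_first=not is_pinned, non_blocking=True)


def copy_to_device(host_tensor, device):
    """Apply the plan to a PyTorch-style tensor (any object with these methods)."""
    plan = plan_h2d_copy(host_tensor.is_contiguous(), host_tensor.is_pinned())
    # pin_memory() returns a page-locked copy; that staging copy is itself
    # a synchronous host-side operation.
    src = host_tensor.pin_memory() if plan.pin_first else host_tensor
    return src.to(device, non_blocking=plan.non_blocking)
```

With a non-blocking copy, the caller is still responsible for synchronizing the copy stream before the device tensor is consumed and before the host buffer is reused. Note also that pinning on every call adds a host-side copy, so whether it pays off depends on whether the shared-memory region can be page-locked once up front.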

Collaborator

@yq33victor yq33victor left a comment

LGTM

@JimHsiung JimHsiung merged commit 0ed421e into jd-opensource:main Mar 25, 2026
12 of 43 checks passed
@JimHsiung JimHsiung deleted the perf/async_h2d branch March 25, 2026 02:31

4 participants