remove spurious cpu->gpu and gpu->cpu transfers #123
base: feat/ad-2025-07-22
```diff
@@ -209,7 +209,8 @@ def _prepare_inputs(
         # update the sequence info object now
         si = self.cache_seq_interface.info
-        si.update_pos(input_pos, reset=True)
+        # skip calling _update_position_ids() here, as it will be called in nest_sequences
+        si.update_pos(input_pos, reset=True, update_position_ids=False)
         si.assign_cache_loc(page_assignments)
         si.nest_sequences(input_ids)
```

Review comments on this change:

> maybe it's better to not call …

> Updating the position ids requires both the input positions and the sequence lengths, so it makes sense to update them whenever either is updated, but it's a bit wasteful. In any case, due to my recent changes, the run time of …
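The intent of the change is to defer position-id recomputation to `nest_sequences`, so it happens once instead of twice per step. A minimal sketch with a toy `SequenceInfo` whose internals are invented for illustration (only the method names `update_pos`, `nest_sequences`, and `_update_position_ids` come from the diff):

```python
# Toy SequenceInfo: method names mirror the diff, internals are assumptions.
class SequenceInfo:
    def __init__(self):
        self.input_pos = [0]
        self.seq_len = [0]
        self.position_ids = []
        self.recompute_count = 0  # counts _update_position_ids() calls

    def _update_position_ids(self):
        # Position ids depend on BOTH input positions and sequence lengths.
        self.recompute_count += 1
        self.position_ids = [
            list(range(pos, pos + ln))
            for pos, ln in zip(self.input_pos, self.seq_len)
        ]

    def update_pos(self, input_pos, reset=False, update_position_ids=True):
        if reset:
            self.input_pos = list(input_pos)
        else:
            self.input_pos = [a + b for a, b in zip(self.input_pos, input_pos)]
        if update_position_ids:
            self._update_position_ids()

    def nest_sequences(self, input_ids):
        self.seq_len = [len(ids) for ids in input_ids]
        # nest_sequences always refreshes position ids, so a preceding
        # update_pos(..., update_position_ids=False) skips one recompute
        # that would have used stale sequence lengths anyway.
        self._update_position_ids()


si = SequenceInfo()
si.update_pos([3, 5], reset=True, update_position_ids=False)
si.nest_sequences([[1, 2], [4, 5, 6]])
print(si.recompute_count)  # 1
print(si.position_ids)     # [[3, 4], [5, 6, 7]]
```

With the old call (`update_position_ids=True` by default), position ids would be computed once against stale `seq_len` and then again inside `nest_sequences`; the flag removes the wasted first pass.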
> This operation moves `seq_len` to device before adding to the host tensor, which defeats the purpose of keeping calculations on the host. Consider converting `seq_len` to CPU first:
>
> ```python
> self.input_pos_host[:bs] += seq_len.cpu()
> ```
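The suggested fix keeps the accumulation on the host and crosses the device boundary exactly once, via an explicit `.cpu()`. A minimal sketch using a toy `Tensor` class that records device moves; this class is an assumption standing in for real framework tensors (in real PyTorch a cross-device in-place add would require an explicit move rather than performing one silently):

```python
# Toy Tensor for illustration only: tracks every device transfer so the
# cost of the pattern is visible. Not the real torch.Tensor API.
class Tensor:
    transfers = []  # records every device move as (src, dst)

    def __init__(self, data, device="cpu"):
        self.data = list(data)
        self.device = device

    def to(self, device):
        if device != self.device:
            Tensor.transfers.append((self.device, device))
        return Tensor(self.data, device)

    def cpu(self):
        return self.to("cpu")

    def __iadd__(self, other):
        # The right-hand side is moved to self's device before the add.
        other = other.to(self.device)
        self.data = [a + b for a, b in zip(self.data, other.data)]
        return self


input_pos_host = Tensor([10, 20], device="cpu")  # host-side bookkeeping
seq_len = Tensor([2, 3], device="cuda")          # produced on device

# Reviewed pattern: convert seq_len explicitly, so the add itself is
# host-only and only one device->host transfer occurs.
input_pos_host += seq_len.cpu()
print(input_pos_host.data)  # [12, 23]
print(Tensor.transfers)     # [('cuda', 'cpu')]
```

The design point is that `input_pos_host` exists precisely to avoid device round-trips for per-step bookkeeping, so any operand added to it should already live on the CPU.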