[Performance] Sequential onloading #1263
Merged
Changes from 40 commits
Commits (43):
- 1aea4dd wip: alignment context (kylesayrs)
- 6705bf4 touchups based on remaining steps (brian-dellabetta)
- cf1f87d implement oneshot_device, pipeline warnings (kylesayrs)
- 97c8d30 simplify example (kylesayrs)
- ecfe15d move offloading outside of preprocess, which is shared with train (kylesayrs)
- 6f86244 cleanup (kylesayrs)
- 929f678 update examples, remove offload devicemap utils (kylesayrs)
- 0348243 Merge remote-tracking branch 'origin' into kylesayrs/sequential-onloa… (kylesayrs)
- a275f53 update examples to load before generating (kylesayrs)
- 9d6c227 remove hooks (kylesayrs)
- fab6fe1 Merge remote-tracking branch 'origin' into kylesayrs/sequential-onloa… (kylesayrs)
- 6fdcdb1 Merge remote-tracking branch 'origin' into kylesayrs/sequential-onloa… (kylesayrs)
- 8351ac9 name change (kylesayrs)
- ad71c5b cleanup and nits (kylesayrs)
- 819df1c rename function (kylesayrs)
- 6d942cc Merge remote-tracking branch 'origin' into kylesayrs/sequential-onloa… (kylesayrs)
- 7dd71b9 add dispatch utility (kylesayrs)
- 8ba0f2c apply style (kylesayrs)
- fbf2a6d update examples (kylesayrs)
- 91b349b update examples 2 (kylesayrs)
- 8e58e35 remove fallback_to_cpu, use ct utils (kylesayrs)
- 96631d1 remove hook from module within utils function (kylesayrs)
- 96476fe remove unused util (kylesayrs)
- 2d87993 Merge remote-tracking branch 'origin' into kylesayrs/sequential-onloa… (kylesayrs)
- cb965c9 docstring (kylesayrs)
- 8769b85 remove big model example tests (kylesayrs)
- a389d14 big modeling example readme (kylesayrs)
- b336fa2 deprecate sequential_targets on modifiers (kylesayrs)
- 34ef394 update examples (kylesayrs)
- 58fe929 fix deprecation warning (kylesayrs)
- 54ef06a fix layer sequential pipeline (kylesayrs)
- 4bb86e5 remove unused import (kylesayrs)
- b2367ce dispatch in pipelines (kylesayrs)
- 06bb661 add train dispatch (kylesayrs)
- a64a777 use remove_dispatch (kylesayrs)
- 8f71004 fix example (kylesayrs)
- 7d7b00d remove device arg from e2e (kylesayrs)
- 501056e simplify pipeline inference logic, add comment (kylesayrs)
- 74aa7c9 update examples imports (kylesayrs)
- e4487e2 fix call (kylesayrs)
- 931e4e9 fix initial device, update import (kylesayrs)
- 6edc523 Merge branch 'main' into kylesayrs/sequential-onloading (kylesayrs)
- f3f3eb8 Merge branch 'main' into kylesayrs/sequential-onloading (dsikka)
examples/big_models_with_accelerate/mult_gpus_int8_device_map.py: 81 changes, 0 additions & 81 deletions. This file was deleted.
Further deleted files appear in this diff without their file names.
Added file (12 lines, hunk @@ -0,0 +1,12 @@):

@@ -0,0 +1,12 @@
+## Big Modeling with Sequential Onloading ##
+### What is Sequential Onloading? ###
+Sequential onloading is a memory-efficient approach for compressing large language models (LLMs) using only a single GPU. Instead of loading the entire model into memory—which can easily require hundreds of gigabytes—this method loads and compresses one layer at a time. The outputs are offloaded before the next layer is processed, dramatically reducing peak memory usage while maintaining high compression fidelity.
+
+<p align="center">
+<img src="assets/sequential_onloading.png"/>
+</p>
+
+For more information, see the [RedHat AI blog post](https://developers.redhat.com/articles/2025/05/09/llm-compressor-optimize-llms-low-latency-deployments#generalizing_to_multimodal_and_moe_architectures) or the [LLM Compressor Office Hours Recording](https://www.youtube.com/watch?v=GrhuqQDmBk8).
+
+### Using Sequential Onloading ###
+Sequential onloading is enabled by default within LLM Compressor. To disable sequential onloading, add the `pipeline="basic"` argument to the LLM Compressor `oneshot` function call.
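The README describes sequential onloading only in prose. The toy simulation below sketches the memory argument in plain Python. Each hypothetical `ToyLayer` is onloaded on its own and compressed using the previous layer's outputs as calibration inputs. It is then offloaded, so simulated peak memory equals the largest single layer rather than the whole model. Everything here (`ToyLayer`, `forward`, `compress`, `calibrate`, the memory accounting) is an illustrative assumption, not LLM Compressor's implementation or API. The `pipeline` string only mirrors the `"sequential"` default and the `pipeline="basic"` option that the README mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyLayer:
    """Hypothetical stand-in for one transformer block (illustration only)."""
    name: str
    size_gb: float  # simulated GPU memory needed to hold this layer's weights
    scale: float    # toy "weight": the layer multiplies every activation by it


def forward(layer: ToyLayer, acts: List[float]) -> List[float]:
    """Toy forward pass: scale each activation by the layer's weight."""
    return [a * layer.scale for a in acts]


def compress(layer: ToyLayer, acts: List[float]) -> None:
    """Toy compression: round the weight to one decimal place.

    A real algorithm (e.g. GPTQ) would use `acts` as calibration data;
    this toy accepts them only to mirror that data flow.
    """
    layer.scale = round(layer.scale, 1)


def calibrate(layers: List[ToyLayer], calib: List[float],
              pipeline: str = "sequential") -> float:
    """Compress every layer in order and return simulated peak GPU memory (GB)."""
    if pipeline not in ("sequential", "basic"):
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    full_model_gb = sum(layer.size_gb for layer in layers)
    peak = 0.0
    acts = list(calib)
    for layer in layers:
        if pipeline == "sequential":
            resident = layer.size_gb  # onload only this layer
        else:
            resident = full_model_gb  # "basic": the whole model stays onloaded
        peak = max(peak, resident)
        compress(layer, acts)
        # Outputs of this layer become the calibration inputs for the next one.
        acts = forward(layer, acts)
        # In the sequential case, this is where the layer would be offloaded.
    return peak


def make_layers() -> List[ToyLayer]:
    """Four hypothetical layers of 2.0, 3.0, 2.5 and 1.5 GB."""
    return [ToyLayer(f"layer{i}", gb, 1.234)
            for i, gb in enumerate([2.0, 3.0, 2.5, 1.5])]


if __name__ == "__main__":
    print("sequential peak GB:", calibrate(make_layers(), [1.0, 2.0]))
    print("basic peak GB:", calibrate(make_layers(), [1.0, 2.0], pipeline="basic"))
```

Running it prints a 3.0 GB peak for the sequential mode and 9.0 GB for the basic mode, because the basic mode keeps all four toy layers resident at once. The toy ignores activation memory and where offloaded weights actually live (typically CPU memory or disk). It illustrates only the scaling argument, not real numbers.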
Binary file added: examples/big_models_with_sequential_onloading/assets/sequential_onloading.png (+69.5 KB)