
Commit 4aa5a6a

Update docs/guides/sequential_onloading.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
1 parent ffedd01 commit 4aa5a6a


1 file changed, +1 -1 lines changed


docs/guides/sequential_onloading.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ Before a model can be sequentially onloaded, it must first be broken up into dis
 *This image depicts the sequential text decoder layers of the Llama3.2-Vision model. Each of the individual decoder layers*

 ## Sequential Targets and Usage ##
-You can use sequential onloading by calling `oneshot` with the `pipeline="sequential"` argument. Note that this pipeline is the default for all oneshot calls which require calibration data. If the sequential pipeline provides to be problematic, you can specify `pipeline="basic"` to use a basic pipeline which does not require sequential onloading, but only works performantly when the model is small enough to fit into the available VRAM.
+You can use sequential onloading by calling `oneshot` with the `pipeline="sequential"` argument. Note that this pipeline is the default for all oneshot calls which require calibration data. If the sequential pipeline proves to be problematic, you can specify `pipeline="basic"` to use a basic pipeline which does not require sequential onloading, but only works performantly when the model is small enough to fit into the available VRAM.

 If you are compressing a model using a GPU with a small amount of memory, you may need to change your sequential targets. Sequential targets control how many weights to onload to the GPU at a time. By default, the sequential targets are decoder layers which may include large MoE layers. In these cases, setting the `sequential_targets="Linear"` argument in `oneshot` will result in lower VRAM usage, but a longer runtime.

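The sequential-targets paragraph in this hunk describes a trade-off between VRAM and runtime. The sketch below is a hypothetical toy model, not llm-compressor's implementation: it assumes peak GPU weight memory is set by the largest unit onloaded at once, ignores the activation memory needed for calibration samples, and treats the number of onload steps as a rough stand-in for runtime. The helpers `build_toy_moe_model` and `onload_plan`, the `"decoder_layer"` label, and all module sizes (100 MB attention projections, 500 MB experts) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Linear:
    """Hypothetical Linear module; only its weight footprint matters here."""
    name: str
    size_mb: int


def build_toy_moe_model(num_layers: int = 4, num_experts: int = 8) -> list[list[Linear]]:
    """Toy model: each decoder layer = 4 attention projections + MoE experts.

    All sizes are invented illustrative numbers, not measurements of a real model.
    """
    layers = []
    for i in range(num_layers):
        attn = [Linear(f"layers.{i}.self_attn.{p}", 100)
                for p in ("q_proj", "k_proj", "v_proj", "o_proj")]
        experts = [Linear(f"layers.{i}.mlp.experts.{e}", 500) for e in range(num_experts)]
        layers.append(attn + experts)
    return layers


def onload_plan(layers: list[list[Linear]], target: str) -> tuple[int, int]:
    """Return (peak weight MB on GPU, number of onload steps) for a target granularity.

    target="decoder_layer" (hypothetical label): a whole decoder layer is onloaded at once.
    target="Linear": one Linear module is onloaded at a time.
    """
    if target == "decoder_layer":
        units = [sum(m.size_mb for m in layer) for layer in layers]
    elif target == "Linear":
        units = [m.size_mb for layer in layers for m in layer]
    else:
        raise ValueError(f"unknown target: {target!r}")
    # Peak weight memory is the largest unit resident on the GPU at once;
    # the count of units is a rough proxy for the number of onload round trips.
    return max(units), len(units)


if __name__ == "__main__":
    model = build_toy_moe_model()
    for target in ("decoder_layer", "Linear"):
        peak, steps = onload_plan(model, target)
        print(f"{target:>13}: peak {peak} MB of weights, {steps} onload steps")
```

With these invented sizes, targeting whole decoder layers peaks at 4400 MB of weights over 4 steps, while targeting `Linear` peaks at 500 MB over 48 steps, which mirrors the lower-VRAM, longer-runtime trade-off the guide describes.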