Update docs/guides/sequential_onloading.md

kylesayrs · gemini-code-assist[bot] · kylesayrs · commit 4aa5a6a9b4a4 · 2026-02-25T11:13:40.000-05:00
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
diff --git a/docs/guides/sequential_onloading.md b/docs/guides/sequential_onloading.md
@@ -31,7 +31,7 @@ Before a model can be sequentially onloaded, it must first be broken up into dis
 *This image depicts the sequential text decoder layers of the Llama3.2-Vision model. Each of the individual decoder layers*
 
 ## Sequential Targets and Usage ##
-You can use sequential onloading by calling `oneshot` with the `pipeline="sequential"` argument. Note that this pipeline is the default for all oneshot calls which require calibration data. If the sequential pipeline provides to be problematic, you can specify `pipeline="basic"` to use a basic pipeline which does not require sequential onloading, but only works performantly when the model is small enough to fit into the available VRAM.
+You can use sequential onloading by calling `oneshot` with the `pipeline="sequential"` argument. Note that this pipeline is the default for all oneshot calls which require calibration data. If the sequential pipeline proves to be problematic, you can specify `pipeline="basic"` to use a basic pipeline which does not require sequential onloading, but only works performantly when the model is small enough to fit into the available VRAM.
 
 If you are compressing a model using a GPU with a small amount of memory, you may need to change your sequential targets. Sequential targets control how many weights to onload to the GPU at a time. By default, the sequential targets are decoder layers which may include large MoE layers. In these cases, setting the `sequential_targets="Linear"` argument in `oneshot` will result in lower VRAM usage, but a longer runtime.
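The guide's last paragraph trades VRAM for runtime, and a toy weight count makes that trade concrete. This is a minimal sketch, not llm-compressor code, and it rests on several assumptions. The layer sizes are hypothetical. The labels `"DecoderLayer"` and `"Linear"` are plain strings standing in for real module names. Only weight parameters are counted, so activations and calibration buffers are ignored. The stage count is offered only as a rough proxy for the extra sequential work behind the longer runtime, not as a measured figure.

```python
# Toy illustration (hypothetical sizes, not llm-compressor code): compare the
# largest set of weights that must sit on the GPU at once when the sequential
# target is a whole decoder layer versus a single Linear module.
# Activations, caches, and calibration buffers are ignored.
from dataclasses import dataclass
from typing import List


@dataclass
class DecoderLayer:
    # Parameter counts of each Linear submodule in one decoder layer
    linear_params: List[int]


def peak_onloaded_params(layers: List[DecoderLayer], target: str) -> int:
    """Largest number of weight parameters onloaded at one time."""
    if target == "DecoderLayer":
        # Each stage holds every Linear in the layer, including all MoE experts
        return max(sum(layer.linear_params) for layer in layers)
    if target == "Linear":
        # Each stage holds a single Linear module
        return max(max(layer.linear_params) for layer in layers)
    raise ValueError(f"unknown target: {target}")


def num_stages(layers: List[DecoderLayer], target: str) -> int:
    """Number of sequential stages the calibration data passes through."""
    if target == "DecoderLayer":
        return len(layers)
    if target == "Linear":
        return sum(len(layer.linear_params) for layer in layers)
    raise ValueError(f"unknown target: {target}")


# Hypothetical MoE-style layer sizes, assumed for illustration only
hidden, intermediate, num_experts = 4096, 14336, 8
attn = [hidden * hidden] * 4                           # q, k, v, o projections
experts = [hidden * intermediate] * (3 * num_experts)  # 3 Linears per expert
layers = [DecoderLayer(attn + experts) for _ in range(2)]

for target in ("DecoderLayer", "Linear"):
    print(target, peak_onloaded_params(layers, target), num_stages(layers, target))
# prints:
# DecoderLayer 1476395008 2
# Linear 58720256 56
```

With these assumed sizes, onloading a whole MoE decoder layer keeps about 25x more weight parameters on the GPU at once than onloading one Linear module. The number of sequential stages grows from 2 to 56.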