Commit ffedd01

Update docs/guides/sequential_onloading.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
1 parent 87ade45 commit ffedd01

File tree

1 file changed: +1, -1 lines changed


docs/guides/sequential_onloading.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ for layer in model.layers:
22 22
23 23  ## Implementation ##
24 24
25    -Before a model can be sequentially onloaded, it must first be broken up into disjoint parts which can be individually onloaded. This is achived through the [torch.fx.Tracer](https://github.com/pytorch/pytorch/blob/main/torch/fx/README.md#tracing) module, which allows a model to represented as a graph operations (nodes) and data inputs (edges). Once the model has been traced into a valid graph representation, the graph is cut (partitioned) into disjoint subgraphs, each of which is onloaded individually as a layer. This implementation can be found [here](/src/llmcompressor/pipelines/sequential/helpers.py).
   25 +Before a model can be sequentially onloaded, it must first be broken up into disjoint parts which can be individually onloaded. This is achieved through the [torch.fx.Tracer](https://github.com/pytorch/pytorch/blob/main/torch/fx/README.md#tracing) module, which allows a model to be represented as a graph of operations (nodes) and data inputs (edges). Once the model has been traced into a valid graph representation, the graph is cut (partitioned) into disjoint subgraphs, each of which is onloaded individually as a layer. This implementation can be found [here](/src/llmcompressor/pipelines/sequential/helpers.py).
26 26
27 27  ![sequential_onloading](../assets/model_graph.jpg)
28 28  *This image depicts some of the operations performed when executing the Llama3.2-Vision model*
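The paragraph being corrected describes tracing a model into a graph of operations with torch.fx before cutting it into disjoint subgraphs. A minimal sketch of that first step, using a hypothetical `TinyModel` rather than llm-compressor's actual helpers (the real partitioning logic lives in `src/llmcompressor/pipelines/sequential/helpers.py`):

```python
# Hypothetical sketch: trace a toy model with torch.fx and inspect the
# resulting graph of operations (nodes). Not llm-compressor's code.
import torch
from torch import nn
from torch.fx import symbolic_trace

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # stand-in for a stack of transformer layers
        self.layers = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))

    def forward(self, x):
        return self.layers(x)

# symbolic_trace produces a GraphModule whose graph lists every operation
graph_module = symbolic_trace(TinyModel())

# each call_module node corresponds to a submodule; a sequential pipeline
# would cut the graph between such nodes into disjoint subgraphs
module_nodes = [n for n in graph_module.graph.nodes if n.op == "call_module"]
for node in module_nodes:
    print(node.target)  # qualified submodule names such as layers.0
```

Iterating over `graph_module.graph.nodes` exposes the placeholder (input), `call_module` operations, and output edges that the partitioning step operates on.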
