
Commit d399af4

up
1 parent 22b229b commit d399af4

File tree

2 files changed: +2 −2 lines changed


docs/source/en/_toctree.yml

Lines changed: 1 addition & 1 deletion

@@ -77,7 +77,7 @@
   - local: optimization/memory
     title: Reduce memory usage
   - local: optimization/speed-memory-optims
-    title: Compile and offloading quantized models
+    title: Compiling and offloading quantized models
 - title: Community optimizations
   sections:
   - local: optimization/pruna

docs/source/en/optimization/speed-memory-optims.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Compile and offloading quantized models
+# Compiling and offloading quantized models

 Optimizing models often involves trade-offs between [inference speed](./fp16) and [memory-usage](./memory). For instance, while [caching](./cache) can boost inference speed, it also increases memory consumption since it needs to store the outputs of intermediate attention layers. A more balanced optimization strategy combines quantizing a model, [torch.compile](./fp16#torchcompile) and various [offloading methods](./memory#offloading).

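For context, the page renamed by this commit documents combining quantization, torch.compile, and offloading. A minimal sketch of that combination, assuming a recent diffusers release with bitsandbytes support installed; the checkpoint id and option values below are illustrative and not taken from this commit:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize the heaviest component at load time to cut memory usage.
quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_compute_dtype": torch.bfloat16},
    components_to_quantize=["transformer"],
)

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint, not from this commit
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Offloading: keep submodules on the CPU and move each to the GPU only
# while it runs, trading some speed for a lower peak memory footprint.
pipeline.enable_model_cpu_offload()

# Compile the denoiser for faster inference; the first call pays the
# compilation cost, subsequent calls reuse the compiled graph.
pipeline.transformer = torch.compile(
    pipeline.transformer, mode="max-autotune", fullgraph=True
)

image = pipeline("a photo of an astronaut riding a horse").images[0]
```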