add disable cache in LazyTensorFactory #30

ABNER-1 · 2025-10-20T02:16:17Z

This is the first part of #29.

Disables the batch tensor cache by default, since it is unnecessary in most scenarios and consumes extra GPU memory.

Example Configuration:

Backend/Framework: SGLang with TP8
Model: deepseek-r1 (each file: 4 GiB)

Extra Memory Comparison:

With disable_cache=True: ~5.7 GiB = 4 GiB (one file) + ~1.7 GiB (largest tensor)
With disable_cache=False: 32 GiB = 4 GiB × 8 (files)

Conclusion: Disabling the cache reduces the extra peak GPU memory by approximately 82% ((32 − 5.7) / 32 ≈ 0.82).
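The comparison above comes down to two formulas. The sketch below encodes them so that the 5.7 GiB, 32 GiB, and 82% figures can be rechecked. The function and parameter names are hypothetical and are not part of the fastsafetensors API. The model covers only the terms the PR lists: one file buffer plus the largest tensor when the cache is disabled, and every file buffer when it is enabled.

```python
# Back-of-the-envelope model of the extra GPU memory reported in this PR.
# The two formulas come from the comparison above; the names are
# hypothetical and are NOT part of the fastsafetensors API.


def extra_memory_gib(file_gib: float, num_files: int,
                     largest_tensor_gib: float, disable_cache: bool) -> float:
    """Return the extra GPU memory (GiB) under the PR's simple model."""
    if disable_cache:
        # One file buffer plus room for the largest tensor taken from it.
        return file_gib + largest_tensor_gib
    # With the cache enabled, every file buffer stays resident on the GPU.
    return file_gib * num_files


def reduction(file_gib: float, num_files: int, largest_tensor_gib: float) -> float:
    """Fraction of extra memory saved by disabling the cache."""
    cached = extra_memory_gib(file_gib, num_files, largest_tensor_gib, disable_cache=False)
    uncached = extra_memory_gib(file_gib, num_files, largest_tensor_gib, disable_cache=True)
    return (cached - uncached) / cached


if __name__ == "__main__":
    # deepseek-r1 on SGLang with TP8: 4 GiB files, 8 files, ~1.7 GiB largest tensor.
    print(f"{extra_memory_gib(4, 8, 1.7, disable_cache=True):.1f} GiB")   # 5.7 GiB
    print(f"{extra_memory_gib(4, 8, 1.7, disable_cache=False):.1f} GiB")  # 32.0 GiB
    print(f"{reduction(4, 8, 1.7):.0%}")                                  # 82%
```

Under this model, the cached case grows linearly with the number of files while the uncached case depends only on a single file and the largest tensor, so the relative saving increases as more files are involved.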

Signed-off-by: yuanyuxing.yyx <[email protected]>

takeshi-yoshimura · 2025-10-20T04:46:46Z

Looks perfect. Thanks for contributing!

add disable cache in LazyTensorFactory

65ff53f

Signed-off-by: yuanyuxing.yyx <[email protected]>

ABNER-1 force-pushed the add_disable_cache_in_LazyTensorFactory branch from 00456d2 to 65ff53f on October 20, 2025 03:07

takeshi-yoshimura merged commit 7dfe757 into foundation-model-stack:main Oct 20, 2025
13 checks passed

This was referenced Dec 2, 2025

[Bug]: fastsafetensors in tensor parallel requires too much VRAM vllm-project/vllm#29403 (Open)

Error when using in Ray cluster #36 (Open)
