
@ABNER-1 (Contributor) commented Oct 20, 2025

This PR is the first part of #29.

Disables the batch tensor cache by default (via a new `disable_cache` option in `LazyTensorFactory`), since the cache is unnecessary in most scenarios and consumes extra GPU memory.
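To make the trade-off concrete, here is a toy Python model of the two policies. Everything in it is hypothetical: `ToyBatchLoader`, its methods, and the per-file tensor sizes are invented for illustration and are not the `LazyTensorFactory` API. It assumes that with the cache enabled every file buffer in a batch stays resident and tensors are served as views into those buffers, while with `disable_cache=True` each tensor is copied out and the file buffer is released right away.

```python
GIB = 1024 ** 3


class ToyBatchLoader:
    """Hypothetical loader that tracks only *extra* GPU bytes, not model weights.

    Illustrative assumption only; this is not the real LazyTensorFactory API.
    """

    def __init__(self, disable_cache: bool = True) -> None:
        self.disable_cache = disable_cache
        self.cached: list[int] = []  # sizes of file buffers kept alive by the cache
        self.live = 0                # extra bytes currently allocated
        self.peak = 0                # highest value of `live` observed so far

    def _alloc(self, nbytes: int) -> None:
        self.live += nbytes
        self.peak = max(self.peak, self.live)

    def _free(self, nbytes: int) -> None:
        self.live -= nbytes

    def load_file(self, file_bytes: int, tensor_sizes: list[int]) -> None:
        self._alloc(file_bytes)  # stage the whole file in one GPU buffer
        if not self.disable_cache:
            # Cache on: tensors are served as views and the buffer stays resident.
            self.cached.append(file_bytes)
            return
        for size in tensor_sizes:
            # Cache off: copy each tensor out of the buffer. Once handed over, the
            # copy counts as model weights, so it stops being "extra" memory.
            self._alloc(size)
            self._free(size)
        self._free(file_bytes)  # release the buffer as soon as it has been consumed

    def end_batch(self) -> None:
        # With the cache on, buffers are only released when the batch is finished.
        for size in self.cached:
            self._free(size)
        self.cached.clear()


def simulate(disable_cache: bool) -> float:
    """Peak extra memory (GiB) for a batch of eight 4 GiB files."""
    # Made-up tensor sizes per file; only the 1.7 GiB maximum affects the result.
    sizes = [int(1.7 * GIB), int(1.3 * GIB), GIB]
    loader = ToyBatchLoader(disable_cache=disable_cache)
    for _ in range(8):
        loader.load_file(4 * GIB, sizes)
    loader.end_batch()
    return loader.peak / GIB


print(f"disable_cache=False: {simulate(False):.1f} GiB")
print(f"disable_cache=True:  {simulate(True):.1f} GiB")
```

Under these assumptions the model reproduces the 32 GiB and ~5.7 GiB figures from the comparison below: the cached path grows with the number of files in the batch, while the uncached path is bounded by one file plus its largest tensor.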

Example Configuration:

  • Backend/Framework: SGLang with TP8

  • Model: deepseek-r1 (each file: 4 GiB)

Extra GPU Memory Comparison:

  • With disable_cache=True: ~5.7 GiB = 4 GiB (file) + 1.7 GiB (largest tensor)

  • With disable_cache=False: 32 GiB = 4 GiB * 8 (files)

Conclusion: Disabling the cache reduces the extra GPU memory in this configuration by approximately 82% (from 32 GiB to ~5.7 GiB).
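The ~82% figure follows directly from the two totals. A quick arithmetic check, where `extra_memory_gib` is a hypothetical helper that simply encodes the two formulas from the comparison bullets:

```python
def extra_memory_gib(file_gib: float, largest_tensor_gib: float,
                     files_in_batch: int, disable_cache: bool) -> float:
    """Extra GPU memory using the accounting from the comparison bullets."""
    if disable_cache:
        # One file buffer plus a copy of the largest tensor.
        return file_gib + largest_tensor_gib
    # Every file buffer in the batch is cached.
    return file_gib * files_in_batch


cached = extra_memory_gib(4.0, 1.7, 8, disable_cache=False)
uncached = extra_memory_gib(4.0, 1.7, 8, disable_cache=True)
reduction = 1 - uncached / cached
print(f"{cached:.1f} GiB -> {uncached:.1f} GiB ({reduction:.1%} less)")
# prints: 32.0 GiB -> 5.7 GiB (82.2% less)
```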

@ABNER-1 force-pushed the add_disable_cache_in_LazyTensorFactory branch from 00456d2 to 65ff53f on October 20, 2025 03:07
@takeshi-yoshimura merged commit 7dfe757 into foundation-model-stack:main on Oct 20, 2025
13 checks passed
@takeshi-yoshimura (Collaborator)

looks perfect. thanks for contributing!
