[CLI] Report KV cache memory usage in mlc_llm compile #3221

CharlieFRuan · 2025-05-04T01:39:48Z

This PR makes mlc_llm compile report KV cache memory usage. It prints the size in MB of one token's KV cache, and the total MB for model weights, intermediate (temporary) buffers, and a 4K-token KV cache.

If the required fields are missing from the model config or metadata (e.g., for an older model), we skip this report.
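The PR does not spell out how the per-token figure is derived. Below is a minimal sketch of the standard per-token KV cache estimate (one key and one value vector per layer per KV head). It is an assumption, not the PR's code. The field names are Hugging Face-style config keys and may not match what mlc_llm actually reads. The sketch also assumes fp16 storage (2 bytes) and 1 MB = 2^20 bytes. The example config values are hypothetical, chosen only because they reproduce the 0.11 MB figure in the sample output.

```python
from typing import Optional


def kv_bytes_per_token(config: dict, dtype_bytes: int = 2) -> Optional[int]:
    """Estimate KV cache bytes for one token, or None if a field is missing.

    Assumption: HF-style keys; dtype_bytes=2 assumes fp16 K/V storage.
    """
    try:
        num_layers = config["num_hidden_layers"]
        num_kv_heads = config["num_key_value_heads"]
        head_dim = config["head_dim"]
    except KeyError:
        # Missing fields (e.g., an older model): report nothing.
        return None
    # Factor 2: one key vector and one value vector per layer per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


# Hypothetical config values, for illustration only.
cfg = {"num_hidden_layers": 28, "num_key_value_heads": 8, "head_dim": 128}
per_token = kv_bytes_per_token(cfg)
print(f"KV cache size: {per_token / 2**20:.2f} MB per token")
print(kv_bytes_per_token({"num_hidden_layers": 28}))  # incomplete config
```

Returning None instead of raising lets the caller silently skip the log line when fields are absent.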

Sample output in CLI:

[2025-05-03 21:44:24] INFO model_metadata.py:94: Total memory usage without KV cache: 2254.16 MB (Parameters: 923.16 MB. Temporary buffer: 1331.00 MB)
[2025-05-03 21:44:24] INFO model_metadata.py:128: KV cache size: 0.11 MB per token in the context window
[2025-05-03 21:44:24] INFO model_metadata.py:133: Total memory usage with a 4K KV cache: 2702.16 MB
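The three logged figures are consistent with total = parameters + temporary buffer + context length × per-token KV size. The sketch below checks that arithmetic using only numbers from the sample log. It assumes "4K" means 4096 tokens. Under that assumption, the 4K KV cache takes 448 MB, and the exact per-token size is 0.109375 MB, which the log rounds to 0.11 MB.

```python
# Figures copied from the sample log above (in MB).
params_mb = 923.16
temp_buffer_mb = 1331.00
total_with_kv_mb = 2702.16
context_tokens = 4096  # assumption: "4K" means 4096 tokens

# Memory without KV cache = parameters + temporary buffer.
base_mb = params_mb + temp_buffer_mb
# Whatever remains of the 4K total is the KV cache itself.
kv_total_mb = total_with_kv_mb - base_mb
per_token_mb = kv_total_mb / context_tokens

print(f"Without KV cache: {base_mb:.2f} MB")
print(f"4K KV cache: {kv_total_mb:.2f} MB")
print(f"Per token: {per_token_mb:.2f} MB")
```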

CharlieFRuan added 2 commits May 3, 2025 21:37

c709b6f [CLI] Report KV cache memory usage in mlc_llm compile
4758549 Trivial

CharlieFRuan requested a review from MasterJH5574 May 4, 2025 02:00
