(fix) Fetch referenced config on models that require it #38

Merged
alvarobartt merged 3 commits into alvarobartt:main from Napuh:main on Mar 5, 2026

Napuh commented Mar 1, 2026

Description

Some models, such as llava-hf/llava-1.5-7b-hf, do not carry the parameters needed to calculate the KV-cache size in their own config; instead, their text_config references another model. For example:

"text_config": {
  "_name_or_path": "lmsys/vicuna-7b-v1.5",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "rms_norm_eps": 1e-05,
  "torch_dtype": "float16",
  "vocab_size": 32064
}

Estimating the memory requirements for this model with the --experimental flag for KV-cache size estimation throws an error:

TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
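
This is the error one would expect when an estimator reads fields such as num_hidden_layers from a text_config that does not define them. The sketch below is only an illustration under that assumption: `kv_cache_bytes` is a hypothetical helper, not the code in hf-mem, and the formula is the usual keys-plus-values estimate rather than the project's exact expression.

```python
def kv_cache_bytes(config: dict, dtype_bytes: int = 2) -> int:
    """Hypothetical KV-cache estimator; not hf-mem's actual implementation."""
    # dict.get returns None for any key the config does not define.
    num_layers = config.get("num_hidden_layers")
    num_heads = config.get("num_attention_heads")
    hidden_size = config.get("hidden_size")
    num_kv_heads = config.get("num_key_value_heads") or num_heads
    max_len = config.get("max_position_embeddings")
    # The factor 2 accounts for storing both keys and values per layer.
    return (
        2
        * num_layers
        * num_kv_heads
        * (hidden_size // num_heads)
        * max_len
        * dtype_bytes
    )


# The text_config shown above, reduced to the fields relevant here.
llava_text_config = {
    "_name_or_path": "lmsys/vicuna-7b-v1.5",
    "max_position_embeddings": 4096,
    "model_type": "llama",
    "vocab_size": 32064,
}

try:
    kv_cache_bytes(llava_text_config)
except TypeError as exc:
    # num_hidden_layers is missing, so the first multiplication is 2 * None.
    message = str(exc)
    print(message)
```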

This PR fetches the referenced config and uses its values to estimate the KV-cache size. If the _name_or_path key is present in text_config, the referenced model's config is used as the base for the LLM parameters (num_hidden_layers, hidden_size, etc.), and the values from the original text_config (such as max_position_embeddings, which may have been modified) then override it.
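
The merge order can be sketched roughly as follows. This is not the code in this PR: `resolve_text_config` and `fetch_config` are hypothetical names, and the stub fetcher returns illustrative values instead of downloading the real config from the Hub.

```python
from typing import Callable


def resolve_text_config(
    text_config: dict, fetch_config: Callable[[str], dict]
) -> dict:
    """Hypothetical helper: fill in a text_config from the config it references."""
    referenced = text_config.get("_name_or_path")
    if not referenced:
        # Nothing to resolve; return a copy so callers can mutate it safely.
        return dict(text_config)
    base = fetch_config(referenced)
    # Later keys win, so values from the original text_config override the base.
    return {**base, **text_config}


def stub_fetch_config(model_id: str) -> dict:
    # Stand-in for fetching config.json of `model_id`; values are illustrative only.
    return {
        "num_hidden_layers": 32,
        "num_attention_heads": 32,
        "hidden_size": 4096,
        "max_position_embeddings": 2048,
    }


text_config = {
    "_name_or_path": "lmsys/vicuna-7b-v1.5",
    "max_position_embeddings": 4096,
    "model_type": "llama",
}

resolved = resolve_text_config(text_config, stub_fetch_config)
print(resolved["num_hidden_layers"], resolved["max_position_embeddings"])
```

Merging in this direction keeps any value the multimodal model sets itself, and only falls back to the referenced config for keys that are absent.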

Before this PR, ...ForConditionalGeneration models with referenced configs failed to calculate the KV-cache size:

hf-mem --model-id llava-hf/llava-1.5-7b-hf --experimental

Traceback (most recent call last):
  File "/Users/napuh/CODE/hf-mem/.venv/bin/hf-mem", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/napuh/CODE/hf-mem/src/hf_mem/cli.py", line 457, in main
    asyncio.run(
  File "/Users/napuh/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Users/napuh/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/napuh/.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/napuh/CODE/hf-mem/src/hf_mem/cli.py", line 355, in run
    2
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'

After this PR, the command executes successfully.


  • I have read and followed the guidelines in CONTRIBUTING.md.
  • This has been discussed over an issue or discussion.

alvarobartt left a comment

Thanks @Napuh! I've left a nit; with that, I think we can merge 🤗

@alvarobartt alvarobartt merged commit ca9f0ec into alvarobartt:main Mar 5, 2026
1 check passed