Open
Labels
- KV-Cache Management: kv-cache management for efficient LLM inference
- feature request: New feature or request. This includes new model, dtype, functionality support
Description
The feature, motivation and pitch
Currently, NVFP4 is only available for SM10X in trtllm-gen. It would be useful to have it available for consumer Blackwell chips as well, which are SM120-based. The primary use is to fit more context into the same amount of VRAM.
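To make the VRAM argument concrete, here is a rough back-of-the-envelope sketch of KV-cache size per token at different precisions. It assumes the usual NVFP4 layout (4-bit values plus one 8-bit scale per block of 16 values) and ignores the per-tensor scale and any allocator overhead. The model shape (32 layers, 8 KV heads, head dimension 128) and the 8 GiB budget are hypothetical numbers picked for illustration, not taken from any specific model or GPU.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim,
                       bits_per_value, scale_bits=0, block_size=1):
    """Approximate KV-cache bytes one token occupies across all layers."""
    # Factor 2 accounts for storing both keys and values.
    values_per_token = 2 * num_layers * num_kv_heads * head_dim
    # Block-scaled formats pay for one shared scale per block of values.
    effective_bits = bits_per_value + scale_bits / block_size
    return values_per_token * effective_bits / 8


def tokens_that_fit(budget_bytes, bytes_per_token):
    """Whole number of tokens whose KV cache fits in the given budget."""
    return int(budget_bytes // bytes_per_token)


# Hypothetical model shape and VRAM budget (assumptions for illustration only).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
BUDGET = 8 * 2**30  # 8 GiB reserved for the KV cache

fp16 = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=16)
fp8 = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=8)
# Assumed NVFP4 layout: 4-bit values, one 8-bit scale per 16 values.
nvfp4 = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM,
                           bits_per_value=4, scale_bits=8, block_size=16)

if __name__ == "__main__":
    for name, size in [("fp16", fp16), ("fp8", fp8), ("nvfp4", nvfp4)]:
        print(f"{name}: {size:.0f} bytes/token, "
              f"{tokens_that_fit(BUDGET, size)} tokens in budget")
```

Under these assumptions NVFP4 stores about 4.5 bits per cached value, so the same budget holds roughly 3.5x the tokens of FP16 and roughly 1.8x the tokens of FP8.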
Alternatives
There are no alternative solutions. I tried loading the SM100 cubins on an SM120 GPU, but it did not work.
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.