[Feature]: Add NVFP4 KV cache support for SM120 in trtllm-gen #10241

@lgreenlee

Description

🚀 The feature, motivation and pitch

Currently, the NVFP4 KV cache in trtllm-gen is only available for SM10X. It would be useful to have this feature on consumer Blackwell GPUs, which are SM120-based. The main benefit is a smaller KV cache, which leaves more VRAM for context.
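
To give a rough sense of the saving, here is a back-of-the-envelope sketch. The model shape is hypothetical (not tied to any specific model), and `kv_cache_bytes` is just an illustrative helper, not part of any library. It assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-element block (about 4.5 bits per element) and ignores per-tensor scales and any paging or alignment overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bits_per_element):
    """Bytes needed to hold keys and values for num_tokens tokens."""
    # Factor of 2 accounts for storing both K and V.
    elements = 2 * num_layers * num_kv_heads * head_dim * num_tokens
    return elements * bits_per_element / 8


# Effective bits per element, including block scales where applicable.
# NVFP4 (assumed layout): 4-bit value + one 8-bit scale per 16 elements.
BITS = {"fp16": 16.0, "fp8": 8.0, "nvfp4": 4.0 + 8.0 / 16.0}

# Hypothetical model shape, chosen only for illustration.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=32768)

for name, bits in BITS.items():
    gib = kv_cache_bytes(bits_per_element=bits, **shape) / 2**30
    print(f"{name}: {gib:.2f} GiB")
```

Under these assumptions, the NVFP4 cache takes a bit more than half the memory of FP8 and a bit more than a quarter of FP16 for the same context length, so the same VRAM budget holds correspondingly more tokens.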

Alternatives

There are no alternative solutions. I tried loading the SM100 cubins on an SM120 GPU, but that did not work.

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

KV-Cache Management, feature request
