Labels: AutoDeploy (`<NV> AutoDeploy Backend`), Low Precision (lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ))
Description
🚀 The feature, motivation and pitch
- Block-wise GEMM quantization
- MLA + FP8 KV cache

FP8 KV cache with MLA is supported on Hopper and Blackwell. The accuracy loss is small: the GSM8K accuracy drop is less than 1%. Note that Hopper and Blackwell use different kernel sources.
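For context on the first item: block-wise quantization splits each weight matrix into fixed-size tiles and stores one FP8 scale per tile, so outliers in one tile do not inflate the scale of the whole matrix. Below is a minimal NumPy sketch of the idea under stated assumptions: the 128x128 tile size is a common convention (not confirmed by this issue), the FP8 E4M3 max of 448.0 is used as the clipping range, and round-to-nearest on a uniform grid stands in for the real, non-uniform FP8 grid that the hardware kernels use.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3


def quantize_blockwise(w: np.ndarray, block: int = 128):
    """Per-tile symmetric quantization into a simulated FP8 range.

    Returns the quantized values and one float32 scale per (block x block)
    tile. Real kernels would store w_q in hardware FP8 and fold the scales
    into the GEMM epilogue; this sketch only models the per-tile scaling.
    """
    rows, cols = w.shape
    nbr, nbc = -(-rows // block), -(-cols // block)  # ceil-div tile counts
    scales = np.empty((nbr, nbc), dtype=np.float32)
    w_q = np.empty_like(w, dtype=np.float32)
    for i in range(nbr):
        for j in range(nbc):
            rs, cs = slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block)
            amax = np.abs(w[rs, cs]).max()
            scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            scales[i, j] = scale
            # round-to-nearest into the FP8 dynamic range (simulated grid)
            w_q[rs, cs] = np.clip(np.round(w[rs, cs] / scale),
                                  -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return w_q, scales


def dequantize_blockwise(w_q: np.ndarray, scales: np.ndarray, block: int = 128):
    """Invert quantize_blockwise by rescaling each tile with its own scale."""
    w = np.empty_like(w_q)
    for i in range(scales.shape[0]):
        for j in range(scales.shape[1]):
            rs, cs = slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block)
            w[rs, cs] = w_q[rs, cs] * scales[i, j]
    return w


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w_hat = dequantize_blockwise(*quantize_blockwise(w))
print(f"max abs reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

Because each tile's rounding error is bounded by half its own scale, a tile with small magnitudes keeps a proportionally small error, which is the motivation for block-wise rather than per-tensor scales.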
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
Status: Backlog