[AutoDeploy]: MLA optimizations for DS-R1 #8233

@nzmora-nvidia

Description

🚀 The feature, motivation and pitch

  • Torch operators: MLA MHA-mode (no weight absorption), with and without cache.
  • FlashInfer operators: MLA MQA-mode (weight absorption) with cache, for decode and mixed decode+prefill (flattened). Uses flashinfer.mla.BatchMLAPagedAttentionWrapper and flashinfer.append_paged_mla_kv_cache.
  • Weight-absorption optimizer pass.
  • Patch for DeepSeek-R1.
  • PyTorch and FlashInfer MLA backends.
  • CUDA graph support for the FlashInfer MLA operator.
  • All correctness tests pass.
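To illustrate why the MQA-mode (weight-absorption) path is equivalent to the MHA-mode path, here is a minimal PyTorch sketch of the attention-score computation. All names and shapes (W_UK, c_kv, the head/latent dimensions) are illustrative assumptions, not the PR's actual code, and the RoPE (k_pe) component is omitted: instead of up-projecting the cached latent c_kv to per-head keys with W_UK, the same projection can be absorbed into the query, so attention runs directly against the shared latent in MQA fashion.

```python
import torch

torch.manual_seed(0)
n_heads, d_head, d_c, seq = 4, 16, 32, 8

# Hypothetical MLA pieces (no-RoPE part only):
W_UK = torch.randn(n_heads, d_head, d_c)  # up-projects latent to per-head keys
c_kv = torch.randn(seq, d_c)              # compressed KV latent (what the cache stores)
q = torch.randn(n_heads, d_head)          # per-head query for one decode token

# MHA mode: materialize per-head keys from the latent, then score.
k = torch.einsum("hdc,sc->hsd", W_UK, c_kv)       # (n_heads, seq, d_head)
scores_mha = torch.einsum("hd,hsd->hs", q, k)

# MQA mode: absorb W_UK into the query; every head then attends
# against the single shared latent c_kv.
q_abs = torch.einsum("hd,hdc->hc", q, W_UK)       # (n_heads, d_c)
scores_mqa = torch.einsum("hc,sc->hs", q_abs, c_kv)

# Both modes produce identical attention scores.
assert torch.allclose(scores_mha, scores_mqa, atol=1e-5)
```

The MQA-mode trade-off: the cache holds only the latent (d_c per token instead of n_heads * d_head), at the cost of the extra query projection per step.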

In a future task:

  • FlashInfer MLA MHA-mode (no weight absorption) with cache; ragged; prefill-only. Uses flashinfer.BatchPrefillWithRaggedKVCacheWrapper.
    This kernel performs well in prefill-only use cases.
    To support mixed decode+prefill we need to:
    1. Compute the new ckv + k_pe and append them to the (paged) cache.
    2. Read from the cache and write into a new ragged layout, since the paged cache has "holes" and the kernel consumes only kv_indptr, without per-sequence lengths.
    3. Compute the output.
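Step 2 above amounts to gathering each sequence's valid cache entries into one contiguous buffer and building the kv_indptr offsets the ragged kernel expects. A rough PyTorch sketch, with a made-up paged layout (page_table, page_size, and the tensor shapes are assumptions for illustration, not the actual cache format):

```python
import torch

# Hypothetical paged cache: fixed-size pages, where the last page of each
# sequence may be partially filled (the "holes" mentioned above).
page_size, d_c = 4, 8
seq_lens = torch.tensor([6, 3])                      # two sequences
page_table = [torch.tensor([0, 1]), torch.tensor([2])]  # pages per sequence
paged_cache = torch.randn(3, page_size, d_c)         # (num_pages, page_size, d_c)

# Gather each sequence's valid entries into one contiguous ragged buffer.
chunks = []
for pages, n in zip(page_table, seq_lens.tolist()):
    flat = paged_cache[pages].reshape(-1, d_c)  # concat this sequence's pages
    chunks.append(flat[:n])                     # drop trailing holes
ragged = torch.cat(chunks)                      # (sum(seq_lens), d_c)

# kv_indptr marks each sequence's start offset in the ragged buffer;
# a kernel that takes only indptr recovers lengths as indptr[i+1] - indptr[i].
kv_indptr = torch.cat([torch.zeros(1, dtype=torch.long), seq_lens.cumsum(0)])
assert kv_indptr.tolist() == [0, 6, 9]
```

This extra gather is the cost of reusing the ragged prefill kernel in a mixed batch; the paged MQA-mode path above avoids it by reading the paged cache directly.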



Metadata

Assignees

Labels

AutoDeploy<NV> AutoDeploy Backend

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
