[AutoDeploy]: multi token prediction (MTP) for DS-R1 #8237

@nzmora-nvidia

Description

🚀 The feature, motivation and pitch

https://github.com/NVIDIA/TensorRT-LLM/tree/c4abca323e2662138fa3de47e22e78709e4d3b6e/examples/models/core/deepseek_v3#hardware-requirements

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

Labels

  • AutoDeploy (<NV> AutoDeploy Backend)
  • Speculative Decoding (<NV> MTP/Eagle/Medusa/Lookahead/Prompt-Lookup-Decoding/Draft-Target-Model/ReDrafter)

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
