Add FP8 Support for Llama-4-Maverick on B200 (Feature Request) #4033

@indianspeedster

Description

Please add native FP8 inference support in TensorRT-LLM for the Llama-4-Maverick-17B-128E-Instruct model, with a focus on maximizing performance on NVIDIA B200 GPUs.

Model on HF: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
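For context, once support lands, serving this checkpoint would presumably go through the usual `trtllm-serve` entry point. The command below is only a sketch: it assumes the HF FP8 checkpoint can be loaded directly, and the tensor-parallel size is a placeholder for an 8x B200 node, not a verified configuration for this model.

```shell
# Hypothetical sketch -- assumes trtllm-serve can load the HF FP8 checkpoint
# directly; the tp_size value is a placeholder, adjust to your topology.
trtllm-serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
    --tp_size 8
```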
