Please add native FP8 inference support in TensorRT-LLM for the Llama-4-Maverick-17B-128E-Instruct model, with a focus on maximizing performance on NVIDIA B200 GPUs.
Model on Hugging Face: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
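For context, the expected usage once support lands might look roughly like the following. This is only a sketch under assumptions: it assumes `trtllm-serve` can load the pre-quantized FP8 checkpoint directly and that the flags shown apply; it is not a description of current TensorRT-LLM behavior.

```shell
# Hypothetical deployment sketch (not yet supported) — assumes trtllm-serve
# detects the FP8 quantization baked into the Hugging Face checkpoint.
trtllm-serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
    --tp_size 8 \
    --host 0.0.0.0 \
    --port 8000
```

The `--tp_size 8` value is an assumption chosen for an 8x B200 node; the right parallelism split for a 128-expert MoE model is part of what this request asks the TensorRT-LLM team to work out.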