Add FP8 Support for Llama-4-Maverick on B200 (Feature Request) #4033

@indianspeedster

Description

Please add native FP8 inference support in TensorRT-LLM for the Llama-4-Maverick-17B-128E-Instruct model, with a focus on maximizing performance on NVIDIA B200 GPUs.

Model on HF: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
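For context, once support lands, serving this checkpoint would presumably go through the usual `trtllm-serve` entry point. The command below is only a sketch: it assumes the HF FP8 checkpoint can be loaded directly, and the tensor-parallel size is a placeholder for an 8x B200 node, not a verified configuration for this model.

```shell
# Hypothetical sketch -- assumes trtllm-serve can load the HF FP8 checkpoint
# directly; the tp_size value is a placeholder, adjust to your topology.
trtllm-serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
    --tp_size 8
```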
