Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs — ROCm Blogs #209
Replies: 3 comments 1 reply
The insights on scaling LLM inference with vLLM on AMD Instinct MI355X GPUs are impressive and highly practical for AI deployments. The detailed benchmarks comparing the MI355X with the NVIDIA B200 across models such as DeepSeek-R1, GPT-OSS-120B, Qwen3-235B, and Llama-3.3-70B clearly highlight AMD's advantage at mid-to-high concurrency levels. The explanation of optimizations such as AITER kernels, MoE/MLA fusion, and QuickReduce effectively demonstrates how hardware and software synergy boosts throughput. For detailed technical guides and performance tips, see https://verhistoriasig.com.mx/. This post is a valuable resource for teams seeking scalable, high-performance, and cost-efficient LLM inference in production environments.
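To make the optimization point concrete, here is a minimal offline-inference sketch. It assumes a ROCm build of vLLM that reads the VLLM_ROCM_USE_AITER environment variable to enable the AITER kernel path (per the vLLM documentation; this variable and the exact model ID are not taken from the blog post), and an 8-GPU node for tensor parallelism:

```python
import os

# Assumption: ROCm builds of vLLM gate the AITER kernels behind this
# environment variable; it must be set before vllm is imported.
os.environ["VLLM_ROCM_USE_AITER"] = "1"

from vllm import LLM, SamplingParams

# Llama-3.3-70B is one of the models the blog benchmarks;
# tensor_parallel_size=8 is an assumed value for an 8-GPU MI355X node.
llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    tensor_parallel_size=8,
)

sampling = SamplingParams(temperature=0.0, max_tokens=128)

# Submit a batch of prompts at once; vLLM's continuous batching is what
# keeps the GPUs saturated at the mid-to-high concurrency levels where
# the blog reports the MI355X pulling ahead.
prompts = [f"Summarize request {i} in one sentence." for i in range(32)]
outputs = llm.generate(prompts, sampling)

for out in outputs:
    print(out.outputs[0].text.strip())
```

This is only a sketch of the offline path; the blog's numbers come from serving benchmarks, where request concurrency rather than batch size drives the saturation behavior described above.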
Hi! Could you clarify which model was actually used for testing? Everywhere in the article, Llama-3.3-70B is mentioned.
Hi Team,
Scaling AI Inference Performance with vLLM on AMD Instinct MI355X GPUs — ROCm Blogs
Explore how the MI355X performs against the B200 in vLLM benchmarks across DeepSeek-R1, GPT-OSS-120B, Qwen3-235B, and Llama-3.3-70B.
https://rocm.blogs.amd.com/artificial-intelligence/scaling-ai-inference/README.html