Commit 649d291

update
Signed-off-by: qingjun <[email protected]>
1 parent 2d4e63e commit 649d291

1 file changed: 13 additions, 2 deletions

_posts/2025-06-26-minimax-m1.md

Lines changed: 13 additions & 2 deletions
@@ -13,7 +13,7 @@ This article explores how MiniMax-M1's hybrid architecture is efficiently suppor
 
 ## Introduction
 
-The rapid advancement of artificial intelligence has led to the emergence of increasingly powerful large language models (LLMs). [MiniMax-M1](https://arxiv.org/pdf/2506.13585), the world's first open-source large-scale mixture-of-experts (MoE) inference model, has attracted significant attention since its release. Its innovative hybrid architecture points to the future of LLMs, enabling breakthroughs in long-context reasoning and complex task processing. Meanwhile, vLLM, a high-performance LLM inference and serving library, provides robust support for MiniMax-M1, making efficient deployment possible.
+The rapid advancement of artificial intelligence has led to the emergence of increasingly powerful large language models (LLMs). [MiniMax-M1](https://arxiv.org/pdf/2506.13585), a popular open-source large-scale mixture-of-experts (MoE) inference model, has attracted significant attention since its release. Its innovative hybrid architecture points to the future of LLMs, enabling breakthroughs in long-context reasoning and complex task processing. Meanwhile, vLLM, a high-performance LLM inference and serving library, provides robust support for MiniMax-M1, making efficient deployment possible.
 
 <img align="center" src="/assets/figures/minimax-m1/benchmark.png" alt="MiniMax-M1 Benchmark Performance" width="90%" height="90%">
 
@@ -61,7 +61,18 @@ sudo docker run -it \
     -v $MODEL_DIR:$MODEL_DIR \
     --name $NAME \
     $DOCKER_RUN_CMD \
-    $IMAGE /bin/bash
+    $IMAGE /bin/bash
+
+# Launch MiniMax-M1 Service
+export SAFETENSORS_FAST_GPU=1
+export VLLM_USE_V1=0
+python3 -m vllm.entrypoints.openai.api_server \
+    --model <model storage path> \
+    --tensor-parallel-size 8 \
+    --trust-remote-code \
+    --quantization experts_int8 \
+    --max_model_len 4096 \
+    --dtype bfloat16
 ```
 
 ## MiniMax-M1 Hybrid Architecture Highlights
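The `vllm.entrypoints.openai.api_server` entrypoint launched in the added lines serves an OpenAI-compatible HTTP API (on port 8000 by default). A minimal sketch of a chat-completion request against it, assuming default host/port; the `"model"` value must match whatever path was passed to `--model`, and the placeholder name, prompt text, and `max_tokens` value here are illustrative:

```shell
# Build a chat-completion request body for the OpenAI-compatible API.
# "MiniMax-M1" is a placeholder: use the exact value given to --model.
cat > request.json <<'EOF'
{
  "model": "MiniMax-M1",
  "messages": [{"role": "user", "content": "Hello, MiniMax-M1!"}],
  "max_tokens": 128
}
EOF

# Send it to the running server (uncomment once the service is up):
# curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d @request.json
```

The response follows the standard OpenAI chat-completions shape, so existing OpenAI client code can usually be pointed at this endpoint unchanged.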
