Commit 649d291

update
Signed-off-by: qingjun <[email protected]>
1 parent 2d4e63e commit 649d291

1 file changed: 13 additions, 2 deletions

_posts/2025-06-26-minimax-m1.md

Lines changed: 13 additions & 2 deletions
@@ -13,7 +13,7 @@ This article explores how MiniMax-M1's hybrid architecture is efficiently suppor
 
 ## Introduction
 
-The rapid advancement of artificial intelligence has led to the emergence of increasingly powerful large language models (LLMs). [MiniMax-M1](https://arxiv.org/pdf/2506.13585), the world's first open-source large-scale mixture-of-experts (MoE) inference model, has attracted significant attention since its release. Its innovative hybrid architecture points to the future of LLMs, enabling breakthroughs in long-context reasoning and complex task processing. Meanwhile, vLLM, a high-performance LLM inference and serving library, provides robust support for MiniMax-M1, making efficient deployment possible.
+The rapid advancement of artificial intelligence has led to the emergence of increasingly powerful large language models (LLMs). [MiniMax-M1](https://arxiv.org/pdf/2506.13585), a popular open-source large-scale mixture-of-experts (MoE) inference model, has attracted significant attention since its release. Its innovative hybrid architecture points to the future of LLMs, enabling breakthroughs in long-context reasoning and complex task processing. Meanwhile, vLLM, a high-performance LLM inference and serving library, provides robust support for MiniMax-M1, making efficient deployment possible.
 
 <img align="center" src="/assets/figures/minimax-m1/benchmark.png" alt="MiniMax-M1 Benchmark Performance" width="90%" height="90%">
 
@@ -61,7 +61,18 @@ sudo docker run -it \
     -v $MODEL_DIR:$MODEL_DIR \
     --name $NAME \
     $DOCKER_RUN_CMD \
-    $IMAGE /bin/bash
+    $IMAGE /bin/bash
+
+# Launch MiniMax-M1 Service
+export SAFETENSORS_FAST_GPU=1
+export VLLM_USE_V1=0
+python3 -m vllm.entrypoints.openai.api_server \
+    --model <model storage path> \
+    --tensor-parallel-size 8 \
+    --trust-remote-code \
+    --quantization experts_int8 \
+    --max_model_len 4096 \
+    --dtype bfloat16
 ```
 
 ## MiniMax-M1 Hybrid Architecture Highlights
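The `vllm.entrypoints.openai.api_server` entrypoint launched in the added lines serves an OpenAI-compatible HTTP API (on port 8000 by default). A minimal sketch of a chat-completion request against it, assuming default host/port; the `"model"` value must match whatever path was passed to `--model`, and the placeholder name, prompt text, and `max_tokens` value here are illustrative:

```shell
# Build a chat-completion request body for the OpenAI-compatible API.
# "MiniMax-M1" is a placeholder: use the exact value given to --model.
cat > request.json <<'EOF'
{
  "model": "MiniMax-M1",
  "messages": [{"role": "user", "content": "Hello, MiniMax-M1!"}],
  "max_tokens": 128
}
EOF

# Send it to the running server (uncomment once the service is up):
# curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d @request.json
```

The response follows the standard OpenAI chat-completions shape, so existing OpenAI client code can usually be pointed at this endpoint unchanged.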
