Commit 2192bb3

Author: Claude Code

remove LOADWORKER

1 parent e870caa

File tree: 17 files changed, +35 -41 lines changed


docs/CN/source/tutorial/deepseek_deployment.rst

Lines changed: 10 additions & 11 deletions

@@ -30,13 +30,12 @@ LightLLM supports the following deployment modes:
 .. code-block:: bash

     # H200 single-node DeepSeek-R1 TP mode
-    LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 8 \
     --enable_fa3

 **Parameter description:**
-- `LOADWORKER=18`: number of model-loading threads, to speed up loading
 - `--tp 8`: tensor parallelism degree, using 8 GPUs
 - `--enable_fa3`: enable Flash Attention 3.0
 - `--port 8088`: service port
@@ -51,7 +50,7 @@ LightLLM supports the following deployment modes:
 .. code-block:: bash

     # H200 single-node DeepSeek-R1 DP + EP mode
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 8 \
     --dp 8 \
@@ -82,7 +81,7 @@ LightLLM supports the following deployment modes:
     # H200/H100 multi-node DeepSeek-R1 TP mode, Node 0
     # Usage: sh multi_node_tp_node0.sh <nccl_host>
     export nccl_host=$1
-    LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --enable_fa3 \
@@ -98,7 +97,7 @@ LightLLM supports the following deployment modes:
     # H200/H100 multi-node DeepSeek-R1 TP mode, Node 1
     # Usage: sh multi_node_tp_node1.sh <nccl_host>
     export nccl_host=$1
-    LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --enable_fa3 \
@@ -125,7 +124,7 @@ LightLLM supports the following deployment modes:
     # H200 multi-node DeepSeek-R1 EP mode, Node 0
     # Usage: sh multi_node_ep_node0.sh <nccl_host>
     export nccl_host=$1
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --dp 16 \
@@ -142,7 +141,7 @@ LightLLM supports the following deployment modes:
     # H200 multi-node DeepSeek-R1 EP mode, Node 1
     # Usage: sh multi_node_ep_node1.sh <nccl_host>
     export nccl_host=$1
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --dp 16 \
@@ -187,7 +186,7 @@ PD (Prefill-Decode) disaggregation deploys the prefill and decode stages separately
     export host=$1
     export pd_master_ip=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP KV_TRANS_USE_P2P=1 LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP KV_TRANS_USE_P2P=1 python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "prefill" \
     --tp 8 \
@@ -211,7 +210,7 @@ PD (Prefill-Decode) disaggregation deploys the prefill and decode stages separately
     export host=$1
     export pd_master_ip=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP KV_TRANS_USE_P2P=1 LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP KV_TRANS_USE_P2P=1 python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "decode" \
     --tp 8 \
@@ -279,7 +278,7 @@ PD (Prefill-Decode) disaggregation deploys the prefill and decode stages separately
     export host=$1
     export config_server_host=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "prefill" \
     --host $host \
@@ -298,7 +297,7 @@ PD (Prefill-Decode) disaggregation deploys the prefill and decode stages separately
     export host=$1
     export config_server_host=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "decode" \
     --host $host \

docs/CN/source/tutorial/multimodal.rst

Lines changed: 0 additions & 2 deletions

@@ -9,7 +9,6 @@ LightLLM supports inference for a variety of multimodal models; the following uses InternVL as an example
 .. code-block:: bash

     INTERNVL_IMAGE_LENGTH=256 \
-    LOADWORKER=12 \
     python -m lightllm.server.api_server \
     --port 8080 \
     --tp 2 \
@@ -25,7 +24,6 @@ LightLLM supports inference for a variety of multimodal models; the following uses InternVL as an example
 ^^^^^^^^

 - **INTERNVL_IMAGE_LENGTH**: sets the image token length for the InternVL model; default is 256
-- **LOADWORKER**: sets the number of worker processes for model loading

 Basic service parameters
 ^^^^^^^^^^^

docs/EN/source/tutorial/deepseek_deployment.rst

Lines changed: 11 additions & 12 deletions

@@ -30,13 +30,12 @@ Suitable for deploying DeepSeek-R1 model on a single H200 node.
 .. code-block:: bash

     # H200 Single node DeepSeek-R1 TP Mode
-    LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 8 \
     --enable_fa3

 **Parameter Description:**
-- `LOADWORKER=18`: Model loading thread count, improves loading speed
 - `--tp 8`: Tensor parallelism, using 8 GPUs
 - `--enable_fa3`: Enable Flash Attention 3.0
 - `--port 8088`: Service port
@@ -51,7 +50,7 @@ Suitable for expert parallelism deployment of MoE models like DeepSeek-V2/V3.
 .. code-block:: bash

     # H200 Single node DeepSeek-R1 DP + EP Mode
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 8 \
     --dp 8 \
@@ -82,7 +81,7 @@ Suitable for deployment across multiple H200/H100 nodes.
     # H200/H100 Multi-node DeepSeek-R1 TP Mode Node 0
     # Usage: sh multi_node_tp_node0.sh <nccl_host>
     export nccl_host=$1
-    LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --enable_fa3 \
@@ -98,7 +97,7 @@ Suitable for deployment across multiple H200/H100 nodes.
     # H200/H100 Multi-node DeepSeek-R1 TP Mode Node 1
     # Usage: sh multi_node_tp_node1.sh <nccl_host>
     export nccl_host=$1
-    LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --enable_fa3 \
@@ -125,7 +124,7 @@ Suitable for deploying MoE models across multiple nodes.
     # H200 Multi-node DeepSeek-R1 EP Mode Node 0
     # Usage: sh multi_node_ep_node0.sh <nccl_host>
     export nccl_host=$1
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --dp 16 \
@@ -142,7 +141,7 @@ Suitable for deploying MoE models across multiple nodes.
     # H200 Multi-node DeepSeek-R1 EP Mode Node 1
     # Usage: sh multi_node_ep_node1.sh <nccl_host>
     export nccl_host=$1
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+    MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
     --model_dir /path/DeepSeek-R1 \
     --tp 16 \
     --dp 16 \
@@ -187,7 +186,7 @@ PD (Prefill-Decode) disaggregation mode separates prefill and decode stages for
     export host=$1
     export pd_master_ip=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP KV_TRANS_USE_P2P=1 LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP KV_TRANS_USE_P2P=1 python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "prefill" \
     --tp 8 \
@@ -197,7 +196,7 @@ PD (Prefill-Decode) disaggregation mode separates prefill and decode stages for
     --nccl_port 2732 \
     --enable_fa3 \
     --disable_cudagraph \
-    --pd_master_ip $pd_master_ip
+    --pd_master_ip $pd_master_ip

 **Step 3: Launch Decode Service**

@@ -208,7 +207,7 @@ PD (Prefill-Decode) disaggregation mode separates prefill and decode stages for
     export host=$1
     export pd_master_ip=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP KV_TRANS_USE_P2P=1 LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP KV_TRANS_USE_P2P=1 python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "decode" \
     --tp 8 \
@@ -276,7 +275,7 @@ Supports multiple PD Master nodes, providing better load balancing and high availability.
     export host=$1
     export config_server_host=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "prefill" \
     --host $host \
@@ -295,7 +294,7 @@ Supports multiple PD Master nodes, providing better load balancing and high availability.
     export host=$1
     export config_server_host=$2
     nvidia-cuda-mps-control -d
-    MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server \
+    MOE_MODE=EP python -m lightllm.server.api_server \
     --model_dir /path/DeepSeek-R1 \
     --run_mode "decode" \
     --host $host \

docs/EN/source/tutorial/multimodal.rst

Lines changed: 0 additions & 2 deletions

@@ -9,7 +9,6 @@ Basic Launch Command
 .. code-block:: bash

     INTERNVL_IMAGE_LENGTH=256 \
-    LOADWORKER=12 \
     python -m lightllm.server.api_server \
     --port 8080 \
     --tp 2 \
@@ -25,7 +24,6 @@ Environment Variables
 ^^^^^^^^^^^^^^^^^^^^

 - **INTERNVL_IMAGE_LENGTH**: Set the image token length for InternVL model, default is 256
-- **LOADWORKER**: Set the number of worker processes for model loading

 Basic Service Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^

lightllm/common/basemodel/layer_weights/hf_load_utils.py

Lines changed: 2 additions & 1 deletion

@@ -51,6 +51,7 @@ def load_hf_weights(data_type, weight_dir, pre_post_layer=None, transformer_laye
     candidate_files = list(filter(lambda x: x.endswith(".bin"), files))
     assert len(candidate_files) != 0, "can only support pytorch tensor and safetensors format for weights."
     from functools import partial
+    from multiprocessing import cpu_count
     from multiprocessing.pool import ThreadPool as Pool

     partial_func = partial(
@@ -60,7 +61,7 @@ def load_hf_weights(data_type, weight_dir, pre_post_layer=None, transformer_laye
         transformer_layer_list=transformer_layer_list,
         weight_dir=weight_dir,
     )  # noqa
-    worker = int(os.environ.get("LOADWORKER", 24))
+    worker = min(24, cpu_count())
    with Pool(worker) as p:
         iterator = p.imap_unordered(partial_func, candidate_files, chunksize=1)
         desc_str = f"pid {os.getpid()} Loading model weights with {worker} workers"
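
The effect of this change, as a minimal self-contained sketch (`load_shard` is a hypothetical stand-in for the real per-file loader): the worker count is now derived from the machine's CPU count, capped at 24, instead of being read from the `LOADWORKER` environment variable.

    import os
    from multiprocessing import cpu_count
    from multiprocessing.pool import ThreadPool as Pool

    def load_shard(path):
        # Placeholder: the real loader parses one .bin / .safetensors shard.
        return path

    def load_all(candidate_files):
        # New behavior: cap at 24 threads, never more than the machine has cores
        # (previously this came from the LOADWORKER environment variable).
        worker = min(24, cpu_count())
        with Pool(worker) as p:
            for _ in p.imap_unordered(load_shard, candidate_files, chunksize=1):
                pass
        print(f"pid {os.getpid()} loaded {len(candidate_files)} shards with {worker} workers")

    load_all(["shard0.bin", "shard1.bin"])

Using threads rather than processes fits here because weight loading is I/O-bound, so the GIL is not the bottleneck.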

test/start_scripts/README.md

Lines changed: 0 additions & 1 deletion

@@ -98,7 +98,6 @@ sh multi_pd_master/pd_decode.sh <host> <config_server_host>

 ### Environment Variables

-- `LOADWORKER`: Model loading thread count, recommended 8-18
 - `MOE_MODE`: Expert parallelism mode, set to EP to enable expert parallelism
 - `KV_TRANS_USE_P2P`: Enable P2P communication optimization
 - `CUDA_VISIBLE_DEVICES`: Specify GPU devices to use

test/start_scripts/multi_node_ep_node0.sh

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 # nccl_host: the ip of the nccl host
 # sh multi_node_ep_node0.sh <nccl_host>
 export nccl_host=$1
-MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
 --model_dir /path/DeepSeek-R1 \
 --tp 16 \
 --dp 16 \

test/start_scripts/multi_node_ep_node1.sh

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 # nccl_host: the ip of the nccl host
 # sh multi_node_ep_node1.sh <nccl_host>
 export nccl_host=$1
-MOE_MODE=EP LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+MOE_MODE=EP python -m lightllm.server.api_server --port 8088 \
 --model_dir /path/DeepSeek-R1 \
 --tp 16 \
 --dp 16 \

test/start_scripts/multi_node_tp_node0.sh

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 # nccl_host: the ip of the nccl host
 # sh multi_node_tp_node0.sh <nccl_host>
 export nccl_host=$1
-LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+python -m lightllm.server.api_server --port 8088 \
 --model_dir /path/DeepSeek-R1 \
 --tp 16 \
 --enable_fa3 \

test/start_scripts/multi_node_tp_node1.sh

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 # nccl_host: the ip of the nccl host
 # sh multi_node_tp_node1.sh <nccl_host>
 export nccl_host=$1
-LOADWORKER=18 python -m lightllm.server.api_server --port 8088 \
+python -m lightllm.server.api_server --port 8088 \
 --model_dir /path/DeepSeek-R1 \
 --tp 16 \
 --enable_fa3 \
