Commit d482119 ("add launch doc"), parent 00225f0

2 files changed: +134 additions, -11 deletions

doc/source/locale/zh_CN/LC_MESSAGES/user_guide/launch.po (98 additions, 11 deletions)
@@ -8,7 +8,7 @@ msgid ""
88
msgstr ""
99
"Project-Id-Version: Xinference \n"
1010
"Report-Msgid-Bugs-To: \n"
11-
"POT-Creation-Date: 2025-08-02 23:15+0800\n"
11+
"POT-Creation-Date: 2025-10-20 16:28+0800\n"
1212
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
1313
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
1414
"Language: zh_CN\n"
@@ -17,7 +17,7 @@ msgstr ""
1717
"MIME-Version: 1.0\n"
1818
"Content-Type: text/plain; charset=utf-8\n"
1919
"Content-Transfer-Encoding: 8bit\n"
20-
"Generated-By: Babel 2.14.0\n"
20+
"Generated-By: Babel 2.17.0\n"
2121

2222
#: ../../source/user_guide/launch.rst:5
2323
msgid "Model Launching Instructions"
@@ -46,11 +46,86 @@ msgstr ""
4646
"两张 GPU 上。Xinference 会自动进行负载均衡,确保请求均匀分配到多张卡上。"
4747
"用户看到的仍是一个模型,这大大提升了整体资源利用率。"
4848

49-
#: ../../source/user_guide/launch.rst:18
49+
#: ../../source/user_guide/launch.rst:17
50+
msgid "Traditional Multi-Instance Deployment:"
51+
msgstr "旧版本多实例部署:"
52+
53+
#: ../../source/user_guide/launch.rst:19
54+
msgid ""
55+
"When you have multiple GPU cards, each capable of hosting one model "
56+
"instance, you can set the number of instances equal to the number of "
57+
"GPUs. For example:"
58+
msgstr "当您拥有多张GPU显卡时,每张显卡可承载一个模型实例,此时可将实例数量设置为等于GPU数量。例如:"
59+
60+
#: ../../source/user_guide/launch.rst:21
61+
msgid "2 GPUs, 2 instances: Each GPU runs one model instance"
62+
msgstr "2张GPU,2个实例:每张GPU运行一个模型实例"
63+
64+
#: ../../source/user_guide/launch.rst:22
65+
msgid "4 GPUs, 4 instances: Each GPU runs one model instance"
66+
msgstr "4张GPU,4个实例:每张GPU运行一个模型实例"
67+
68+
#: ../../source/user_guide/launch.rst:26
69+
msgid "Introduce a new environment variable:"
70+
msgstr "引入一个新的环境变量:"
71+
72+
#: ../../source/user_guide/launch.rst:32
73+
msgid ""
74+
"Control whether to enable the single GPU multi-copy feature Default "
75+
"value: 1"
76+
msgstr "控制是否启用单GPU多副本功能,默认值:1"
77+
78+
#: ../../source/user_guide/launch.rst:35
79+
msgid "New Feature: Smart Replica Deployment"
80+
msgstr "新功能:智能副本部署"
81+
82+
#: ../../source/user_guide/launch.rst:37
83+
msgid "Single GPU Multi-Replica"
84+
msgstr "单GPU多副本"
85+
86+
#: ../../source/user_guide/launch.rst:39
87+
msgid "New Support: Run multiple model replicas even with just one GPU."
88+
msgstr "新增支持:即使仅有一块GPU,也能运行多个模型副本。"
89+
90+
#: ../../source/user_guide/launch.rst:41
91+
msgid "Scenario: You have 1 GPU with sufficient VRAM"
92+
msgstr "场景:您拥有1个GPU且显存充足"
93+
94+
#: ../../source/user_guide/launch.rst:42
95+
msgid "Configuration: Replica Count = 3, GPU Count = 1"
96+
msgstr "配置:副本数量=3,GPU数量=1"
97+
98+
#: ../../source/user_guide/launch.rst:43
99+
msgid "Result: 3 model instances running on the same GPU, sharing GPU resources"
100+
msgstr "结果:3个模型实例,在同一GPU上运行,共享GPU资源"
101+
102+
#: ../../source/user_guide/launch.rst:45
103+
msgid "Hybrid GPU Allocation"
104+
msgstr "混合GPU分配"
105+
106+
#: ../../source/user_guide/launch.rst:47
107+
msgid ""
108+
"Smart Allocation: Number of replicas may differ from GPU count; system "
109+
"intelligently distributes"
110+
msgstr "智能分配: 副本数可以不等于GPU数量,系统会智能分配"
111+
112+
#: ../../source/user_guide/launch.rst:49
113+
msgid "Scenario: You have 2 GPUs and need 3 replicas"
114+
msgstr "场景: 你有2张GPU,需要3个副本"
115+
116+
#: ../../source/user_guide/launch.rst:50
117+
msgid "Configuration: Replicas=3, GPUs=2"
118+
msgstr "配置: 副本数=3,GPU数量=2"
119+
120+
#: ../../source/user_guide/launch.rst:51
121+
msgid "Result: GPU0 runs 2 instances, GPU1 runs 1 instance"
122+
msgstr "结果: GPU0运行2个实例,GPU1运行1个实例"
123+
124+
#: ../../source/user_guide/launch.rst:54
50125
msgid "Set Environment Variables"
51126
msgstr "设置环境变量"
52127

53-
#: ../../source/user_guide/launch.rst:22
128+
#: ../../source/user_guide/launch.rst:58
54129
msgid ""
55130
"Sometimes, we want to specify environment variables for a particular "
56131
"model at runtime. Since v1.8.1, Xinference provides the capability to "
@@ -60,21 +135,21 @@ msgstr ""
60135
"有时我们希望在运行时为特定模型指定环境变量。从 v1.8.1 开始,Xinference "
61136
"提供了单独配置环境变量的功能,无需在启动 Xinference 前设置。"
62137

63-
#: ../../source/user_guide/launch.rst:25
138+
#: ../../source/user_guide/launch.rst:61
64139
msgid "For Web UI."
65140
msgstr "针对 Web UI。"
66141

67-
#: ../../source/user_guide/launch.rst:31
142+
#: ../../source/user_guide/launch.rst:67
68143
msgid ""
69144
"When using the command line, use ``--env`` to specify an environment "
70145
"variable."
71146
msgstr "命令行使用时,使用 ``--env`` 指定环境变量。"
72147

73-
#: ../../source/user_guide/launch.rst:33
148+
#: ../../source/user_guide/launch.rst:69
74149
msgid "Example usage:"
75150
msgstr "示例用法:"
76151

77-
#: ../../source/user_guide/launch.rst:39
152+
#: ../../source/user_guide/launch.rst:75
78153
msgid ""
79154
"Take vLLM as an example: it has versions V1 and V0, and by default, it "
80155
"automatically determines which version to use. If you want to force the "
@@ -85,13 +160,25 @@ msgstr ""
85160
"在加载模型时强制通过设置 ``VLLM_USE_V1=0`` 来使用 V0,可以指定该环境变量"
86161
"。"
87162

88-
#: ../../source/user_guide/launch.rst:43
163+
#: ../../source/user_guide/launch.rst:79
89164
msgid "Configuring Model Virtual Environment"
90165
msgstr "配置模型虚拟空间"
91166

92-
#: ../../source/user_guide/launch.rst:47
167+
#: ../../source/user_guide/launch.rst:83
93168
msgid ""
94169
"For this part, please refer to :ref:`toggling virtual environments and "
95170
"customizing dependencies <model_launching_virtualenv>`."
96-
msgstr "对于这部分,请参考 :ref:`开关虚拟空间和定制依赖 <model_launching_virtualenv>`。"
171+
msgstr ""
172+
"对于这部分,请参考 :ref:`开关虚拟空间和定制依赖 <model_launching_"
173+
"virtualenv>`。"
174+
175+
#~ msgid ""
176+
#~ "Scenario: You have 2 GPUs and need"
177+
#~ " 3 replicas Configuration: Replicas=3, "
178+
#~ "GPUs=2 Result: GPU0 runs 2 instances,"
179+
#~ " GPU1 runs 1 instance"
180+
#~ msgstr ""
181+
#~ "场景: 你有2张GPU,需要3个副本"
182+
#~ "配置: 副本数=3,GPU数量=2结果:"
183+
#~ " GPU0运行2个实例,GPU1运行1个实例"
97184
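The strings above document a flag-style variable, XINFERENCE_ENABLE_SINGLE_GPU_MULTI_REPLICA, that defaults to 1. A minimal sketch of how such a "default on" flag could be read follows; the semantics (any value other than "1" disables the feature) are an assumption based only on the documented default, not Xinference's actual implementation:

```python
import os

# Sketch: read a feature flag that defaults to enabled ("1").
# Assumption: "1" enables the feature and any other value disables it.
def single_gpu_multi_replica_enabled(env=None):
    env = os.environ if env is None else env
    return env.get("XINFERENCE_ENABLE_SINGLE_GPU_MULTI_REPLICA", "1") == "1"
```

With the variable unset the sketch returns True, matching the documented default of 1.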

doc/source/user_guide/launch.rst (36 additions, 0 deletions)
@@ -14,6 +14,42 @@ you can set the replica count to 2. This way, two identical instances of the mod
 Xinference automatically load-balances requests to ensure even distribution across multiple GPUs.
 Meanwhile, users see it as a single model, which greatly improves overall resource utilization.
 
+Traditional Multi-Instance Deployment:
+
+When you have multiple GPU cards, each capable of hosting one model instance, you can set the number of instances equal to the number of GPUs. For example:
+
+- 2 GPUs, 2 instances: Each GPU runs one model instance
+- 4 GPUs, 4 instances: Each GPU runs one model instance
+
+.. versionadded:: v1.11.1
+
+Introduce a new environment variable:
+
+.. code-block:: bash
+
+    XINFERENCE_ENABLE_SINGLE_GPU_MULTI_REPLICA
+
+Control whether to enable the single GPU multi-copy feature
+Default value: 1
+
+New Feature: Smart Replica Deployment
+
+1. Single GPU Multi-Replica
+
+New Support: Run multiple model replicas even with just one GPU.
+
+- Scenario: You have 1 GPU with sufficient VRAM
+- Configuration: Replica Count = 3, GPU Count = 1
+- Result: 3 model instances running on the same GPU, sharing GPU resources
+
+2. Hybrid GPU Allocation
+
+Smart Allocation: Number of replicas may differ from GPU count; system intelligently distributes
+
+- Scenario: You have 2 GPUs and need 3 replicas
+- Configuration: Replicas=3, GPUs=2
+- Result: GPU0 runs 2 instances, GPU1 runs 1 instance
+
 Set Environment Variables
 =========================
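The hybrid allocation documented above (replicas spread across fewer GPUs) is consistent with a simple round-robin assignment. The sketch below only illustrates the documented outcomes; it is not Xinference's actual scheduler:

```python
def assign_replicas_to_gpus(n_replicas, n_gpus):
    """Round-robin sketch: replica i lands on GPU i % n_gpus.

    Returns a dict mapping GPU index -> number of replicas on that GPU.
    """
    counts = {gpu: 0 for gpu in range(n_gpus)}
    for i in range(n_replicas):
        counts[i % n_gpus] += 1
    return counts
```

With 3 replicas on 2 GPUs this yields {0: 2, 1: 1}, matching "GPU0 runs 2 instances, GPU1 runs 1 instance"; with 3 replicas on 1 GPU all three share GPU 0, matching the single-GPU multi-replica scenario.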
