Commit a3fcacb

Author: niushengxiao
Commit message: fix doc
1 parent: 7fb6b43

File tree: 7 files changed (+43, -43 lines)

docs/EN/source/getting_started/benchmark.rst

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@ Benchmark Testing Guide
 LightLLM provides multiple performance testing tools, including service performance testing and static inference performance testing. This document will detailedly introduce how to use these tools for performance evaluation.
 
 Service Performance Testing (Service Benchmark)
----------------------------------------------
+-----------------------------------------------
 
 Service performance testing is mainly used to evaluate LightLLM's performance in real service scenarios, including key metrics such as throughput and latency.
 
@@ -55,7 +55,7 @@ QPS (Queries Per Second) testing is the core tool for evaluating service perform
 - decode_token_time P{25,50,75,90,95,99,100}: Decode token latency percentiles
 
 Fixed Concurrency Testing (benchmark_client.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Used to evaluate performance under different client concurrency levels.
 
@@ -73,7 +73,7 @@ Used to evaluate performance under different client concurrency levels.
 --server_api lightllm
 
 ShareGPT Dataset Testing (benchmark_sharegpt.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Performance testing using ShareGPT real conversation data.
 
@@ -99,7 +99,7 @@ Performance testing using ShareGPT real conversation data.
 - ``--request_rate``: Request rate (requests/s)
 
 Prompt Cache Testing
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 
 Evaluate prompt cache performance under different hit rates by adjusting --first_input_len, --output_len --subsequent_input_len to control hit rate.
 Hit rate per round = (first_input_len + (output_len + subsequent_input_len) * (num_turns - 1)) / (first_input_len + (output_len + subsequent_input_len) * num_turns)

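As a quick sanity check of the hit-rate formula quoted in the hunk above, the sketch below simply evaluates it for illustrative values (the numbers are examples, not LightLLM defaults):

.. code-block:: python

    def prompt_cache_hit_rate(first_input_len, output_len, subsequent_input_len, num_turns):
        # Hit rate per round, exactly as stated in the documentation text above.
        reused = first_input_len + (output_len + subsequent_input_len) * (num_turns - 1)
        total = first_input_len + (output_len + subsequent_input_len) * num_turns
        return reused / total

    # A long first input and four turns give roughly a 92% hit rate with these numbers.
    print(round(prompt_cache_hit_rate(2048, 128, 128, 4), 3))  # 0.917
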
@@ -129,12 +129,12 @@ Parameter Description:
 - ``--num_users``: Number of users
 
 Static Inference Performance Testing (Static Inference Benchmark)
---------------------------------------------------------------
+------------------------------------------------------------------
 
 Static inference testing is used to evaluate model inference performance under fixed input conditions, mainly evaluating operator quality.
 
 Model Inference Testing (model_infer.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 **Main Features:**
 
docs/EN/source/tutorial/api_param.rst

Lines changed: 3 additions & 3 deletions
@@ -4,9 +4,9 @@ API Call Details
 :code:`GET /health`
 ~~~~~~~~~~~~~~~~~~~
 :code:`HEAD /health`
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 :code:`GET /healthz`
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 
 Get the current server running status
 
@@ -23,7 +23,7 @@ Get the current server running status
 {"message":"Ok"}
 
 :code:`GET /token_load`
-~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~
 
 Get the current server token usage status

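The endpoints touched in this file are plain GET routes, so a minimal client-side check can be written with ``requests``; the host and port below are assumptions (8080 is the port used elsewhere in these docs), not values taken from this commit:

.. code-block:: python

    import requests

    base = "http://localhost:8080"  # assumed server address

    # Liveness check; the docs show {"message":"Ok"} on success.
    print(requests.get(f"{base}/health").text)

    # Current token usage; the exact response format depends on the server version.
    print(requests.get(f"{base}/token_load").text)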

docs/EN/source/tutorial/api_server_args_zh.rst

Lines changed: 16 additions & 16 deletions
@@ -1,10 +1,10 @@
 APIServer Parameter Details
-==========================
+===========================
 
 This document provides detailed information about all startup parameters and their usage for LightLLM APIServer.
 
 Basic Configuration Parameters
-----------------------------
+------------------------------
 
 .. option:: --run_mode
 
@@ -38,7 +38,7 @@ Basic Configuration Parameters
 Can only choose from ``['tcp://', 'ipc:///tmp/']``
 
 PD disaggregation Mode Parameters
----------------------------
+----------------------------------
 
 .. option:: --pd_master_ip
 
@@ -65,7 +65,7 @@ PD disaggregation Mode Parameters
 Port number in configuration server mode
 
 Model Configuration Parameters
-----------------------------
+------------------------------
 
 .. option:: --model_name
 
@@ -96,7 +96,7 @@ Model Configuration Parameters
 Whether to allow using custom model definition files on Hub
 
 Memory and Batch Processing Parameters
------------------------------------
+---------------------------------------
 
 .. option:: --max_total_token_num
 
@@ -135,7 +135,7 @@ Memory and Batch Processing Parameters
 * ``mistral``
 
 Different Parallel Mode Setting Parameters
----------------------------------------
+-------------------------------------------
 
 .. option:: --nnodes
 
@@ -182,7 +182,7 @@ Different Parallel Mode Setting Parameters
 do not use the same nccl_port for different inference nodes, this will be a serious error
 
 Attention Type Selection Parameters
---------------------------------
+------------------------------------
 
 .. option:: --mode
 
@@ -199,7 +199,7 @@ Attention Type Selection Parameters
 Need to read source code to confirm specific modes supported by all models
 
 Scheduling Parameters
--------------------
+---------------------
 
 .. option:: --router_token_ratio
 
@@ -241,7 +241,7 @@ Scheduling Parameters
 Schedule time interval, default is ``0.03``, unit is seconds
 
 Output Constraint Parameters
--------------------------
+----------------------------
 
 .. option:: --token_healing_mode
 
@@ -259,7 +259,7 @@ Output Constraint Parameters
 Use environment variable FIRST_ALLOWED_TOKENS to set the range, e.g., FIRST_ALLOWED_TOKENS=1,2
 
 Multimodal Parameters
--------------------
+---------------------
 
 .. option:: --enable_multimodal
 
@@ -298,7 +298,7 @@ Multimodal Parameters
 List of NCCL ports for ViT, e.g., 29500 29501 29502, default is [29500]
 
 Performance Optimization Parameters
----------------------------------
+-----------------------------------
 
 .. option:: --disable_custom_allreduce
 
@@ -363,7 +363,7 @@ Performance Optimization Parameters
 Maximum sequence length that can be captured by cuda graph in the decoding phase, default is ``max_req_total_len``
 
 Quantization Parameters
----------------------
+-----------------------
 
 .. option:: --quant_type
 
@@ -408,7 +408,7 @@ Quantization Parameters
 Examples can be found in lightllm/common/quantization/configs.
 
 Sampling and Generation Parameters
--------------------------------
+----------------------------------
 
 .. option:: --sampling_backend
 
@@ -438,7 +438,7 @@ Sampling and Generation Parameters
 Use tgi input and output format
 
 MTP Multi-Prediction Parameters
------------------------------
+-------------------------------
 
 .. option:: --mtp_mode
 
@@ -463,7 +463,7 @@ MTP Multi-Prediction Parameters
 Currently deepseekv3/r1 models only support 1 step
 
 DeepSeek Redundant Expert Parameters
-----------------------------------
+------------------------------------
 
 .. option:: --ep_redundancy_expert_config_path
 
@@ -474,7 +474,7 @@ DeepSeek Redundant Expert Parameters
 Whether to update redundant experts for deepseekv3 models through online expert usage counters.
 
 Monitoring and Logging Parameters
--------------------------------
+---------------------------------
 
 .. option:: --disable_log_stats
 
docs/EN/source/tutorial/deepseek_deployment.rst

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@ DeepSeek Model Deployment Guide
 LightLLM supports various deployment solutions for DeepSeek models, including DeepSeek-R1, DeepSeek-V2, DeepSeek-V3, etc. This document provides detailed information on various deployment modes and configuration solutions.
 
 Deployment Mode Overview
-----------------------
+------------------------
 
 LightLLM supports the following deployment modes:
 
@@ -157,7 +157,7 @@ Suitable for deploying MoE models across multiple nodes.
 - `--enable_decode_microbatch_overlap`: Enable decode microbatch overlap
 
 3. PD disaggregation Deployment Solutions
------------------------------------
+------------------------------------------
 
 PD (Prefill-Decode) disaggregation mode separates prefill and decode stages for deployment, which can better utilize hardware resources.
 
@@ -328,7 +328,7 @@ Supports multiple PD Master nodes, providing better load balancing and high avai
 }'
 
 4.2 Performance Benchmark Testing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. code-block:: bash
 
docs/EN/source/tutorial/multimodal.rst

Lines changed: 9 additions & 9 deletions
@@ -1,10 +1,10 @@
 Multimodal Model Launch Configuration
-====================================
+=====================================
 
 LightLLM supports inference for various multimodal models. Below, using InternVL as an example, we explain the launch commands for multimodal services.
 
 Basic Launch Command
-------------------
+--------------------
 
 .. code-block:: bash
 
@@ -19,16 +19,16 @@ Basic Launch Command
 --enable_multimodal
 
 Core Parameter Description
-------------------------
+--------------------------
 
 Environment Variables
-^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^
 
 - **INTERNVL_IMAGE_LENGTH**: Set the image token length for InternVL model, default is 256
 - **LOADWORKER**: Set the number of worker processes for model loading
 
 Basic Service Parameters
-^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 - **--port 8080**: API server listening port
 - **--tp 2**: Tensor parallelism degree
@@ -38,7 +38,7 @@ Basic Service Parameters
 - **--enable_multimodal**: Enable multimodal functionality
 
 Advanced Configuration Parameters
--------------------------------
+---------------------------------
 
 .. code-block:: bash
 
@@ -58,20 +58,20 @@ ViT Deployment Methods
 ----------------------
 
 ViT TP (Tensor Parallel)
-^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^
 
 - Default usage
 - --visual_tp tp_size enables tensor parallelism
 
 ViT DP (Data Parallel)
-^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^
 
 - Distribute different image batches to multiple GPUs
 - Each GPU runs a complete ViT model copy
 - --visual_dp dp_size enables data parallelism
 
 Image Caching Mechanism
----------------------
+-----------------------
 LightLLM caches embeddings of input images. In multi-turn conversations, if the images are the same, cached embeddings can be used directly, avoiding repeated inference.
 
 - **--cache_capacity**: Controls the number of cached image embeds

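The image-caching paragraph above only describes the mechanism at a high level. Purely as an illustration of the idea (keyed lookup of embeddings for images seen before, bounded by a capacity like ``--cache_capacity``), a minimal sketch could look like the following; it is not LightLLM's actual implementation:

.. code-block:: python

    import hashlib
    from collections import OrderedDict

    class ImageEmbedCache:
        """Illustrative LRU cache that reuses embeddings for identical images."""

        def __init__(self, capacity=200):  # stands in for --cache_capacity
            self.capacity = capacity
            self._cache = OrderedDict()

        def get_or_compute(self, image_bytes, embed_fn):
            key = hashlib.sha256(image_bytes).hexdigest()
            if key in self._cache:               # same image as a previous turn: no ViT run
                self._cache.move_to_end(key)
                return self._cache[key]
            embedding = embed_fn(image_bytes)    # run the ViT only on a cache miss
            self._cache[key] = embedding
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict the least recently used entry
            return embedding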

docs/EN/source/tutorial/openai.rst

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@ LightLLM OpenAI API Usage Examples
 LightLLM provides an interface that is fully compatible with OpenAI API, supporting all standard OpenAI features including function calling. This document provides detailed information on how to use LightLLM's OpenAI interface.
 
 Basic Configuration
------------------
+-------------------
 
 First, ensure that the LightLLM service is started:
 
@@ -19,7 +19,7 @@ First, ensure that the LightLLM service is started:
 --tp 1
 
 Basic Conversation Examples
--------------------------
+---------------------------
 
 1. Simple Conversation
 ~~~~~~~~~~~~~~~~~~~~~~
@@ -94,7 +94,7 @@ Basic Conversation Examples
 print("Error:", response.status_code, response.text)
 
 Function Calling Examples
------------------------
+-------------------------
 
 LightLLM supports OpenAI's function calling functionality, providing function call parsing for three models. Specify the --tool_call_parser parameter when starting the service to choose. The service launch command is:

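Because the interface is OpenAI-compatible, the examples in this file can also be driven by the official ``openai`` Python client pointed at the local server; the base URL, port, and model name below are placeholders, not values taken from this commit:

.. code-block:: python

    from openai import OpenAI

    # Assumed local LightLLM server exposing the OpenAI-compatible route.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="your-model-name",  # placeholder; use the name the server was launched with
        messages=[{"role": "user", "content": "Hello, who are you?"}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)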

docs/EN/source/tutorial/reward_model.rst

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 Reward Model Deployment Configuration
-====================================
+=====================================
 
 LightLLM supports inference for various reward models, used for evaluating conversation quality and generating reward scores. Currently supported reward models include InternLM2 Reward and Qwen2 Reward, etc.
 
@@ -18,7 +18,7 @@ Testing Examples
 ----------------
 
 Python Testing Code
-^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: python
 
@@ -48,7 +48,7 @@ Python Testing Code
 print(f"Error: {response.status_code}, {response.text}")
 
 cURL Testing Command
-^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: bash
 