Commit a3fcacb

Author: niushengxiao
Commit message: fix doc
1 parent: 7fb6b43

File tree: 7 files changed (+43, -43 lines)

docs/EN/source/getting_started/benchmark.rst

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@ Benchmark Testing Guide
 LightLLM provides multiple performance testing tools, including service performance testing and static inference performance testing. This document will detailedly introduce how to use these tools for performance evaluation.
 
 Service Performance Testing (Service Benchmark)
----------------------------------------------
+-----------------------------------------------
 
 Service performance testing is mainly used to evaluate LightLLM's performance in real service scenarios, including key metrics such as throughput and latency.
 
@@ -55,7 +55,7 @@ QPS (Queries Per Second) testing is the core tool for evaluating service perform
 - decode_token_time P{25,50,75,90,95,99,100}: Decode token latency percentiles
 
 Fixed Concurrency Testing (benchmark_client.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Used to evaluate performance under different client concurrency levels.
 
@@ -73,7 +73,7 @@ Used to evaluate performance under different client concurrency levels.
 --server_api lightllm
 
 ShareGPT Dataset Testing (benchmark_sharegpt.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Performance testing using ShareGPT real conversation data.
 
@@ -99,7 +99,7 @@ Performance testing using ShareGPT real conversation data.
 - ``--request_rate``: Request rate (requests/s)
 
 Prompt Cache Testing
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 
 Evaluate prompt cache performance under different hit rates by adjusting --first_input_len, --output_len --subsequent_input_len to control hit rate.
 Hit rate per round = (first_input_len + (output_len + subsequent_input_len) * (num_turns - 1)) / (first_input_len + (output_len + subsequent_input_len) * num_turns)

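As a quick sanity check of the hit-rate formula quoted in the hunk above, the sketch below simply evaluates it for illustrative values (the numbers are examples, not LightLLM defaults):

.. code-block:: python

    def prompt_cache_hit_rate(first_input_len, output_len, subsequent_input_len, num_turns):
        # Hit rate per round, exactly as stated in the documentation text above.
        reused = first_input_len + (output_len + subsequent_input_len) * (num_turns - 1)
        total = first_input_len + (output_len + subsequent_input_len) * num_turns
        return reused / total

    # A long first input and four turns give roughly a 92% hit rate with these numbers.
    print(round(prompt_cache_hit_rate(2048, 128, 128, 4), 3))  # 0.917
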
@@ -129,12 +129,12 @@ Parameter Description:
 - ``--num_users``: Number of users
 
 Static Inference Performance Testing (Static Inference Benchmark)
---------------------------------------------------------------
+------------------------------------------------------------------
 
 Static inference testing is used to evaluate model inference performance under fixed input conditions, mainly evaluating operator quality.
 
 Model Inference Testing (model_infer.py)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 **Main Features:**
 
docs/EN/source/tutorial/api_param.rst

Lines changed: 3 additions & 3 deletions
@@ -4,9 +4,9 @@ API Call Details
 :code:`GET /health`
 ~~~~~~~~~~~~~~~~~~~
 :code:`HEAD /health`
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 :code:`GET /healthz`
-~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~
 
 Get the current server running status
 
@@ -23,7 +23,7 @@ Get the current server running status
 {"message":"Ok"}
 
 :code:`GET /token_load`
-~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~
 
 Get the current server token usage status

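The endpoints touched in this file are plain GET routes, so a minimal client-side check can be written with ``requests``; the host and port below are assumptions (8080 is the port used elsewhere in these docs), not values taken from this commit:

.. code-block:: python

    import requests

    base = "http://localhost:8080"  # assumed server address

    # Liveness check; the docs show {"message":"Ok"} on success.
    print(requests.get(f"{base}/health").text)

    # Current token usage; the exact response format depends on the server version.
    print(requests.get(f"{base}/token_load").text)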

docs/EN/source/tutorial/api_server_args_zh.rst

Lines changed: 16 additions & 16 deletions
@@ -1,10 +1,10 @@
 APIServer Parameter Details
-==========================
+===========================
 
 This document provides detailed information about all startup parameters and their usage for LightLLM APIServer.
 
 Basic Configuration Parameters
-----------------------------
+------------------------------
 
 .. option:: --run_mode
 
@@ -38,7 +38,7 @@ Basic Configuration Parameters
 Can only choose from ``['tcp://', 'ipc:///tmp/']``
 
 PD disaggregation Mode Parameters
----------------------------
+----------------------------------
 
 .. option:: --pd_master_ip
 
@@ -65,7 +65,7 @@ PD disaggregation Mode Parameters
 Port number in configuration server mode
 
 Model Configuration Parameters
-----------------------------
+------------------------------
 
 .. option:: --model_name
 
@@ -96,7 +96,7 @@ Model Configuration Parameters
 Whether to allow using custom model definition files on Hub
 
 Memory and Batch Processing Parameters
------------------------------------
+---------------------------------------
 
 .. option:: --max_total_token_num
 
@@ -135,7 +135,7 @@ Memory and Batch Processing Parameters
 * ``mistral``
 
 Different Parallel Mode Setting Parameters
----------------------------------------
+-------------------------------------------
 
 .. option:: --nnodes
 
@@ -182,7 +182,7 @@ Different Parallel Mode Setting Parameters
 do not use the same nccl_port for different inference nodes, this will be a serious error
 
 Attention Type Selection Parameters
---------------------------------
+------------------------------------
 
 .. option:: --mode
 
@@ -199,7 +199,7 @@ Attention Type Selection Parameters
 Need to read source code to confirm specific modes supported by all models
 
 Scheduling Parameters
--------------------
+---------------------
 
 .. option:: --router_token_ratio
 
@@ -241,7 +241,7 @@ Scheduling Parameters
 Schedule time interval, default is ``0.03``, unit is seconds
 
 Output Constraint Parameters
--------------------------
+----------------------------
 
 .. option:: --token_healing_mode
 
@@ -259,7 +259,7 @@ Output Constraint Parameters
 Use environment variable FIRST_ALLOWED_TOKENS to set the range, e.g., FIRST_ALLOWED_TOKENS=1,2
 
 Multimodal Parameters
--------------------
+---------------------
 
 .. option:: --enable_multimodal
 
@@ -298,7 +298,7 @@ Multimodal Parameters
 List of NCCL ports for ViT, e.g., 29500 29501 29502, default is [29500]
 
 Performance Optimization Parameters
----------------------------------
+-----------------------------------
 
 .. option:: --disable_custom_allreduce
 
@@ -363,7 +363,7 @@ Performance Optimization Parameters
 Maximum sequence length that can be captured by cuda graph in the decoding phase, default is ``max_req_total_len``
 
 Quantization Parameters
----------------------
+-----------------------
 
 .. option:: --quant_type
 
@@ -408,7 +408,7 @@ Quantization Parameters
 Examples can be found in lightllm/common/quantization/configs.
 
 Sampling and Generation Parameters
--------------------------------
+----------------------------------
 
 .. option:: --sampling_backend
 
@@ -438,7 +438,7 @@ Sampling and Generation Parameters
 Use tgi input and output format
 
 MTP Multi-Prediction Parameters
------------------------------
+-------------------------------
 
 .. option:: --mtp_mode
 
@@ -463,7 +463,7 @@ MTP Multi-Prediction Parameters
 Currently deepseekv3/r1 models only support 1 step
 
 DeepSeek Redundant Expert Parameters
-----------------------------------
+------------------------------------
 
 .. option:: --ep_redundancy_expert_config_path
 
@@ -474,7 +474,7 @@ DeepSeek Redundant Expert Parameters
 Whether to update redundant experts for deepseekv3 models through online expert usage counters.
 
 Monitoring and Logging Parameters
--------------------------------
+---------------------------------
 
 .. option:: --disable_log_stats
 
docs/EN/source/tutorial/deepseek_deployment.rst

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@ DeepSeek Model Deployment Guide
 LightLLM supports various deployment solutions for DeepSeek models, including DeepSeek-R1, DeepSeek-V2, DeepSeek-V3, etc. This document provides detailed information on various deployment modes and configuration solutions.
 
 Deployment Mode Overview
-----------------------
+------------------------
 
 LightLLM supports the following deployment modes:
 
@@ -157,7 +157,7 @@ Suitable for deploying MoE models across multiple nodes.
 - `--enable_decode_microbatch_overlap`: Enable decode microbatch overlap
 
 3. PD disaggregation Deployment Solutions
------------------------------------
+------------------------------------------
 
 PD (Prefill-Decode) disaggregation mode separates prefill and decode stages for deployment, which can better utilize hardware resources.
 
@@ -328,7 +328,7 @@ Supports multiple PD Master nodes, providing better load balancing and high avai
 }'
 
 4.2 Performance Benchmark Testing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. code-block:: bash
 
docs/EN/source/tutorial/multimodal.rst

Lines changed: 9 additions & 9 deletions
@@ -1,10 +1,10 @@
 Multimodal Model Launch Configuration
-====================================
+=====================================
 
 LightLLM supports inference for various multimodal models. Below, using InternVL as an example, we explain the launch commands for multimodal services.
 
 Basic Launch Command
-------------------
+--------------------
 
 .. code-block:: bash
 
@@ -19,16 +19,16 @@ Basic Launch Command
 --enable_multimodal
 
 Core Parameter Description
-------------------------
+--------------------------
 
 Environment Variables
-^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^
 
 - **INTERNVL_IMAGE_LENGTH**: Set the image token length for InternVL model, default is 256
 - **LOADWORKER**: Set the number of worker processes for model loading
 
 Basic Service Parameters
-^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 - **--port 8080**: API server listening port
 - **--tp 2**: Tensor parallelism degree
@@ -38,7 +38,7 @@ Basic Service Parameters
 - **--enable_multimodal**: Enable multimodal functionality
 
 Advanced Configuration Parameters
--------------------------------
+---------------------------------
 
 .. code-block:: bash
 
@@ -58,20 +58,20 @@ ViT Deployment Methods
 ----------------------
 
 ViT TP (Tensor Parallel)
-^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^
 
 - Default usage
 - --visual_tp tp_size enables tensor parallelism
 
 ViT DP (Data Parallel)
-^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^
 
 - Distribute different image batches to multiple GPUs
 - Each GPU runs a complete ViT model copy
 - --visual_dp dp_size enables data parallelism
 
 Image Caching Mechanism
----------------------
+-----------------------
 LightLLM caches embeddings of input images. In multi-turn conversations, if the images are the same, cached embeddings can be used directly, avoiding repeated inference.
 
 - **--cache_capacity**: Controls the number of cached image embeds

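The image-caching paragraph above only describes the mechanism at a high level. Purely as an illustration of the idea (keyed lookup of embeddings for images seen before, bounded by a capacity like ``--cache_capacity``), a minimal sketch could look like the following; it is not LightLLM's actual implementation:

.. code-block:: python

    import hashlib
    from collections import OrderedDict

    class ImageEmbedCache:
        """Illustrative LRU cache that reuses embeddings for identical images."""

        def __init__(self, capacity=200):  # stands in for --cache_capacity
            self.capacity = capacity
            self._cache = OrderedDict()

        def get_or_compute(self, image_bytes, embed_fn):
            key = hashlib.sha256(image_bytes).hexdigest()
            if key in self._cache:               # same image as a previous turn: no ViT run
                self._cache.move_to_end(key)
                return self._cache[key]
            embedding = embed_fn(image_bytes)    # run the ViT only on a cache miss
            self._cache[key] = embedding
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict the least recently used entry
            return embedding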

docs/EN/source/tutorial/openai.rst

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@ LightLLM OpenAI API Usage Examples
 LightLLM provides an interface that is fully compatible with OpenAI API, supporting all standard OpenAI features including function calling. This document provides detailed information on how to use LightLLM's OpenAI interface.
 
 Basic Configuration
------------------
+-------------------
 
 First, ensure that the LightLLM service is started:
 
@@ -19,7 +19,7 @@ First, ensure that the LightLLM service is started:
 --tp 1
 
 Basic Conversation Examples
--------------------------
+---------------------------
 
 1. Simple Conversation
 ~~~~~~~~~~~~~~~~~~~~~~
@@ -94,7 +94,7 @@ Basic Conversation Examples
 print("Error:", response.status_code, response.text)
 
 Function Calling Examples
------------------------
+-------------------------
 
 LightLLM supports OpenAI's function calling functionality, providing function call parsing for three models. Specify the --tool_call_parser parameter when starting the service to choose. The service launch command is:

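Because the interface is OpenAI-compatible, the examples in this file can also be driven by the official ``openai`` Python client pointed at the local server; the base URL, port, and model name below are placeholders, not values taken from this commit:

.. code-block:: python

    from openai import OpenAI

    # Assumed local LightLLM server exposing the OpenAI-compatible route.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="your-model-name",  # placeholder; use the name the server was launched with
        messages=[{"role": "user", "content": "Hello, who are you?"}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)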

docs/EN/source/tutorial/reward_model.rst

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 Reward Model Deployment Configuration
-====================================
+=====================================
 
 LightLLM supports inference for various reward models, used for evaluating conversation quality and generating reward scores. Currently supported reward models include InternLM2 Reward and Qwen2 Reward, etc.
 
@@ -18,7 +18,7 @@ Testing Examples
 ----------------
 
 Python Testing Code
-^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: python
 
@@ -48,7 +48,7 @@ Python Testing Code
 print(f"Error: {response.status_code}, {response.text}")
 
 cURL Testing Command
-^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: bash
 