docs/EN/source/getting_started/benchmark.rst (6 additions, 6 deletions)

@@ -4,7 +4,7 @@ Benchmark Testing Guide
 LightLLM provides multiple performance testing tools, including service performance testing and static inference performance testing. This document introduces in detail how to use these tools for performance evaluation.
 
 Service Performance Testing (Service Benchmark)
-----------------------------------------------
+-----------------------------------------------
 
 Service performance testing is mainly used to evaluate LightLLM's performance in real service scenarios, including key metrics such as throughput and latency.
 
@@ -55,7 +55,7 @@ QPS (Queries Per Second) testing is the core tool for evaluating service perform
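Every hunk in this pull request makes the same kind of fix: a reStructuredText section underline is lengthened so that it is at least as long as its heading text, which avoids the docutils/Sphinx "Title underline too short" warning. Using the heading from this file as the example:

Before (underline shorter than the heading, which triggers the warning):

    Service Performance Testing (Service Benchmark)
    ----------------------------------------------

After (underline at least as long as the heading):

    Service Performance Testing (Service Benchmark)
    -----------------------------------------------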
docs/EN/source/tutorial/deepseek_deployment.rst (3 additions, 3 deletions)

@@ -6,7 +6,7 @@ DeepSeek Model Deployment Guide
 LightLLM supports various deployment solutions for DeepSeek models, including DeepSeek-R1, DeepSeek-V2, DeepSeek-V3, etc. This document provides detailed information on the available deployment modes and configuration options.
 
 Deployment Mode Overview
------------------------
+------------------------
 
 LightLLM supports the following deployment modes:
 
@@ -157,7 +157,7 @@ Suitable for deploying MoE models across multiple nodes.
docs/EN/source/tutorial/multimodal.rst (9 additions, 9 deletions)

@@ -1,10 +1,10 @@
 Multimodal Model Launch Configuration
-====================================
+=====================================
 
 LightLLM supports inference for various multimodal models. Below, using InternVL as an example, we explain the launch commands for multimodal services.
 
 Basic Launch Command
-------------------
+--------------------
 
 .. code-block:: bash
 
@@ -19,16 +19,16 @@ Basic Launch Command
     --enable_multimodal
 
 Core Parameter Description
-------------------------
+--------------------------
 
 Environment Variables
-^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^
 
 - **INTERNVL_IMAGE_LENGTH**: Sets the image token length for the InternVL model; the default is 256
 - **LOADWORKER**: Sets the number of worker processes for model loading
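The two environment variables above are read when the server starts. A minimal usage sketch (only the variable names, the default of 256, and the --enable_multimodal flag come from the documentation; the api_server entry point, model path, port, and the LOADWORKER value are illustrative assumptions):

    # Hypothetical values; INTERNVL_IMAGE_LENGTH falls back to 256 if unset.
    export INTERNVL_IMAGE_LENGTH=256   # image token length for the InternVL model
    export LOADWORKER=8                # number of worker processes used to load weights

    # Entry point, model path, and port are assumptions, not taken from this diff.
    python -m lightllm.server.api_server \
        --model_dir /path/to/InternVL \
        --port 8000 \
        --enable_multimodal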
@@ ... @@
 - Distribute different image batches to multiple GPUs
 - Each GPU runs a complete ViT model copy
 - --visual_dp dp_size enables data parallelism
 
 Image Caching Mechanism
----------------------
+-----------------------
 LightLLM caches the embeddings of input images. In multi-turn conversations, if the images are the same, the cached embeddings can be used directly, avoiding repeated inference.
 
 - **--cache_capacity**: Controls the number of cached image embeddings
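Taken together, the flags referenced in this file suggest a launch command along these lines (a sketch only: the entry point, model path, port, and the concrete values for --visual_dp and --cache_capacity are assumptions; the flag names themselves come from the documentation above):

    # Hypothetical multimodal launch; only the flag names are taken from the docs.
    # --visual_dp runs multiple data-parallel copies of the ViT encoder,
    # --cache_capacity bounds how many image embeddings are kept in the cache.
    python -m lightllm.server.api_server \
        --model_dir /path/to/InternVL \
        --port 8000 \
        --enable_multimodal \
        --visual_dp 2 \
        --cache_capacity 200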
docs/EN/source/tutorial/openai.rst (3 additions, 3 deletions)

@@ -6,7 +6,7 @@ LightLLM OpenAI API Usage Examples
 LightLLM provides an interface that is fully compatible with the OpenAI API, supporting all standard OpenAI features including function calling. This document provides detailed information on how to use LightLLM's OpenAI interface.
 
 Basic Configuration
------------------
+-------------------
 
 First, ensure that the LightLLM service is started:
 
@@ -19,7 +19,7 @@ First, ensure that the LightLLM service is started:
 LightLLM supports OpenAI's function calling functionality, providing function call parsing for three models. Specify the --tool_call_parser parameter when starting the service to choose one. The service launch command is:
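A minimal sketch of the flow this section describes (the api_server entry point, port, and the /v1/chat/completions route are assumptions based on the OpenAI-compatible interface mentioned above; PARSER_NAME stands in for one of the three parsers listed in the full document):

    # Launch with function call parsing enabled; PARSER_NAME is a placeholder.
    python -m lightllm.server.api_server \
        --model_dir /path/to/model \
        --port 8000 \
        --tool_call_parser PARSER_NAME

    # Query the OpenAI-compatible chat endpoint (route assumed, not shown in the diff).
    curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "your-model", "messages": [{"role": "user", "content": "Hello"}]}'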
docs/EN/source/tutorial/reward_model.rst (3 additions, 3 deletions)

@@ -1,5 +1,5 @@
 Reward Model Deployment Configuration
-====================================
+=====================================
 
 LightLLM supports inference for various reward models, which are used to evaluate conversation quality and generate reward scores. Currently supported reward models include InternLM2 Reward and Qwen2 Reward.