## Inference Acceleration by LMDeploy
We recommend using [LMDeploy](https://github.com/InternLM/lmdeploy) if you need to optimize InternVL-Chat model inference.
In the following subsections, we introduce the usage of LMDeploy, taking the [InternVL-Chat-V1-5](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) model as an example.
First of all, please set up the inference environment as follows:
```shell
conda create -n internvl python=3.10 -y
conda activate internvl
pip install timm torchvision==0.17.2
pip install lmdeploy
```
The LMDeploy PyPI package depends on CUDA 12.x by default. For a CUDA 11.x environment, please refer to the [installation guide](https://lmdeploy.readthedocs.io/en/latest/get_started.html#installation).
### Offline Inference Pipeline
```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image
pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5')
image = load_image('examples/image2.jpg')
response = pipe(('describe this image', image))
print(response)
```
For more on using the VLM pipeline, including multi-image inference and multi-turn chat, please refer to [this](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html) guide.
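For example, a multi-image query can be issued by passing a list of images together with a single prompt. The snippet below is a minimal sketch following the LMDeploy VLM pipeline usage; the image paths are placeholders:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5')

# Load several images and send them together with one prompt.
images = [load_image(path) for path in ['examples/image1.jpg', 'examples/image2.jpg']]
response = pipe(('describe these two images', images))
print(response)
```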
### Online Inference Service
LMDeploy can package a VLM model into an OpenAI-compatible service with a single command, providing seamless integration with the OpenAI API.
The service can be launched with a single command; a minimal example (using LMDeploy's default port) is shown below:
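```shell
# Launch an OpenAI-compatible server for InternVL-Chat-V1-5
# (23333 is LMDeploy's default port; change it if needed).
lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5 --server-port 23333
```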
The arguments of `api_server` can be viewed with the command `lmdeploy serve api_server -h`: for instance, `--tp` sets the tensor parallelism, `--session-len` specifies the max length of the context window, and `--cache-max-entry-count` adjusts the ratio of GPU memory reserved for the k/v cache.
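As an illustration, a launch command that sets these options explicitly might look like the following; the values are arbitrary examples, not tuned recommendations:

```shell
# --tp: number of GPUs used for tensor parallelism
# --session-len: maximum length of the context window
# --cache-max-entry-count: ratio of GPU memory reserved for the k/v cache
lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5 \
    --tp 2 --session-len 8192 --cache-max-entry-count 0.5 --server-port 23333
```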
For more details, including starting the service with Docker, the RESTful API reference, and OpenAI integration methods, please refer to [this](https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html) guide.
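For example, once the server is up, it can be queried with the official `openai` Python client. The sketch below assumes the server runs on the default local address, and the image URL is a placeholder:

```python
from openai import OpenAI

# Point the OpenAI client at the local LMDeploy server (address is an assumption).
client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://example.com/image.jpg'}},
        ],
    }],
    temperature=0.8,
)
print(response.choices[0].message.content)
```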
## License
This project is released under the [MIT license](LICENSE). Parts of this project contain code and models from other sources, which are subject to their respective licenses.