Commit 1d0eba0

Optimize InternVL inference via LMDeploy (#152)
1 parent bdbfcce commit 1d0eba0

README.md

Lines changed: 43 additions & 0 deletions
@@ -624,6 +624,49 @@ for question, response in zip(questions, responses):

</details>

## Inference Acceleration by LMDeploy
We recommend using [LMDeploy](https://github.com/InternLM/lmdeploy) if you need to optimize inference of the InternVL-Chat model.

The following subsections demonstrate how to use LMDeploy, taking the [InternVL-Chat-V1-5](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5) model as an example.

First, set up the inference environment as follows:
```shell
conda create -n internvl python=3.10 -y
conda activate internvl

pip install timm torchvision==0.17.2
pip install lmdeploy
```
The LMDeploy PyPI package depends on CUDA 12.x by default. For a CUDA 11.x environment, please refer to the [installation guide](https://lmdeploy.readthedocs.io/en/latest/get_started.html#installation).
### Offline Inference Pipeline
```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build an inference pipeline for InternVL-Chat-V1-5
pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5')

# Load an image and run a single query against it
image = load_image('examples/image2.jpg')
response = pipe(('describe this image', image))
print(response)
```
For more on using the VLM pipeline, including multi-image inference and multi-turn chat, please refer to [this](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html) guide.
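
For instance, a minimal sketch of multi-image inference with the same pipeline might look like the following (the image paths are placeholders; see the guide above for the authoritative interface):

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5')

# Load several images (placeholder paths) and pass them together with one prompt
images = [load_image(path) for path in ['examples/image1.jpg', 'examples/image2.jpg']]
response = pipe(('describe these two images', images))
print(response)
```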
### Online Inference Service
LMDeploy supports packaging the VLM model into an OpenAI-compatible service with a single command, providing seamless integration with the OpenAI API.

The service can be launched as follows:
```shell
lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5
```
The arguments of `api_server` can be viewed with `lmdeploy serve api_server -h`; for instance, `--tp` sets the tensor parallelism degree, `--session-len` specifies the maximum length of the context window, and `--cache-max-entry-count` adjusts the ratio of GPU memory reserved for the k/v cache.
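
For example, a hypothetical launch command combining these options might look like the following (the values are illustrative and should be tuned to your hardware):

```shell
lmdeploy serve api_server OpenGVLab/InternVL-Chat-V1-5 \
  --tp 2 \
  --session-len 8192 \
  --cache-max-entry-count 0.5
```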
For more details, including service startup with Docker, the RESTful API, and OpenAI integration methods, please refer to [this](https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html) guide.
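
As a rough sketch of the OpenAI integration, a request to the service could look like the following (this assumes the official `openai` Python client and the default server port 23333; the image URL is a placeholder):

```python
from openai import OpenAI

# Point the OpenAI client at the local LMDeploy api_server (default port assumed)
client = OpenAI(api_key='none', base_url='http://0.0.0.0:23333/v1')
# Query the served model's name from the server
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            # Placeholder image URL; replace with your own
            {'type': 'image_url', 'image_url': {'url': 'https://example.com/image.jpg'}},
        ],
    }],
)
print(response.choices[0].message.content)
```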
## License

This project is released under the [MIT license](LICENSE). Parts of this project contain code and models from other sources, which are subject to their respective licenses.
