
Commit edb3055

Support download models from www.modelscope.cn (#1588)
1 parent bb00f66 commit edb3055

4 files changed: +58 / -4 lines


docs/source/getting_started/quickstart.rst

Lines changed: 27 additions & 0 deletions
@@ -40,6 +40,16 @@ Initialize vLLM's engine for offline inference with the ``LLM`` class and the `O
 
     llm = LLM(model="facebook/opt-125m")
 
+Use model from www.modelscope.cn
+
+.. code-block:: shell
+
+    export VLLM_USE_MODELSCOPE=True
+
+.. code-block:: python
+
+    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)
+
 Call ``llm.generate`` to generate the outputs. It adds the input prompts to vLLM engine's waiting queue and executes the vLLM engine to generate the outputs with high throughput. The outputs are returned as a list of ``RequestOutput`` objects, which include all the output tokens.
 
 .. code-block:: python
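
The generation example that follows this ``.. code-block:: python`` directive is unchanged by the commit and is cut off by the hunk. For context, a minimal sketch of how a ModelScope-loaded model is then used; the prompts and sampling settings below are illustrative and not part of the diff:

.. code-block:: python

    from vllm import LLM, SamplingParams

    # Illustrative prompts and sampling settings (not taken from this commit).
    prompts = ["Hello, my name is", "The capital of France is"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # With VLLM_USE_MODELSCOPE=True exported, the weights come from www.modelscope.cn.
    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(f"Prompt: {output.prompt!r}, generated: {output.outputs[0].text!r}")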
@@ -67,6 +77,16 @@ Start the server:
 
     $ python -m vllm.entrypoints.api_server
 
+Use model from www.modelscope.cn
+
+.. code-block:: console
+
+    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.api_server \
+    $     --model="qwen/Qwen-7B-Chat" \
+    $     --revision="v1.1.8" \
+    $     --trust-remote-code
+
 By default, this command starts the server at ``http://localhost:8000`` with the OPT-125M model.
 
 Query the model in shell:
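
The shell query itself sits outside this hunk and is not modified here. As a rough sketch, assuming the demo server's ``/generate`` endpoint and JSON payload described in the surrounding quickstart, a request looks like:

.. code-block:: console

    $ curl http://localhost:8000/generate \
    $     -d '{"prompt": "San Francisco is a", "n": 4, "temperature": 0}'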
@@ -95,6 +115,13 @@ Start the server:
     $ python -m vllm.entrypoints.openai.api_server \
     $     --model facebook/opt-125m
 
+Use model from www.modelscope.cn
+
+.. code-block:: console
+
+    $ VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.openai.api_server \
+    $     --model="qwen/Qwen-7B-Chat" --revision="v1.1.8" --trust-remote-code
+
 By default, it starts the server at ``http://localhost:8000``. You can specify the address with ``--host`` and ``--port`` arguments. The server currently hosts one model at a time (OPT-125M in the above command) and implements `list models <https://platform.openai.com/docs/api-reference/models/list>`_ and `create completion <https://platform.openai.com/docs/api-reference/completions/create>`_ endpoints. We are actively adding support for more endpoints.
 
 This server can be queried in the same format as OpenAI API. For example, list the models:
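
The OpenAI-compatible query examples are likewise untouched by this diff. As a sketch of the two endpoints named above, using the default OPT-125M model; for a ModelScope-loaded model, pass whatever model name ``/v1/models`` reports:

.. code-block:: console

    $ curl http://localhost:8000/v1/models

    $ curl http://localhost:8000/v1/completions \
    $     -H "Content-Type: application/json" \
    $     -d '{"model": "facebook/opt-125m", "prompt": "San Francisco is a", "max_tokens": 7, "temperature": 0}'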

docs/source/models/supported_models.rst

Lines changed: 14 additions & 0 deletions
@@ -81,4 +81,18 @@ Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-pr
     output = llm.generate("Hello, my name is")
     print(output)
 
+To use model from www.modelscope.cn
+
+.. code-block:: shell
+
+    $ export VLLM_USE_MODELSCOPE=True
+
+.. code-block:: python
+
+    from vllm import LLM
+
+    llm = LLM(model=..., revision=..., trust_remote_code=True)  # Name or path of your model
+    output = llm.generate("Hello, my name is")
+    print(output)
+
 If vLLM successfully generates text, it indicates that your model is supported.
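
For a concrete instance of the placeholder call above, reusing the ModelScope model and revision from the quickstart changes in this same commit:

.. code-block:: python

    from vllm import LLM

    # Assumes VLLM_USE_MODELSCOPE=True has been exported, as shown above.
    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)
    output = llm.generate("Hello, my name is")
    print(output)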

vllm/config.py

Lines changed: 13 additions & 1 deletion
@@ -1,4 +1,5 @@
 from typing import Optional, Union
+import os
 
 import torch
 from transformers import PretrainedConfig
@@ -76,7 +77,18 @@ def __init__(
         self.tokenizer_revision = tokenizer_revision
         self.quantization = quantization
 
-        self.hf_config = get_config(model, trust_remote_code, revision)
+        if os.environ.get("VLLM_USE_MODELSCOPE", "False").lower() == "true":
+            # download model from ModelScope hub,
+            # lazy import so that modelscope is not required for normal use.
+            from modelscope.hub.snapshot_download import snapshot_download  # pylint: disable=C
+            model_path = snapshot_download(model_id=model,
+                                           cache_dir=download_dir,
+                                           revision=revision)
+            self.model = model_path
+            self.download_dir = model_path
+            self.tokenizer = model_path
+
+        self.hf_config = get_config(self.model, trust_remote_code, revision)
         self.dtype = _get_and_verify_dtype(self.hf_config, dtype)
         self.max_model_len = _get_and_verify_max_len(self.hf_config,
                                                      max_model_len)
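
The flag is read once in ``ModelConfig.__init__``, so it must be set before the engine is constructed; ``model``, ``download_dir`` and ``tokenizer`` are then all pointed at the local ModelScope snapshot, and ``get_config`` loads the config from that path. A minimal usage sketch, reusing the model from the docs above:

.. code-block:: python

    import os

    # Must be set before the LLM/engine (and hence ModelConfig) is created
    # in this process.
    os.environ["VLLM_USE_MODELSCOPE"] = "True"

    from vllm import LLM

    # model, tokenizer and download_dir are rewritten to the snapshot path
    # returned by modelscope's snapshot_download.
    llm = LLM(model="qwen/Qwen-7B-Chat", revision="v1.1.8", trust_remote_code=True)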

vllm/entrypoints/openai/api_server.py

Lines changed: 4 additions & 3 deletions
@@ -648,9 +648,10 @@ async def fake_stream_generator() -> AsyncGenerator[str, None]:
     max_model_len = engine_model_config.max_model_len
 
     # A separate tokenizer to map token IDs to strings.
-    tokenizer = get_tokenizer(engine_args.tokenizer,
-                              tokenizer_mode=engine_args.tokenizer_mode,
-                              trust_remote_code=engine_args.trust_remote_code)
+    tokenizer = get_tokenizer(
+        engine_model_config.tokenizer,
+        tokenizer_mode=engine_model_config.tokenizer_mode,
+        trust_remote_code=engine_model_config.trust_remote_code)
 
     uvicorn.run(app,
                 host=args.host,

0 commit comments

Comments
 (0)