Commit 7699226

data type update
1 parent 604de1a commit 7699226

2 files changed: +9 -2 lines changed

content/learning-paths/servers-and-cloud-computing/vllm/vllm-run.md

Lines changed: 8 additions & 1 deletion
@@ -31,21 +31,28 @@ To run inference with multiple prompts, you can create a simple Python script to
 Use a text editor to save the Python script below in a file called `batch.py`:
 
 ```python
+import os
 import json
 from vllm import LLM, SamplingParams
 
+# Force CPU-only execution
+os.environ["CUDA_VISIBLE_DEVICES"] = ""
+
 # Sample prompts.
 prompts = [
     "Write a hello world program in C",
     "Write a hello world program in Java",
     "Write a hello world program in Rust",
 ]
 
+# Modify model here
+MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
+
 # Create a sampling params object.
 sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
 
 # Create an LLM.
-llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", dtype="bfloat16")
+llm = LLM(model=MODEL, dtype="float32", enforce_eager=True, tensor_parallel_size=1)
 
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.

content/learning-paths/servers-and-cloud-computing/vllm/vllm-server.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ OpenAI compatibility means that you can reuse existing software which was design
 Run vLLM with the same `Qwen/Qwen2.5-0.5B-Instruct` model:
 
 ```bash
-python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-0.5B-Instruct --dtype float16
+python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-0.5B-Instruct --dtype float32
 ```
 
 The server output displays that it is ready for requests:
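
Both hunks in this commit switch the model data type to `float32`, which stores each weight in 4 bytes instead of the 2 bytes used by `bfloat16` or `float16`. The sketch below works through that arithmetic. It is not taken from the commit: `weight_memory_gib`, `BYTES_PER_VALUE`, and `ASSUMED_PARAMS` are hypothetical names, and the parameter count is an assumption read off the "0.5B" in the model name. It estimates the weights only and ignores the KV cache and activations.

```python
# Bytes used to store one value of each dtype that appears in the diff.
BYTES_PER_VALUE = {"float32": 4, "bfloat16": 2, "float16": 2}

# ASSUMPTION: roughly 0.5 billion parameters, inferred from the model name only.
ASSUMED_PARAMS = 500_000_000


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the model weights alone, in GiB."""
    return num_params * BYTES_PER_VALUE[dtype] / 2**30


if __name__ == "__main__":
    # Print the estimated weight footprint for each dtype in the diff.
    for dtype in ("bfloat16", "float16", "float32"):
        print(f"{dtype}: {weight_memory_gib(ASSUMED_PARAMS, dtype):.2f} GiB")
```

Under these assumptions the `float32` weights take twice the memory of the 16-bit formats, which may be worth checking against the RAM available on the target machine.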
