This repository was archived by the owner on Sep 4, 2025. It is now read-only.

Commit 26e6259

Merge pull request #13 from dtrifiro/sync-with-upstream: sync with IBM/main
2 parents: 6baa1b8 + b180134

File tree

320 files changed: 19,299 additions and 5,463 deletions


.buildkite/check-wheel-size.py

Lines changed: 36 additions & 0 deletions
```python
import os
import zipfile

MAX_SIZE_MB = 100


def print_top_10_largest_files(zip_file):
    with zipfile.ZipFile(zip_file, 'r') as z:
        file_sizes = [(f, z.getinfo(f).file_size) for f in z.namelist()]
        file_sizes.sort(key=lambda x: x[1], reverse=True)
        for f, size in file_sizes[:10]:
            print(f"{f}: {size/(1024*1024)} MBs uncompressed.")


def check_wheel_size(directory):
    for root, _, files in os.walk(directory):
        for f in files:
            if f.endswith(".whl"):
                wheel_path = os.path.join(root, f)
                wheel_size = os.path.getsize(wheel_path)
                wheel_size_mb = wheel_size / (1024 * 1024)
                if wheel_size_mb > MAX_SIZE_MB:
                    print(
                        f"Wheel {wheel_path} is too large ({wheel_size_mb} MB) "
                        f"compared to the allowed size ({MAX_SIZE_MB} MB).")
                    print_top_10_largest_files(wheel_path)
                    return 1
                else:
                    print(f"Wheel {wheel_path} is within the allowed size "
                          f"({wheel_size_mb} MB).")
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(check_wheel_size(sys.argv[1]))
```
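The size gate above is easy to exercise locally. As a hedged sketch (the helper name, file names, and 1 MB threshold below are illustrative stand-ins, not part of the commit, which uses `MAX_SIZE_MB = 100`), the same walk-and-compare logic can be replicated against a temporary directory:

```python
import os
import tempfile

MAX_SIZE_MB = 1  # illustrative threshold; the commit uses 100


def oversized_wheels(directory):
    """Return paths of .whl files larger than MAX_SIZE_MB, mirroring the
    os.walk + bytes-to-MB comparison in check_wheel_size()."""
    too_large = []
    for root, _, files in os.walk(directory):
        for name in files:
            if name.endswith(".whl"):
                path = os.path.join(root, name)
                size_mb = os.path.getsize(path) / (1024 * 1024)
                if size_mb > MAX_SIZE_MB:
                    too_large.append(path)
    return too_large


with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small-0.1-py3-none-any.whl")
    big = os.path.join(tmp, "big-0.1-py3-none-any.whl")
    with open(small, "wb") as fh:
        fh.write(b"\0" * 1024)               # 1 KiB: under the limit
    with open(big, "wb") as fh:
        fh.write(b"\0" * (2 * 1024 * 1024))  # 2 MiB: over the 1 MB limit
    print([os.path.basename(p) for p in oversized_wheels(tmp)])
```

Returning the offending paths instead of printing makes the check unit-testable; the CI script's exit-code convention (1 on failure, 0 on success) sits naturally on top of it.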

.buildkite/run-amd-test.sh

Lines changed: 35 additions & 29 deletions
```diff
@@ -1,38 +1,44 @@
-# This script build the ROCm docker image and run the API server inside the container.
-# It serves a sanity check for compilation and basic model usage.
+# This script builds the ROCm docker image and runs tests inside it.
 set -ex
 
 # Print ROCm version
+echo "--- ROCm info"
 rocminfo
 
-# Try building the docker image
-docker build -t rocm -f Dockerfile.rocm .
+echo "--- Resetting GPUs"
 
-# Setup cleanup
-remove_docker_container() { docker rm -f rocm || true; }
-trap remove_docker_container EXIT
-remove_docker_container
-
-# Run the image
-docker run --device /dev/kfd --device /dev/dri --network host --name rocm rocm python3 -m vllm.entrypoints.api_server &
-
-# Wait for the server to start
-wait_for_server_to_start() {
-  timeout=300
-  counter=0
-
-  while [ "$(curl -s -o /dev/null -w ''%{http_code}'' localhost:8000/health)" != "200" ]; do
-    sleep 1
-    counter=$((counter + 1))
-    if [ $counter -ge $timeout ]; then
-      echo "Timeout after $timeout seconds"
-      break
+echo "reset" > /opt/amdgpu/etc/gpu_state
+
+while true; do
+    sleep 3
+    if grep -q clean /opt/amdgpu/etc/gpu_state; then
+        echo "GPUs state is \"clean\""
+        break
     fi
-  done
+done
+
+echo "--- Building container"
+sha=$(git rev-parse --short HEAD)
+container_name=rocm_${sha}
+docker build \
+    -t ${container_name} \
+    -f Dockerfile.rocm \
+    --progress plain \
+    .
+
+remove_docker_container() {
+    docker rm -f ${container_name} || docker image rm -f ${container_name} || true
 }
-wait_for_server_to_start
+trap remove_docker_container EXIT
+
+echo "--- Running container"
+
+docker run \
+    --device /dev/kfd --device /dev/dri \
+    --network host \
+    --rm \
+    -e HF_TOKEN \
+    --name ${container_name} \
+    ${container_name} \
+    /bin/bash -c $(echo $1 | sed "s/^'//" | sed "s/'$//")
 
-# Test a simple prompt
-curl -X POST -H "Content-Type: application/json" \
-  localhost:8000/generate \
-  -d '{"prompt": "San Francisco is a"}'
```
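The GPU-reset loop above blocks until the sentinel file reports a clean state. The same poll-until-condition pattern can be sketched in Python (the predicate, interval, and timeout below are illustrative; the shell version polls a file with `grep` every 3 seconds and has no timeout):

```python
import time


def wait_until(predicate, timeout_s=30.0, interval_s=0.01):
    """Poll predicate() until it returns True or timeout_s elapses.

    Mirrors the shell loop (sleep, re-check, break on success), but adds
    a timeout as a safety valve where the shell version loops forever.
    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False


# Illustrative predicate standing in for `grep -q clean .../gpu_state`:
# the "file" flips to clean on the third poll.
state = {"value": "reset"}


def flip_after_calls(n):
    calls = {"count": 0}

    def predicate():
        calls["count"] += 1
        if calls["count"] >= n:
            state["value"] = "clean"
        return state["value"] == "clean"

    return predicate


print(wait_until(flip_after_calls(3)))  # True: becomes clean on the third poll
```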

.buildkite/run-benchmarks.sh

Lines changed: 5 additions & 0 deletions
```diff
@@ -53,6 +53,11 @@ echo '```' >> benchmark_results.md
 tail -n 20 benchmark_serving.txt >> benchmark_results.md # last 20 lines
 echo '```' >> benchmark_results.md
 
+# if the agent binary is not found, skip uploading the results and exit 0
+if [ ! -f /workspace/buildkite-agent ]; then
+    exit 0
+fi
+
 # upload the results to buildkite
 /workspace/buildkite-agent annotate --style "info" --context "benchmark-results" < benchmark_results.md
```

.buildkite/run-neuron-test.sh

Lines changed: 51 additions & 0 deletions
```bash
# This script builds the Neuron docker image and runs the API server inside the container.
# It serves as a sanity check for compilation and basic model usage.
set -e

# Try building the docker image
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# prune old images and containers to save disk space, and only once a day,
# by using a timestamp file in tmp.
if [ -f /tmp/neuron-docker-build-timestamp ]; then
    last_build=$(cat /tmp/neuron-docker-build-timestamp)
    current_time=$(date +%s)
    if [ $((current_time - last_build)) -gt 86400 ]; then
        docker system prune -f
        echo $current_time > /tmp/neuron-docker-build-timestamp
    fi
else
    echo $(date +%s) > /tmp/neuron-docker-build-timestamp
fi

docker build -t neuron -f Dockerfile.neuron .

# Setup cleanup
remove_docker_container() { docker rm -f neuron || true; }
trap remove_docker_container EXIT
remove_docker_container

# Run the image
docker run --device=/dev/neuron0 --device=/dev/neuron1 --network host --name neuron neuron python3 -m vllm.entrypoints.api_server \
    --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --max-num-seqs 8 --max-model-len 128 --block-size 128 --device neuron --tensor-parallel-size 2 &

# Wait for the server to start
wait_for_server_to_start() {
    timeout=300
    counter=0

    while [ "$(curl -s -o /dev/null -w ''%{http_code}'' localhost:8000/health)" != "200" ]; do
        sleep 1
        counter=$((counter + 1))
        if [ $counter -ge $timeout ]; then
            echo "Timeout after $timeout seconds"
            break
        fi
    done
}
wait_for_server_to_start

# Test a simple prompt
curl -X POST -H "Content-Type: application/json" \
    localhost:8000/generate \
    -d '{"prompt": "San Francisco is a"}'
```
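The prune guard in the script above runs `docker system prune` at most once per 86400 seconds by comparing against a timestamp file: the first run only records the time, and later runs fire only after the interval has elapsed. A sketch of the same gating logic (the function name and paths are illustrative, not from the commit):

```python
import os
import tempfile


def should_run(stamp_path, now, interval_s=86400):
    """Return True when the gated action is due, mirroring the shell logic:
    - no stamp file yet: record the time, skip the action;
    - stamp present and interval elapsed: update the stamp, run the action;
    - stamp present, interval not yet elapsed: skip."""
    if os.path.exists(stamp_path):
        last = int(open(stamp_path).read())
        if now - last > interval_s:
            with open(stamp_path, "w") as f:
                f.write(str(now))
            return True
        return False
    with open(stamp_path, "w") as f:
        f.write(str(now))
    return False


with tempfile.TemporaryDirectory() as tmp:
    stamp = os.path.join(tmp, "neuron-docker-build-timestamp")
    t0 = 1_000_000
    print(should_run(stamp, t0))            # False: stamp created, no prune
    print(should_run(stamp, t0 + 100))      # False: within the day
    print(should_run(stamp, t0 + 100_000))  # True: more than a day later
```

Injecting `now` as a parameter (rather than calling `time.time()` inside) keeps the gate deterministic and testable.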

.buildkite/test-pipeline.yaml

Lines changed: 25 additions & 5 deletions
```diff
@@ -15,32 +15,41 @@ steps:
   commands:
   - VLLM_ATTENTION_BACKEND=XFORMERS pytest -v -s basic_correctness/test_basic_correctness.py
   - VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_basic_correctness.py
-  - VLLM_ATTENTION_BACKEND=ROCM_FLASH pytest -v -s basic_correctness/test_basic_correctness.py
   - VLLM_ATTENTION_BACKEND=XFORMERS pytest -v -s basic_correctness/test_chunked_prefill.py
   - VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_chunked_prefill.py
-  - VLLM_ATTENTION_BACKEND=ROCM_FLASH pytest -v -s basic_correctness/test_chunked_prefill.py
+  - VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1 pytest -v -s basic_correctness/test_preemption.py
 
 - label: Core Test
+  mirror_hardwares: [amd]
   command: pytest -v -s core
 
 - label: Distributed Comm Ops Test
   command: pytest -v -s test_comm_ops.py
   working_dir: "/vllm-workspace/tests/distributed"
-  num_gpus: 2 # only support 1 or 2 for now.
+  num_gpus: 2
 
 - label: Distributed Tests
   working_dir: "/vllm-workspace/tests/distributed"
+
   num_gpus: 2 # only support 1 or 2 for now.
+  mirror_hardwares: [amd]
+
   commands:
-  - pytest -v -s test_pynccl.py
   - pytest -v -s test_pynccl_library.py
   - TEST_DIST_MODEL=facebook/opt-125m pytest -v -s test_basic_distributed_correctness.py
   - TEST_DIST_MODEL=meta-llama/Llama-2-7b-hf pytest -v -s test_basic_distributed_correctness.py
   - TEST_DIST_MODEL=facebook/opt-125m pytest -v -s test_chunked_prefill_distributed.py
   - TEST_DIST_MODEL=meta-llama/Llama-2-7b-hf pytest -v -s test_chunked_prefill_distributed.py
 
+- label: Distributed Tests (Multiple Groups)
+  working_dir: "/vllm-workspace/tests/distributed"
+  num_gpus: 4
+  commands:
+  - pytest -v -s test_pynccl.py
+
 - label: Engine Test
-  command: pytest -v -s engine tokenization test_sequence.py test_config.py
+  mirror_hardwares: [amd]
+  command: pytest -v -s engine tokenization test_sequence.py test_config.py test_logger.py
 
 - label: Entrypoints Test
   commands:
@@ -50,6 +59,7 @@ steps:
 
 - label: Examples Test
   working_dir: "/vllm-workspace/examples"
+  mirror_hardwares: [amd]
   commands:
   # install aws cli for llava_example.py
   - pip install awscli
@@ -63,29 +73,35 @@ steps:
   parallelism: 4
 
 - label: Models Test
+  mirror_hardwares: [amd]
   commands:
   - bash ../.buildkite/download-images.sh
   - pytest -v -s models --ignore=models/test_llava.py --ignore=models/test_mistral.py
 
 - label: Llava Test
+  mirror_hardwares: [amd]
   commands:
   - bash ../.buildkite/download-images.sh
   - pytest -v -s models/test_llava.py
 
 - label: Prefix Caching Test
+  mirror_hardwares: [amd]
   commands:
   - pytest -v -s prefix_caching
 
 - label: Samplers Test
   command: pytest -v -s samplers
 
 - label: LogitsProcessor Test
+  mirror_hardwares: [amd]
  command: pytest -v -s test_logits_processor.py
 
 - label: Worker Test
+  mirror_hardwares: [amd]
   command: pytest -v -s worker
 
 - label: Speculative decoding tests
+  mirror_hardwares: [amd]
   command: pytest -v -s spec_decode
 
 - label: LoRA Test %N
@@ -98,8 +114,12 @@ steps:
 - label: Metrics Test
   command: pytest -v -s metrics
 
+- label: Quantization Test
+  command: pytest -v -s quantization
+
 - label: Benchmarks
   working_dir: "/vllm-workspace/.buildkite"
+  mirror_hardwares: [amd]
   commands:
   - pip install aiohttp
   - bash run-benchmarks.sh
```
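The pipeline above selects attention backends per test via the `VLLM_ATTENTION_BACKEND` environment variable. The general pattern of env-var-driven selection with a default and early validation can be sketched as follows (the function name, the default, and the backend set are illustrative assumptions; only the variable name comes from the commit):

```python
import os

# Illustrative set: the backends exercised by this pipeline.
SUPPORTED_BACKENDS = {"FLASH_ATTN", "XFORMERS", "ROCM_FLASH"}


def choose_backend(env=None, default="FLASH_ATTN"):
    """Read VLLM_ATTENTION_BACKEND from the given mapping (or os.environ),
    falling back to a default and rejecting unknown values up front
    rather than deep inside model code."""
    if env is None:
        env = os.environ
    backend = env.get("VLLM_ATTENTION_BACKEND", default)
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown attention backend: {backend}")
    return backend


print(choose_backend({}))                                      # FLASH_ATTN
print(choose_backend({"VLLM_ATTENTION_BACKEND": "XFORMERS"}))  # XFORMERS
```

Passing the environment as a mapping keeps the selection logic testable without mutating the real process environment.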

.buildkite/test-template.j2

Lines changed: 24 additions & 4 deletions
```diff
@@ -16,12 +16,29 @@ steps:
       limit: 5
   - wait
 
-- label: "AMD Test"
+- group: "AMD Tests"
+  depends_on: ~
+  steps:
+  {% for step in steps %}
+  {% if step.mirror_hardwares and "amd" in step.mirror_hardwares %}
+  - label: "AMD: {{ step.label }}"
+    agents:
+      queue: amd
+    command: bash .buildkite/run-amd-test.sh "'cd {{ (step.working_dir or default_working_dir) | safe }} && {{ step.command or (step.commands | join(' && ')) | safe }}'"
+    env:
+      DOCKER_BUILDKIT: "1"
+  {% endif %}
+  {% endfor %}
+
+- label: "Neuron Test"
+  depends_on: ~
   agents:
-    queue: amd
-  command: bash .buildkite/run-amd-test.sh
+    queue: neuron
+  command: bash .buildkite/run-neuron-test.sh
+  soft_fail: true
 
-- label: "CPU Test"
+- label: "Intel Test"
+  depends_on: ~
   command: bash .buildkite/run-cpu-test.sh
 
 {% for step in steps %}
@@ -39,6 +56,9 @@ steps:
     plugins:
     - kubernetes:
         podSpec:
+          {% if step.num_gpus %}
+          priorityClassName: gpu-priority-cls-{{ step.num_gpus }}
+          {% endif %}
           volumes:
           - name: dshm
             emptyDir:
```
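The Jinja block above emits an AMD variant of every step whose `mirror_hardwares` list contains `"amd"`, relabeling it as `AMD: <label>`. The selection logic can be sketched in plain Python (the step dicts below are illustrative stand-ins for the template's parsed pipeline steps):

```python
def amd_mirrored(steps):
    """Select steps to mirror onto the AMD queue and build the label the
    template would emit for each, mirroring the
    `step.mirror_hardwares and "amd" in step.mirror_hardwares` guard."""
    return [
        f"AMD: {step['label']}"
        for step in steps
        if "amd" in step.get("mirror_hardwares", [])
    ]


steps = [
    {"label": "Core Test", "mirror_hardwares": ["amd"]},
    {"label": "Distributed Comm Ops Test"},      # no mirror: skipped
    {"label": "Benchmarks", "mirror_hardwares": ["amd"]},
]
print(amd_mirrored(steps))  # ['AMD: Core Test', 'AMD: Benchmarks']
```

Using `.get(..., [])` handles steps that omit `mirror_hardwares` entirely, just as the template's truthiness check does.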

.github/ISSUE_TEMPLATE/200-installation.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -18,6 +18,7 @@ body:
       # For security purposes, please feel free to check the contents of collect_env.py before running it.
       python collect_env.py
       ```
+      It is suggested to download and run the latest version of the script, as vLLM frequently updates the diagnostic information it collects in order to respond to issues accurately and quickly.
     value: |
       ```text
       The output of `python collect_env.py`
```

.github/ISSUE_TEMPLATE/300-usage.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -18,6 +18,7 @@ body:
       # For security purposes, please feel free to check the contents of collect_env.py before running it.
       python collect_env.py
       ```
+      It is suggested to download and run the latest version of the script, as vLLM frequently updates the diagnostic information it collects in order to respond to issues accurately and quickly.
     value: |
       ```text
       The output of `python collect_env.py`
```

.github/ISSUE_TEMPLATE/400-bug report.yml

Lines changed: 3 additions & 0 deletions
```diff
@@ -18,6 +18,7 @@ body:
       # For security purposes, please feel free to check the contents of collect_env.py before running it.
       python collect_env.py
       ```
+      It is suggested to download and run the latest version of the script, as vLLM frequently updates the diagnostic information it collects in order to respond to issues accurately and quickly.
     value: |
       ```text
       The output of `python collect_env.py`
@@ -57,6 +58,8 @@ body:
       If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.
 
       Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
+
+      If you experience crashes or hangs, it can help to run vLLM with `export VLLM_TRACE_FUNCTION=1`, which records all function calls in vLLM. Inspect these log files and report which function crashes or hangs.
     placeholder: |
       A clear and concise description of what the bug is.
```

.github/ISSUE_TEMPLATE/700-performance discussion.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -39,6 +39,7 @@ body:
       # For security purposes, please feel free to check the contents of collect_env.py before running it.
       python collect_env.py
       ```
+      It is suggested to download and run the latest version of the script, as vLLM frequently updates the diagnostic information it collects in order to respond to issues accurately and quickly.
     value: |
       ```text
       The output of `python collect_env.py`
```
