Commit 939eae5

Merge pull request #164 from VectorInstitute/f/sglang-support
- Abstract inference engine
- Add SGLang as alternative inference engine to vLLM
- Update default model and environment config formats
- Update model types to be dynamic
2 parents: e8f0788 + d75bfe7

33 files changed: +5447 −2215 lines

.github/workflows/docker.yml

Lines changed: 35 additions & 11 deletions
```diff
@@ -7,33 +7,56 @@ on:
     branches:
       - main
     paths:
-      - Dockerfile
+      - vllm.Dockerfile
+      - sglang.Dockerfile
       - .github/workflows/docker.yml
       - uv.lock
   pull_request:
     branches:
       - main
+      - f/sglang-support
     paths:
-      - Dockerfile
+      - vllm.Dockerfile
+      - sglang.Dockerfile
       - .github/workflows/docker.yml
       - uv.lock

 jobs:
   push_to_registry:
-    name: Push Docker image to Docker Hub
+    name: Build and push Docker images
     runs-on:
-      - self-hosted
-      - docker
+      - ubuntu-latest
+    strategy:
+      matrix:
+        backend: [vllm, sglang]
     steps:
       - name: Checkout repository
        uses: actions/[email protected]

-      - name: Extract vLLM version
-        id: vllm-version
+      - name: Extract backend version
+        id: backend-version
         run: |
-          VERSION=$(grep -A 1 'name = "vllm"' uv.lock | grep version | cut -d '"' -f 2)
+          VERSION=$(grep -A 1 "name = \"${{ matrix.backend }}\"" uv.lock | grep version | cut -d '"' -f 2)
           echo "version=$VERSION" >> $GITHUB_OUTPUT

+      - name: Maximize build space
+        run: |
+          echo "Disk space before cleanup:"
+          df -h
+          # Remove unnecessary pre-installed software
+          sudo rm -rf /usr/share/dotnet
+          sudo rm -rf /usr/local/lib/android
+          sudo rm -rf /opt/ghc
+          sudo rm -rf /opt/hostedtoolcache/CodeQL
+          sudo rm -rf /usr/local/share/boost
+          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
+          # Clean apt cache
+          sudo apt-get clean
+          # Remove docker images
+          docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
+          echo "Disk space after cleanup:"
+          df -h
+
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3

@@ -47,15 +70,16 @@ jobs:
         id: meta
         uses: docker/metadata-action@318604b99e75e41977312d83839a89be02ca4893
         with:
-          images: vectorinstitute/vector-inference
+          images: vectorinstitute/vector-inference-${{ matrix.backend }}

       - name: Build and push Docker image
         uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83
         with:
           context: .
-          file: ./Dockerfile
+          file: ./${{ matrix.backend }}.Dockerfile
           push: true
           tags: |
             ${{ steps.meta.outputs.tags }}
-            vectorinstitute/vector-inference:${{ steps.vllm-version.outputs.version }}
+            vectorinstitute/vector-inference-${{ matrix.backend }}:${{ steps.backend-version.outputs.version }}
+            vectorinstitute/vector-inference-${{ matrix.backend }}:latest
           labels: ${{ steps.meta.outputs.labels }}
```
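The version-extraction step relies on `uv.lock` storing each package as a TOML block whose `version` line immediately follows its `name` line. A rough local equivalent of the step, as a minimal sketch (the lockfile excerpt in the comment is illustrative, not copied from the repo):

```bash
# uv.lock stores packages as TOML blocks, roughly:
#   [[package]]
#   name = "vllm"
#   version = "0.12.0"
backend=vllm  # the matrix runs this once per backend: vllm, sglang

# Same pipeline as the workflow step: print the line after `name = "<backend>"`,
# keep the version line, and cut out the quoted value.
VERSION=$(grep -A 1 "name = \"${backend}\"" uv.lock | grep version | cut -d '"' -f 2)
echo "version=$VERSION"
```

Each matrix job then builds from its own `<backend>.Dockerfile` and pushes `vectorinstitute/vector-inference-<backend>` tagged with both that extracted version and `latest`.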

.github/workflows/docs.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -67,7 +67,7 @@ jobs:
           python-version-file: ".python-version"

       - name: Install the project
-        run: uv sync --all-extras --group docs --prerelease=allow
+        run: uv sync --group docs --prerelease=allow

       - name: Build docs
         run: uv run --frozen mkdocs build
@@ -104,7 +104,7 @@ jobs:
           python-version-file: ".python-version"

       - name: Install the project
-        run: uv sync --all-extras --group docs --frozen
+        run: uv sync --group docs --frozen

       - name: Configure Git Credentials
         run: |
```
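The same two commands reproduce the docs job locally; a minimal sketch, assuming `uv` is installed and you run it from the repository root:

```bash
# Install only the docs dependency group (--all-extras was dropped),
# allowing pre-release pins where the lockfile requires them.
uv sync --group docs --prerelease=allow

# Build the site against the locked environment.
uv run --frozen mkdocs build
```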

.github/workflows/unit_tests.yml

Lines changed: 10 additions & 0 deletions
```diff
@@ -58,16 +58,26 @@ jobs:
           python-version: ${{ matrix.python-version }}

       - name: Install the project
+        env:
+          # Ensure uv uses the matrix interpreter instead of `.python-version` (3.10),
+          # otherwise the "3.11"/"3.12" jobs silently run on 3.10.
+          UV_PYTHON: ${{ matrix.python-version }}
         run: uv sync --dev --prerelease=allow

       - name: Install dependencies and check code
+        env:
+          UV_PYTHON: ${{ matrix.python-version }}
         run: |
           uv run --frozen pytest -m "not integration_test" --cov vec_inf --cov-report=xml tests

       - name: Install the core package only
+        env:
+          UV_PYTHON: ${{ matrix.python-version }}
         run: uv sync --no-dev

       - name: Run package import tests
+        env:
+          UV_PYTHON: ${{ matrix.python-version }}
         run: |
           uv run --frozen pytest tests/test_imports.py
```
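`UV_PYTHON` is uv's standard environment variable for pinning the interpreter (equivalent to passing `--python`), so the same fix can be reproduced locally; a minimal sketch, assuming uv and Python 3.12 are available:

```bash
# Without UV_PYTHON, uv honors .python-version (3.10) even when you meant 3.12.
export UV_PYTHON=3.12
uv sync --dev --prerelease=allow
uv run --frozen pytest -m "not integration_test" tests
```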

MODEL_TRACKING.md

Lines changed: 80 additions & 10 deletions
```diff
@@ -94,6 +94,7 @@ This document tracks all model weights available in the `/model-weights` directory
 | Model | Configuration |
 |:------|:-------------|
 | `Llama-4-Scout-17B-16E-Instruct` ||
+| `Llama-4-Maverick-17B-128E-Instruct` ||

 ### Mistral AI: Mistral
 | Model | Configuration |
@@ -128,6 +129,7 @@ This document tracks all model weights available in the `/model-weights` directory
 |:------|:-------------|
 | `Qwen2.5-0.5B-Instruct` ||
 | `Qwen2.5-1.5B-Instruct` ||
+| `Qwen2.5-3B` ||
 | `Qwen2.5-3B-Instruct` ||
 | `Qwen2.5-7B-Instruct` ||
 | `Qwen2.5-14B-Instruct` ||
@@ -138,12 +140,14 @@ This document tracks all model weights available in the `/model-weights` directory
 | Model | Configuration |
 |:------|:-------------|
 | `Qwen2.5-Math-1.5B-Instruct` ||
+| `Qwen2.5-Math-7B` ||
 | `Qwen2.5-Math-7B-Instruct` ||
 | `Qwen2.5-Math-72B-Instruct` ||

 ### Qwen: Qwen2.5-Coder
 | Model | Configuration |
 |:------|:-------------|
+| `Qwen2.5-Coder-3B-Instruct` ||
 | `Qwen2.5-Coder-7B-Instruct` ||

 ### Qwen: QwQ
@@ -162,6 +166,12 @@ This document tracks all model weights available in the `/model-weights` directory
 | `Qwen2-Math-72B-Instruct` ||
 | `Qwen2-VL-7B-Instruct` ||

+### Qwen: Qwen2.5-VL
+| Model | Configuration |
+|:------|:-------------|
+| `Qwen2.5-VL-3B-Instruct` ||
+| `Qwen2.5-VL-7B-Instruct` ||
+
 ### Qwen: Qwen3
 | Model | Configuration |
 |:------|:-------------|
@@ -191,27 +201,76 @@ This document tracks all model weights available in the `/model-weights` directory
 | Model | Configuration |
 |:------|:-------------|
 | `gpt-oss-120b` ||
+| `gpt-oss-20b` ||

-### Other LLM Models
+
+#### AI21: Jamba
 | Model | Configuration |
 |:------|:-------------|
 | `AI21-Jamba-1.5-Mini` ||
-| `aya-expanse-32b` | ✅ (as Aya-Expanse-32B) |
+
+#### Cohere for AI: Aya
+| Model | Configuration |
+|:------|:-------------|
+| `aya-expanse-32b` ||
+
+#### OpenAI: GPT-2
+| Model | Configuration |
+|:------|:-------------|
 | `gpt2-large` ||
 | `gpt2-xl` ||
-| `gpt-oss-120b` ||
-| `instructblip-vicuna-7b` ||
+
+#### InternLM: InternLM2
+| Model | Configuration |
+|:------|:-------------|
 | `internlm2-math-plus-7b` ||
+
+#### Janus
+| Model | Configuration |
+|:------|:-------------|
 | `Janus-Pro-7B` ||
+
+#### Moonshot AI: Kimi
+| Model | Configuration |
+|:------|:-------------|
 | `Kimi-K2-Instruct` ||
+
+#### Mistral AI: Ministral
+| Model | Configuration |
+|:------|:-------------|
 | `Ministral-8B-Instruct-2410` ||
-| `Molmo-7B-D-0924` ||
+
+#### AI2: OLMo
+| Model | Configuration |
+|:------|:-------------|
 | `OLMo-1B-hf` ||
 | `OLMo-7B-hf` ||
 | `OLMo-7B-SFT` ||
+
+#### EleutherAI: Pythia
+| Model | Configuration |
+|:------|:-------------|
 | `pythia` ||
+
+#### Qwen: Qwen1.5
+| Model | Configuration |
+|:------|:-------------|
 | `Qwen1.5-72B-Chat` ||
+
+#### ReasonFlux
+| Model | Configuration |
+|:------|:-------------|
 | `ReasonFlux-PRM-7B` ||
+
+#### LMSYS: Vicuna
+| Model | Configuration |
+|:------|:-------------|
+| `vicuna-13b-v1.5` ||
+
+#### Google: T5 (Encoder-Decoder Models)
+**Note**: These are encoder-decoder (T5) models, not decoder-only LLMs.
+| Model | Configuration |
+|:------|:-------------|
 | `t5-large-lm-adapt` ||
 | `t5-xl-lm-adapt` ||
 | `mt5-xl-lm-adapt` ||
@@ -238,10 +297,10 @@ This document tracks all model weights available in the `/model-weights` directory
 ### Meta: Llama 3.2 Vision
 | Model | Configuration |
 |:------|:-------------|
-| `Llama-3.2-11B-Vision` | |
-| `Llama-3.2-11B-Vision-Instruct` ||
-| `Llama-3.2-90B-Vision` | |
-| `Llama-3.2-90B-Vision-Instruct` ||
+| `Llama-3.2-11B-Vision` | |
+| `Llama-3.2-11B-Vision-Instruct` | ✅ | (SGLang only)
+| `Llama-3.2-90B-Vision` | |
+| `Llama-3.2-90B-Vision-Instruct` | ✅ | (SGLang only)

 ### Mistral: Pixtral
 | Model | Configuration |
@@ -266,10 +325,19 @@ This document tracks all model weights available in the `/model-weights` directory
 | `deepseek-vl2` ||
 | `deepseek-vl2-small` ||

+### Google: MedGemma
+| Model | Configuration |
+|:------|:-------------|
+| `medgemma-4b-it` ||
+| `medgemma-27b-it` ||
+| `medgemma-27b-text-it` ||
+
 ### Other VLM Models
 | Model | Configuration |
 |:------|:-------------|
+| `instructblip-vicuna-7b` ||
 | `MiniCPM-Llama3-V-2_5` ||
+| `Molmo-7B-D-0924` ||

 ---

@@ -298,6 +366,8 @@ This document tracks all model weights available in the `/model-weights` directory
 | `data2vec` ||
 | `gte-modernbert-base` ||
 | `gte-Qwen2-7B-instruct` ||
+| `KaLM-Embedding-Gemma3-12B-2511` ||
+| `llama-embed-nemotron-8b` ||
 | `m2-bert-80M-32k-retrieval` ||
 | `m2-bert-80M-8k-retrieval` ||

@@ -313,7 +383,7 @@ This document tracks all model weights available in the `/model-weights` directory

 ---

-## Multimodal Models
+## Vision Models

 ### CLIP
 | Model | Configuration |
```

README.md

Lines changed: 7 additions & 6 deletions
````diff
@@ -7,10 +7,11 @@
 [![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
-[![vLLM](https://img.shields.io/badge/vLLM-0.11.0-blue)](https://docs.vllm.ai/en/v0.11.0/)
+[![vLLM](https://img.shields.io/badge/vLLM-0.12.0-blue)](https://docs.vllm.ai/en/v0.12.0/)
+[![SGLang](https://img.shields.io/badge/SGLang-0.5.5.post3-blue)](https://docs.sglang.io/index.html)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)

-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.12.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).

 **NOTE**: Supported models on Killarney are tracked [here](./MODEL_TRACKING.md)

@@ -20,12 +21,12 @@ If you are using the Vector cluster environment, and you don't need any customization
 ```bash
 pip install vec-inf
 ```
-Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.
+Otherwise, we recommend using the provided [`vllm.Dockerfile`](vllm.Dockerfile) and [`sglang.Dockerfile`](sglang.Dockerfile) to set up your own environment with the package. The built images are available through [Docker Hub](https://hub.docker.com/orgs/vectorinstitute/repositories).

 If you'd like to use `vec-inf` on your own Slurm cluster, you will need to update the configuration files; there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
 * The package will look for cached configuration files in your environment before using the default configuration. The default cached configuration directory path points to `/model-weights/vec-inf-shared`; you will need to create an `environment.yaml` and a `models.yaml` following the format of these files in [`vec_inf/config`](vec_inf/config/).
-* The package will also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.
+* [OPTIONAL] The package can also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.

 ## Usage

@@ -42,13 +43,13 @@ vec-inf launch Meta-Llama-3.1-8B-Instruct
 ```
 You should see an output like the following:

-<img width="720" alt="launch_image" src="https://github.com/user-attachments/assets/c1e0c60c-cf7a-49ed-a426-fdb38ebf88ee" />
+<img width="720" alt="launch_image" src="./docs/assets/launch.png" />

 **NOTE**: You can set the required fields in the environment configuration (`environment.yaml`); it's a mapping between required arguments and their corresponding environment variables. On the Vector **Killarney** cluster environment, the required fields are:
 * `--account`, `-A`: The Slurm account; this argument can be given a default by setting the environment variable `VEC_INF_ACCOUNT`.
 * `--work-dir`, `-D`: A working directory other than your home directory; this argument can be given a default by setting the environment variable `VEC_INF_WORK_DIR`.

-Models that are already supported by `vec-inf` are launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or the [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be overridden. You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html). For detailed instructions on how to customize your model launch, check out the [`launch` command section in the User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command)
+Models that are already supported by `vec-inf` are launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or the [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be overridden. You can also launch your own custom model as long as the model architecture is supported by the underlying inference engine. For detailed instructions on how to customize your model launch, check out the [`launch` command section in the User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command)

 #### Other commands
````
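As a concrete example of the `[OPTIONAL]` configuration route above, a minimal sketch (the directory path is a placeholder, not a project convention):

```bash
# Keep custom configs in a directory of your choice and point vec-inf at it.
mkdir -p ~/vec-inf-config                       # placeholder path
cp environment.yaml models.yaml ~/vec-inf-config/
export VEC_INF_CONFIG_DIR=~/vec-inf-config

# Defaults for the required launch fields on Killarney.
export VEC_INF_ACCOUNT=<your-slurm-account>
export VEC_INF_WORK_DIR=/scratch/$USER/vec-inf  # any non-home working directory

vec-inf launch Meta-Llama-3.1-8B-Instruct
```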

docs/assets/launch.png

47.4 KB
(new binary file)

docs/index.md

Lines changed: 4 additions & 3 deletions
````diff
@@ -1,6 +1,7 @@
 # Vector Inference: Easy inference on Slurm clusters

-This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/stable/). **This package runs natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
+This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.12.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
+

 **NOTE**: Supported models on Killarney are tracked [here](https://github.com/VectorInstitute/vector-inference/blob/main/MODEL_TRACKING.md)

@@ -12,9 +13,9 @@ If you are using the Vector cluster environment, and you don't need any customization
 pip install vec-inf
 ```

-Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.
+Otherwise, we recommend using the provided [`vllm.Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/vllm.Dockerfile) and [`sglang.Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/sglang.Dockerfile) to set up your own environment with the package. The built images are available through [Docker Hub](https://hub.docker.com/orgs/vectorinstitute/repositories).

 If you'd like to use `vec-inf` on your own Slurm cluster, you will need to update the configuration files; there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config), then install from source by running `pip install .`.
 * The package will look for cached configuration files in your environment before using the default configuration. The default cached configuration directory path points to `/model-weights/vec-inf-shared`; you will need to create an `environment.yaml` and a `models.yaml` following the format of these files in [`vec_inf/config`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config).
-* The package will also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.
+* [OPTIONAL] The package can also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.
````
