Merged

Changes from all commits (74 commits)
8e7e5a3
Add additional dependencies for sglang
XkunW Nov 11, 2025
1a564e2
Use --ntasks-per-node instead of --tasks-per-node
XkunW Nov 11, 2025
142694b
Update script generator test
XkunW Nov 11, 2025
f9e61a6
Add missing system package for sglang, update dev dependency versions…
XkunW Nov 11, 2025
bdd9483
Add step to clean up runner space for docker action
XkunW Nov 12, 2025
ed12e14
Merge branch 'main' into f/sglang-support
XkunW Nov 12, 2025
4e4e8e1
Remove maximize runner space step for docker action
XkunW Nov 12, 2025
0f46702
Fix disk space blowup by adding cache clean
amrit110 Nov 13, 2025
6d10210
Add trigger for docker workflow to f/sglang-support
amrit110 Nov 13, 2025
cd2ca5e
Fixes, it seems to be uv cache related
amrit110 Nov 13, 2025
de83788
Try again with some aggressive pre cleanup
amrit110 Nov 13, 2025
d967a90
Merge pull request #168 from VectorInstitute/fix_disk_space_docker_build
XkunW Nov 13, 2025
bfc2d6a
Add Llama 4 Maverick
XkunW Nov 26, 2025
f94dd84
Merge branch 'main' into f/sglang-support
XkunW Nov 26, 2025
ac9161c
Fix incorrect slurm job config mapping
XkunW Nov 26, 2025
c73e163
Add torchao
XkunW Nov 26, 2025
5cb259e
Merge branch 'f/sglang-support' of https://github.com/VectorInstitute…
XkunW Nov 26, 2025
d082732
Split vllm and sglang dependencies into 2 groups and 2 docker images
XkunW Nov 26, 2025
3fa106c
Update docs action
XkunW Nov 26, 2025
aa11ddf
Rename vllm_args to engine_args, add engine as additional argument
XkunW Nov 26, 2025
be91fe3
Update vllm_args to engine args, add SGLang argument mapping,
XkunW Nov 26, 2025
9539acf
Add engine to batch mode match arg
XkunW Nov 27, 2025
94feb5e
Update IMAGE_PATH to a dictionary of engine specific image paths
XkunW Nov 27, 2025
de1e65b
Update slurm script generations
XkunW Nov 27, 2025
bb2f9bb
Update doc string
XkunW Nov 27, 2025
990b6a5
Update engine arg rendering
XkunW Dec 2, 2025
bc9904d
ruff format
XkunW Dec 2, 2025
73ebc32
mypy fix
XkunW Dec 2, 2025
c45af32
Add more mapping
XkunW Dec 2, 2025
0200339
Update typing for REQUIRED_ARGS, add PYTHON_VERSION for sglang
XkunW Dec 2, 2025
5bb3267
Update slurm script generation for sglang
XkunW Dec 2, 2025
74bf7b6
Update CLI engine arg overwrite, formatting
XkunW Dec 2, 2025
fb0e7a7
Format
XkunW Dec 2, 2025
19181f1
Rename vllm_args to engine_args
XkunW Dec 2, 2025
145f43b
Merge branch 'f/sglang-support' of https://github.com/VectorInstitute…
XkunW Dec 2, 2025
2bcbbf4
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 2, 2025
d758c19
Update dependencies
XkunW Dec 3, 2025
481dcac
Add default resource type, add vllm_image_path and sglang_image_path
XkunW Dec 4, 2025
f8aec1b
Remove default fields, add sglang args
XkunW Dec 4, 2025
bd0041f
Merge branch 'f/sglang-support' of https://github.com/VectorInstitute…
XkunW Dec 4, 2025
9a93ebd
Add cached model config path to environment yaml and use that instead…
XkunW Dec 4, 2025
4f29ef6
Merge branch 'main' into f/sglang-support
XkunW Dec 4, 2025
d22f777
Replace engine_args with vllm_args and sglang_args in CLI
XkunW Dec 8, 2025
5edb277
Change engine_args to engine-specific args (vllm_args, sglang_args) in…
XkunW Dec 9, 2025
e0f3456
Update ENGINE_ARGS_PLACEHOLDER to SGLANG_ARGS_PLACEHOLDER
XkunW Dec 9, 2025
0b66b78
Update CACHED_MODEL_CONFIG_PATH from str to Path
XkunW Dec 9, 2025
687aeda
Update engine choice and engine arg overwrite logic to be engine agno…
XkunW Dec 9, 2025
9fdf645
Update engine abstraction
XkunW Dec 9, 2025
59427bd
Add err log decoding error handling
XkunW Dec 9, 2025
8461ddd
Fix list command engine arg rendering, add a temp fix for a known bug…
XkunW Dec 9, 2025
7781c9e
Allow batch launch to use different engines
XkunW Dec 9, 2025
439dac4
Update dependencies and bump version
XkunW Dec 9, 2025
2baf7ed
Strip the ending backslash instead of replace in model launch script …
XkunW Dec 9, 2025
07bf6c7
Ruff and mypy fixes
XkunW Dec 9, 2025
10fcefb
Fix typo
XkunW Dec 9, 2025
3059f54
Update tests
XkunW Dec 10, 2025
205e8b2
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 10, 2025
ff87bbf
Set extra arguments to 'allow' for ModelConfig
XkunW Dec 11, 2025
f351f1c
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2025
a68f15e
Change engine inferred warning to a CLI message
XkunW Dec 15, 2025
118a0a2
Fix tests
XkunW Dec 15, 2025
17d1539
Update unit test workflow to use python version matrix
XkunW Dec 15, 2025
c2a0323
uv lock
XkunW Dec 15, 2025
ac3e550
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 15, 2025
cbe250f
ruff fix
XkunW Dec 15, 2025
dd581d9
Add env var description for --config
XkunW Dec 20, 2025
900bb61
Add new models
XkunW Dec 24, 2025
cb49592
Update model tracking
XkunW Dec 24, 2025
9bcd9e4
Update cached model config
XkunW Jan 7, 2026
5cb77ff
Update documentation
XkunW Jan 7, 2026
12d3c55
Update model tracking
XkunW Jan 8, 2026
f8f9fd6
Move model types to environment config to be dynamic
XkunW Jan 8, 2026
d0e773d
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 8, 2026
d75bfe7
Ignore mypy error
XkunW Jan 8, 2026
46 changes: 35 additions & 11 deletions .github/workflows/docker.yml
@@ -7,33 +7,56 @@ on:
branches:
- main
paths:
- Dockerfile
- vllm.Dockerfile
- sglang.Dockerfile
- .github/workflows/docker.yml
- uv.lock
pull_request:
branches:
- main
- f/sglang-support
paths:
- Dockerfile
- vllm.Dockerfile
- sglang.Dockerfile
- .github/workflows/docker.yml
- uv.lock

jobs:
push_to_registry:
name: Push Docker image to Docker Hub
name: Build and push Docker images
runs-on:
- self-hosted
- docker
- ubuntu-latest
strategy:
matrix:
backend: [vllm, sglang]
steps:
- name: Checkout repository
uses: actions/[email protected]

- name: Extract vLLM version
id: vllm-version
- name: Extract backend version
id: backend-version
run: |
VERSION=$(grep -A 1 'name = "vllm"' uv.lock | grep version | cut -d '"' -f 2)
VERSION=$(grep -A 1 "name = \"${{ matrix.backend }}\"" uv.lock | grep version | cut -d '"' -f 2)
echo "version=$VERSION" >> $GITHUB_OUTPUT

- name: Maximize build space
run: |
echo "Disk space before cleanup:"
df -h
# Remove unnecessary pre-installed software
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf /opt/ghc
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo rm -rf /usr/local/share/boost
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
# Clean apt cache
sudo apt-get clean
# Remove docker images
docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
echo "Disk space after cleanup:"
df -h

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

@@ -47,15 +70,16 @@ jobs:
id: meta
uses: docker/metadata-action@318604b99e75e41977312d83839a89be02ca4893
with:
images: vectorinstitute/vector-inference
images: vectorinstitute/vector-inference-${{ matrix.backend }}

- name: Build and push Docker image
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83
with:
context: .
file: ./Dockerfile
file: ./${{ matrix.backend }}.Dockerfile
push: true
tags: |
${{ steps.meta.outputs.tags }}
vectorinstitute/vector-inference:${{ steps.vllm-version.outputs.version }}
vectorinstitute/vector-inference-${{ matrix.backend }}:${{ steps.backend-version.outputs.version }}
vectorinstitute/vector-inference-${{ matrix.backend }}:latest
labels: ${{ steps.meta.outputs.labels }}
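For reference, the version-extraction step above assumes each backend appears in `uv.lock` as a standard TOML package entry. A minimal sketch of what that pipeline does, run against a fabricated `uv.lock` fragment (the real file is generated by `uv`, so the exact entry contents are an assumption):

```bash
# Fabricated uv.lock fragment, for illustration only.
cat > /tmp/uv.lock <<'EOF'
[[package]]
name = "vllm"
version = "0.12.0"
EOF

backend="vllm"  # the workflow substitutes ${{ matrix.backend }} here
# Same pipeline as the workflow step: take the line after `name = "vllm"`,
# keep the one mentioning "version", and cut out the quoted value.
VERSION=$(grep -A 1 "name = \"${backend}\"" /tmp/uv.lock | grep version | cut -d '"' -f 2)
echo "$VERSION"  # prints 0.12.0
```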
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -67,7 +67,7 @@ jobs:
python-version-file: ".python-version"

- name: Install the project
run: uv sync --all-extras --group docs --prerelease=allow
run: uv sync --group docs --prerelease=allow

- name: Build docs
run: uv run --frozen mkdocs build
@@ -104,7 +104,7 @@ jobs:
python-version-file: ".python-version"

- name: Install the project
run: uv sync --all-extras --group docs --frozen
run: uv sync --group docs --frozen

- name: Configure Git Credentials
run: |
10 changes: 10 additions & 0 deletions .github/workflows/unit_tests.yml
@@ -58,16 +58,26 @@ jobs:
python-version: ${{ matrix.python-version }}

- name: Install the project
env:
# Ensure uv uses the matrix interpreter instead of `.python-version` (3.10),
# otherwise the "3.11"/"3.12" jobs silently run on 3.10.
UV_PYTHON: ${{ matrix.python-version }}
run: uv sync --dev --prerelease=allow

- name: Install dependencies and check code
env:
UV_PYTHON: ${{ matrix.python-version }}
run: |
uv run --frozen pytest -m "not integration_test" --cov vec_inf --cov-report=xml tests

- name: Install the core package only
env:
UV_PYTHON: ${{ matrix.python-version }}
run: uv sync --no-dev

- name: Run package import tests
env:
UV_PYTHON: ${{ matrix.python-version }}
run: |
uv run --frozen pytest tests/test_imports.py

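The `UV_PYTHON` additions above pin the interpreter that `uv` resolves for each matrix job. A minimal sketch of reproducing one matrix entry locally, assuming `uv` is installed and `3.11` stands in for a matrix value:

```bash
# Override the repo's .python-version (3.10) with the matrix interpreter.
export UV_PYTHON=3.11
uv sync --dev --prerelease=allow
uv run --frozen python -c "import sys; print(sys.version)"  # expect 3.11.x
```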
90 changes: 80 additions & 10 deletions MODEL_TRACKING.md
@@ -94,6 +94,7 @@ This document tracks all model weights available in the `/model-weights` directory
| Model | Configuration |
|:------|:-------------|
| `Llama-4-Scout-17B-16E-Instruct` | ❌ |
| `Llama-4-Maverick-17B-128E-Instruct` | ❌ |

### Mistral AI: Mistral
| Model | Configuration |
@@ -128,6 +129,7 @@ This document tracks all model weights available in the `/model-weights` directory
|:------|:-------------|
| `Qwen2.5-0.5B-Instruct` | ✅ |
| `Qwen2.5-1.5B-Instruct` | ✅ |
| `Qwen2.5-3B` | ❌ |
| `Qwen2.5-3B-Instruct` | ✅ |
| `Qwen2.5-7B-Instruct` | ✅ |
| `Qwen2.5-14B-Instruct` | ✅ |
@@ -138,12 +140,14 @@ This document tracks all model weights available in the `/model-weights` directory
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Math-1.5B-Instruct` | ✅ |
| `Qwen2.5-Math-7B` | ❌ |
| `Qwen2.5-Math-7B-Instruct` | ✅ |
| `Qwen2.5-Math-72B-Instruct` | ✅ |

### Qwen: Qwen2.5-Coder
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-Coder-3B-Instruct` | ✅ |
| `Qwen2.5-Coder-7B-Instruct` | ✅ |

### Qwen: QwQ
@@ -162,6 +166,12 @@ This document tracks all model weights available in the `/model-weights` directory
| `Qwen2-Math-72B-Instruct` | ❌ |
| `Qwen2-VL-7B-Instruct` | ❌ |

### Qwen: Qwen2.5-VL
| Model | Configuration |
|:------|:-------------|
| `Qwen2.5-VL-3B-Instruct` | ❌ |
| `Qwen2.5-VL-7B-Instruct` | ✅ |

### Qwen: Qwen3
| Model | Configuration |
|:------|:-------------|
@@ -191,27 +201,76 @@ This document tracks all model weights available in the `/model-weights` directory
| Model | Configuration |
|:------|:-------------|
| `gpt-oss-120b` | ✅ |
| `gpt-oss-20b` | ✅ |

### Other LLM Models

#### AI21: Jamba
| Model | Configuration |
|:------|:-------------|
| `AI21-Jamba-1.5-Mini` | ❌ |
| `aya-expanse-32b` | ✅ (as Aya-Expanse-32B) |

#### Cohere for AI: Aya
| Model | Configuration |
|:------|:-------------|
| `aya-expanse-32b` | ✅ |

#### OpenAI: GPT-2
| Model | Configuration |
|:------|:-------------|
| `gpt2-large` | ❌ |
| `gpt2-xl` | ❌ |
| `gpt-oss-120b` | ❌ |
| `instructblip-vicuna-7b` | ❌ |

#### InternLM: InternLM2
| Model | Configuration |
|:------|:-------------|
| `internlm2-math-plus-7b` | ❌ |

#### Janus
| Model | Configuration |
|:------|:-------------|
| `Janus-Pro-7B` | ❌ |

#### Moonshot AI: Kimi
| Model | Configuration |
|:------|:-------------|
| `Kimi-K2-Instruct` | ❌ |

#### Mistral AI: Ministral
| Model | Configuration |
|:------|:-------------|
| `Ministral-8B-Instruct-2410` | ❌ |
| `Molmo-7B-D-0924` | ✅ |

#### AI2: OLMo
| Model | Configuration |
|:------|:-------------|
| `OLMo-1B-hf` | ❌ |
| `OLMo-7B-hf` | ❌ |
| `OLMo-7B-SFT` | ❌ |

#### EleutherAI: Pythia
| Model | Configuration |
|:------|:-------------|
| `pythia` | ❌ |

#### Qwen: Qwen1.5
| Model | Configuration |
|:------|:-------------|
| `Qwen1.5-72B-Chat` | ❌ |

#### ReasonFlux
| Model | Configuration |
|:------|:-------------|
| `ReasonFlux-PRM-7B` | ❌ |

#### LMSYS: Vicuna
| Model | Configuration |
|:------|:-------------|
| `vicuna-13b-v1.5` | ❌ |

#### Google: T5 (Encoder-Decoder Models)
**Note**: These are encoder-decoder (T5) models, not decoder-only LLMs.
| Model | Configuration |
|:------|:-------------|
| `t5-large-lm-adapt` | ❌ |
| `t5-xl-lm-adapt` | ❌ |
| `mt5-xl-lm-adapt` | ❌ |
Expand All @@ -238,10 +297,10 @@ This document tracks all model weights available in the `/model-weights` directo
### Meta: Llama 3.2 Vision
| Model | Configuration |
|:------|:-------------|
| `Llama-3.2-11B-Vision` | |
| `Llama-3.2-11B-Vision-Instruct` | ✅ |
| `Llama-3.2-90B-Vision` | |
| `Llama-3.2-90B-Vision-Instruct` | ✅ |
| `Llama-3.2-11B-Vision` | |
| `Llama-3.2-11B-Vision-Instruct` | ✅ (SGLang only) |
| `Llama-3.2-90B-Vision` | |
| `Llama-3.2-90B-Vision-Instruct` | ✅ (SGLang only) |

### Mistral: Pixtral
| Model | Configuration |
@@ -266,10 +325,19 @@ This document tracks all model weights available in the `/model-weights` directory
| `deepseek-vl2` | ✅ |
| `deepseek-vl2-small` | ✅ |

### Google: MedGemma
| Model | Configuration |
|:------|:-------------|
| `medgemma-4b-it` | ✅ |
| `medgemma-27b-it` | ✅ |
| `medgemma-27b-text-it` | ❌ |

### Other VLM Models
| Model | Configuration |
|:------|:-------------|
| `instructblip-vicuna-7b` | ❌ |
| `MiniCPM-Llama3-V-2_5` | ❌ |
| `Molmo-7B-D-0924` | ✅ |

---

@@ -298,6 +366,8 @@ This document tracks all model weights available in the `/model-weights` directory
| `data2vec` | ❌ |
| `gte-modernbert-base` | ❌ |
| `gte-Qwen2-7B-instruct` | ❌ |
| `KaLM-Embedding-Gemma3-12B-2511` | ❌ |
| `llama-embed-nemotron-8b` | ❌ |
| `m2-bert-80M-32k-retrieval` | ❌ |
| `m2-bert-80M-8k-retrieval` | ❌ |

@@ -313,7 +383,7 @@ This document tracks all model weights available in the `/model-weights` directory

---

## Multimodal Models
## Vision Models

### CLIP
| Model | Configuration |
13 changes: 7 additions & 6 deletions README.md
@@ -7,10 +7,11 @@
[![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
[![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
[![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
[![vLLM](https://img.shields.io/badge/vLLM-0.11.0-blue)](https://docs.vllm.ai/en/v0.11.0/)
[![vLLM](https://img.shields.io/badge/vLLM-0.12.0-blue)](https://docs.vllm.ai/en/v0.12.0/)
[![SGLang](https://img.shields.io/badge/SGLang-0.5.5.post3-blue)](https://docs.sglang.io/index.html)
![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)

This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.12.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).

**NOTE**: Supported models on Killarney are tracked [here](./MODEL_TRACKING.md)

@@ -20,12 +21,12 @@ If you are using the Vector cluster environment, and you don't need any customization
```bash
pip install vec-inf
```
Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.
Otherwise, we recommend using the provided [`vllm.Dockerfile`](vllm.Dockerfile) and [`sglang.Dockerfile`](sglang.Dockerfile) to set up your own environment with the package. The built images are available through [Docker Hub](https://hub.docker.com/orgs/vectorinstitute/repositories).

If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files. There are three ways to do it:
* Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
* The package would try to look for cached configuration files in your environment before using the default configuration. The default cached configuration directory path points to `/model-weights/vec-inf-shared`; you would need to create an `environment.yaml` and a `models.yaml` following the format of these files in [`vec_inf/config`](vec_inf/config/).
* The package would also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.
* [OPTIONAL] The package can also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location (see the sketch below).
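A minimal sketch of the third option; the directory path is a placeholder, and the two YAML files must follow the formats in [`vec_inf/config`](vec_inf/config/):

```bash
mkdir -p ~/vec-inf-config
cp environment.yaml models.yaml ~/vec-inf-config/   # your customized copies
export VEC_INF_CONFIG_DIR=~/vec-inf-config          # add to ~/.bashrc to persist
```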

## Usage

Expand All @@ -42,13 +43,13 @@ vec-inf launch Meta-Llama-3.1-8B-Instruct
```
You should see an output like the following:

<img width="720" alt="launch_image" src="https://github.com/user-attachments/assets/c1e0c60c-cf7a-49ed-a426-fdb38ebf88ee" />
<img width="720" alt="launch_image" src="./docs/assets/launch.png" />

**NOTE**: You can set the required fields in the environment configuration (`environment.yaml`); it is a mapping between required arguments and their corresponding environment variables. On the Vector **Killarney** Cluster environment, the required fields are:
* `--account`, `-A`: The Slurm account; a default can be set via the environment variable `VEC_INF_ACCOUNT`.
* `--work-dir`, `-D`: A working directory other than your home directory; a default can be set via the environment variable `VEC_INF_WORK_DIR` (see the sketch below).
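A sketch of setting both defaults once per shell session; the account name and directory are placeholders:

```bash
export VEC_INF_ACCOUNT=my_slurm_account
export VEC_INF_WORK_DIR=/scratch/$USER/vec-inf
vec-inf launch Meta-Llama-3.1-8B-Instruct   # no -A/-D flags needed now
```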

Models that are already supported by `vec-inf` are launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be overridden. You can also launch your own custom model as long as the model architecture is [supported by vLLM](https://docs.vllm.ai/en/stable/models/supported_models.html). For detailed instructions on how to customize your model launch, check out the [`launch` command section in User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command)
Models that are already supported by `vec-inf` are launched using the cached configuration (set in [slurm_vars.py](vec_inf/client/slurm_vars.py)) or [default configuration](vec_inf/config/models.yaml). You can override these values by providing additional parameters. Use `vec-inf launch --help` to see the full list of parameters that can be overridden. You can also launch your own custom model as long as the model architecture is supported by the underlying inference engine. For detailed instructions on how to customize your model launch, check out the [`launch` command section in User Guide](https://vectorinstitute.github.io/vector-inference/latest/user_guide/#launch-command)

#### Other commands

Binary file added docs/assets/launch.png
7 changes: 4 additions & 3 deletions docs/index.md
@@ -1,6 +1,7 @@
# Vector Inference: Easy inference on Slurm clusters

This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/stable/). **This package runs natively on the Vector Institute cluster environment**. To adapt to other environments, follow the instructions in [Installation](#installation).
This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using open-source inference engines ([vLLM](https://docs.vllm.ai/en/v0.12.0/), [SGLang](https://docs.sglang.io/index.html)). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).


**NOTE**: Supported models on Killarney are tracked [here](https://github.com/VectorInstitute/vector-inference/blob/main/MODEL_TRACKING.md)

@@ -12,9 +13,9 @@ If you are using the Vector cluster environment, and you don't need any customization
pip install vec-inf
```

Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.
Otherwise, we recommend using the provided [`vllm.Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/vllm.Dockerfile) and [`sglang.Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/sglang.Dockerfile) to set up your own environment with the package. The built images are available through [Docker Hub](https://hub.docker.com/orgs/vectorinstitute/repositories).

If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files. There are three ways to do it:
* Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config), then install from source by running `pip install .`.
* The package would try to look for cached configuration files in your environment before using the default configuration. The default cached configuration directory path points to `/model-weights/vec-inf-shared`; you would need to create an `environment.yaml` and a `models.yaml` following the format of these files in [`vec_inf/config`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config).
* The package would also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.
* [OPTIONAL] The package would also look for an environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set the environment variable `VEC_INF_CONFIG_DIR` to point to that location.