Merged

Commits
5b45943
enable VllmDeployer to fail fast if the underlying vllm process failed.
wangshangsam Dec 10, 2025
bad5387
example slurm script for submitting jobs
wangshangsam Dec 10, 2025
08b32cc
fix slurm scripts
wangshangsam Dec 11, 2025
1cdf563
small fix
wangshangsam Dec 11, 2025
d9caddc
[Automated Commit] Format Codebase
github-actions[bot] Dec 11, 2025
6f62339
Update the readme about the example slurm scripts.
wangshangsam Dec 11, 2025
88b34a4
Merge branch 'wangshangsam/fix-req-timeout' of github.com:CentML/mlpe…
wangshangsam Dec 11, 2025
59dc167
Change the default endpoint startup timeout to 1 hour in case someone…
wangshangsam Dec 11, 2025
d9c0bcc
change server expected qps and target latency
johncalesp Dec 11, 2025
a75dc68
Change the default dataset repo_id to the new name of the public dataset
wangshangsam Dec 12, 2025
866eba9
[Automated Commit] Format Codebase
github-actions[bot] Dec 12, 2025
a8a8870
evaluate the json file with multiprocess
johncalesp Dec 12, 2025
9f3b52e
[Automated Commit] Format Codebase
github-actions[bot] Dec 12, 2025
0342909
change default server_target_latency to 12
wangshangsam Dec 12, 2025
7576e0c
Merge branch 'wangshangsam/fix-req-timeout' of github.com:CentML/mlpe…
wangshangsam Dec 12, 2025
d10d634
revert evaluation changes
johncalesp Dec 12, 2025
e75a34a
[Automated Commit] Format Codebase
github-actions[bot] Dec 12, 2025
2209ae6
update slurm script
wangshangsam Dec 14, 2025
1450143
update slurm script
wangshangsam Dec 15, 2025
6a5f17d
revert evaluation.py changes after analysing the discrepancy in is_se…
johncalesp Dec 15, 2025
d5d2cc8
[Automated Commit] Format Codebase
github-actions[bot] Dec 15, 2025
f72d82d
linting
wangshangsam Dec 16, 2025
0e731ed
[Automated Commit] Format Codebase
github-actions[bot] Dec 16, 2025
4771f13
lock in model and dataset SHA
wangshangsam Dec 16, 2025
55a8cf1
Merge branch 'wangshangsam/fix-req-timeout' of github.com:CentML/mlpe…
wangshangsam Dec 16, 2025
d4d6f78
[Automated Commit] Format Codebase
github-actions[bot] Dec 16, 2025
c0d0925
Specify model quality target and server target latency in the README
wangshangsam Dec 16, 2025
e2adf60
Merge branch 'wangshangsam/fix-req-timeout' of github.com:CentML/mlpe…
wangshangsam Dec 16, 2025
7dabbfe
Update loadgen/mlperf.conf
wangshangsam Dec 18, 2025
423cea4
aligning TestSettings' C++ code with its Python binding
wangshangsam Dec 18, 2025
817f0e8
[Automated Commit] Format Codebase
github-actions[bot] Dec 18, 2025
9d3b36b
remove ttft and tpot from mlperf.conf
wangshangsam Dec 18, 2025
29e7c1a
Merge branch 'wangshangsam/fix-req-timeout' of github.com:CentML/mlpe…
wangshangsam Dec 18, 2025
95f4179
Enable CLI to take in user.conf
wangshangsam Dec 18, 2025
5370ecd
[Automated Commit] Format Codebase
github-actions[bot] Dec 18, 2025
f9d983f
readme
wangshangsam Dec 19, 2025
897894d
Merge branch 'wangshangsam/fix-req-timeout' of github.com:CentML/mlpe…
wangshangsam Dec 19, 2025
8f8e886
Merge branch 'master' into wangshangsam/fix-req-timeout
wangshangsam Dec 19, 2025
f8e6bf8
readme
wangshangsam Dec 19, 2025
8bfbeb9
rename vl2l -> q3vl
wangshangsam Dec 19, 2025
b589ddd
[Automated Commit] Format Codebase
github-actions[bot] Dec 19, 2025
3b065ee
empty
wangshangsam Dec 19, 2025
eb65590
rerun ci
wangshangsam Dec 19, 2025
38ff6f9
rerun ci
wangshangsam Dec 19, 2025
c1534ae
Introduce sampling parameters
wangshangsam Dec 20, 2025
472471f
[Automated Commit] Format Codebase
github-actions[bot] Dec 20, 2025
e9117a7
Merge branch 'master' into wangshangsam/fix-req-timeout
wangshangsam Dec 22, 2025
1b04e7b
[Automated Commit] Format Codebase
github-actions[bot] Dec 22, 2025
4c66f1c
empty
wangshangsam Dec 22, 2025
69c8b08
move CFLAGS="-std=c++14 -O3" into extra_compile_args of Pybind11Exten…
wangshangsam Dec 22, 2025
c24d286
[Automated Commit] Format Codebase
github-actions[bot] Dec 22, 2025
f24a6a9
enable specifying loadgen source in the Dockerfile
wangshangsam Dec 22, 2025
bc1449a
Merge branch 'wangshangsam/fix-req-timeout' of github.com:CentML/mlpe…
wangshangsam Dec 22, 2025
8a517cd
update slurm scripts
wangshangsam Dec 22, 2025
deb6dd0
Maintain None as the default value for the sampling params
wangshangsam Dec 22, 2025
3e55d26
[Automated Commit] Format Codebase
github-actions[bot] Dec 22, 2025
8fa86ab
update readme
wangshangsam Dec 22, 2025
f859932
Merge branch 'master' into wangshangsam/fix-req-timeout
wangshangsam Dec 22, 2025
8c600ce
[Automated Commit] Format Codebase
github-actions[bot] Dec 22, 2025
ff8a727
empty
wangshangsam Dec 22, 2025
28 changes: 28 additions & 0 deletions multimodal/vl2l/README.md
@@ -182,6 +182,34 @@ mlperf-inf-mm-vl2l benchmark vllm \
--vllm.cli=--tensor-parallel-size=8
```

## Slurm

[scripts/slurm/](scripts/slurm/) provides example scripts for running both the benchmark
and the response quality evaluation on a GPU cluster managed by
[Slurm](https://slurm.schedmd.com/) with [enroot](https://github.com/nvidia/enroot) and
[pyxis](https://github.com/NVIDIA/pyxis). Specifically:

- [scripts/slurm/benchmark.sh](scripts/slurm/benchmark.sh) is an sbatch script that
runs the benchmarking job.
- [scripts/slurm/evaluate.sh](scripts/slurm/evaluate.sh) is an sbatch script that runs
the evaluation job.
- [scripts/slurm/submit.sh](scripts/slurm/submit.sh) is a Bash script that submits both
jobs; the evaluation job runs only if the benchmarking job succeeds.

You can list the CLI flags that [scripts/slurm/submit.sh](scripts/slurm/submit.sh)
accepts via:

```bash
bash submit.sh --help
```
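
For example, a minimal invocation that fills in only the required flags might look like
this (the image path, cache directory, account, and partition names below are
placeholders; replace them with the values for your own cluster):

```bash
bash submit.sh \
    --container-image=<registry>/mlperf-inf-mm-vl2l:latest \
    --cache-host-dir=/path/to/.cache \
    --slurm-account=<your-slurm-account> \
    --benchmark-slurm-partition=<gpu-partition> \
    --evaluate-slurm-partition=<cpu-partition>
```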

> [!NOTE]
> Slurm clusters are often highly customized per organization. If you are unfamiliar
> with Slurm, check with your organization's cluster administrator first, make sure you
> understand what these example scripts do, and adapt them to the specific settings of
> the Slurm cluster you are going to use before launching any jobs.

## Developer Guide

### Linting
29 changes: 29 additions & 0 deletions multimodal/vl2l/scripts/slurm/benchmark.sh
@@ -0,0 +1,29 @@
#!/bin/bash
#SBATCH --time=4:00:00
#SBATCH --partition=batch
#SBATCH --tasks=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive
#SBATCH --output=benchmark-slurm-output-%j.txt
#SBATCH --error=benchmark-slurm-error-%j.txt

set -eux
set -o pipefail

mkdir -p ${OUTPUT_HOST_DIR}/${SLURM_JOB_ID}

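# Run the benchmark inside the container via pyxis/enroot, with the host cache and
# output directories mounted into the container.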
srun \
--container-image=${CONTAINER_IMAGE} \
--container-mounts=${CACHE_HOST_DIR}:${CACHE_CONTAINER_DIR},${OUTPUT_HOST_DIR}:${OUTPUT_CONTAINER_DIR} \
--no-container-mount-home \
mlperf-inf-mm-vl2l benchmark vllm \
--settings.test.scenario=${SCENARIO} \
--settings.test.mode=${MODE} \
--settings.test.server_expected_qps=${SERVER_EXPECTED_QPS} \
--vllm.model.repo_id=${MODEL_REPO_ID} \
--vllm.cli=--async-scheduling \
--vllm.cli=--max-model-len=32768 \
--vllm.cli=--limit-mm-per-prompt.video=0 \
--vllm.cli=--tensor-parallel-size=${TENSOR_PARALLEL_SIZE} \
--settings.logging.log_output.outdir=${OUTPUT_CONTAINER_DIR}/${SLURM_JOB_ID}
21 changes: 21 additions & 0 deletions multimodal/vl2l/scripts/slurm/evaluate.sh
@@ -0,0 +1,21 @@
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --partition=cpu_short
#SBATCH --nodes=1
#SBATCH --tasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=16G
#SBATCH --output=evaluate-slurm-output-%j.txt
#SBATCH --error=evaluate-slurm-error-%j.txt

set -eux
set -o pipefail

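# NVIDIA_VISIBLE_DEVICES is forwarded into the container; submit.sh sets it to "void"
# so that this CPU-only evaluation job does not pick up any GPUs.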
srun \
--container-image=${CONTAINER_IMAGE} \
--container-mounts=${CACHE_HOST_DIR}:${CACHE_CONTAINER_DIR},${OUTPUT_HOST_DIR}:${OUTPUT_CONTAINER_DIR} \
--no-container-mount-home \
--container-env=NVIDIA_VISIBLE_DEVICES \
mlperf-inf-mm-vl2l evaluate \
--filename=${OUTPUT_CONTAINER_DIR}/${BENCHMARK_JOB_ID}/mlperf_log_accuracy.json
229 changes: 229 additions & 0 deletions multimodal/vl2l/scripts/slurm/submit.sh
@@ -0,0 +1,229 @@
#!/bin/bash

set -eux
set -o pipefail

DEFAULT_CONTAINER_IMAGE=""
container_image=${DEFAULT_CONTAINER_IMAGE}

DEFAULT_MODEL_REPO_ID=Qwen/Qwen3-VL-235B-A22B-Instruct
model_repo_id=${DEFAULT_MODEL_REPO_ID}

DEFAULT_SCENARIO=offline
scenario=${DEFAULT_SCENARIO}

DEFAULT_MODE=accuracy_only
mode=${DEFAULT_MODE}

DEFAULT_SERVER_EXPECTED_QPS=5
server_expected_qps=${DEFAULT_SERVER_EXPECTED_QPS}

DEFAULT_TENSOR_PARALLEL_SIZE=8
tensor_parallel_size=${DEFAULT_TENSOR_PARALLEL_SIZE}

DEFAULT_CACHE_HOST_DIR=""
cache_host_dir=${DEFAULT_CACHE_HOST_DIR}

DEFAULT_OUTPUT_HOST_DIR=$(pwd)/outputs
output_host_dir=${DEFAULT_OUTPUT_HOST_DIR}

DEFAULT_SLURM_ACCOUNT=""
slurm_account=${DEFAULT_SLURM_ACCOUNT}

DEFAULT_BENCHMARK_SLURM_PARTITION=""
benchmark_slurm_partition=${DEFAULT_BENCHMARK_SLURM_PARTITION}

DEFAULT_EVALUATE_SLURM_PARTITION=""
evaluate_slurm_partition=${DEFAULT_EVALUATE_SLURM_PARTITION}

function _exit_with_help_msg() {
cat <<EOF
Submit a benchmarking job (and, optionally, an evaluation job) for the VL2L benchmark.

Usage: ${BASH_SOURCE[0]}
[-ci | --container-image] Container image to run the benchmark (default: ${DEFAULT_CONTAINER_IMAGE}).
[-mri | --model-repo-id] HuggingFace repo ID of the model to benchmark (default: ${DEFAULT_MODEL_REPO_ID}).
[-s | --scenario] Benchmark scenario (default: ${DEFAULT_SCENARIO}).
[-m | --mode] Benchmark mode (default: ${DEFAULT_MODE}).
[-seq | --server-expected-qps] The expected QPS for the server scenario (default: ${DEFAULT_SERVER_EXPECTED_QPS}).
[-tps | --tensor-parallel-size] Tensor parallelism size for the model deployment (default: ${DEFAULT_TENSOR_PARALLEL_SIZE}).
[-chd | --cache-host-dir] Host directory of the .cache directory into which HuggingFace will download the dataset and the model checkpoint, and where vLLM will store its compilation artifacts (default: ${DEFAULT_CACHE_HOST_DIR}).
[-ohd | --output-host-dir] Host directory to which the benchmark and evaluation results will be dumped (default: ${DEFAULT_OUTPUT_HOST_DIR}).
[-sa | --slurm-account] Slurm account for submitting the benchmark and evaluation jobs (default: ${DEFAULT_SLURM_ACCOUNT}).
[-bsp | --benchmark-slurm-partition] Slurm partition for submitting the benchmarking job; usually a partition with nodes that have GPUs (default: ${DEFAULT_BENCHMARK_SLURM_PARTITION}).
[-esp | --evaluate-slurm-partition] Slurm partition for submitting the evaluation job; usually a partition with nodes that have CPUs only (default: ${DEFAULT_EVALUATE_SLURM_PARTITION}).
[-h | --help] Print this help message.
EOF
if [ -n "$1" ]; then
echo "$(tput bold)$(tput setab 1)$1$(tput sgr0)"
fi
exit "$2"
}

while [[ $# -gt 0 ]]; do
case $1 in
-ci | --container-image)
container_image=$2
shift
shift
;;
-ci=* | --container-image=*)
container_image=${1#*=}
shift
;;
-mri | --model-repo-id)
model_repo_id=$2
shift
shift
;;
-mri=* | --model-repo-id=*)
model_repo_id=${1#*=}
shift
;;
-s | --scenario)
scenario=$2
shift
shift
;;
-s=* | --scenario=*)
scenario=${1#*=}
shift
;;
-m | --mode)
mode=$2
shift
shift
;;
-m=* | --mode=*)
mode=${1#*=}
shift
;;
-seq | --server-expected-qps)
server_expected_qps=$2
shift
shift
;;
-seq=* | --server-expected-qps=*)
server_expected_qps=${1#*=}
shift
;;
-tps | --tensor-parallel-size)
tensor_parallel_size=$2
shift
shift
;;
-tps=* | --tensor-parallel-size=*)
tensor_parallel_size=${1#*=}
shift
;;
-chd | --cache-host-dir)
cache_host_dir=$2
shift
shift
;;
-chd=* | --cache-host-dir=*)
cache_host_dir=${1#*=}
shift
;;
-ohd | --output-host-dir)
output_host_dir=$2
shift
shift
;;
-ohd=* | --output-host-dir=*)
output_host_dir=${1#*=}
shift
;;
-sa | --slurm-account)
slurm_account=$2
shift
shift
;;
-sa=* | --slurm-account=*)
slurm_account=${1#*=}
shift
;;
-bsp | --benchmark-slurm-partition)
benchmark_slurm_partition=$2
shift
shift
;;
-bsp=* | --benchmark-slurm-partition=*)
benchmark_slurm_partition=${1#*=}
shift
;;
-esp | --evaluate-slurm-partition)
evaluate_slurm_partition=$2
shift
shift
;;
-esp=* | --evaluate-slurm-partition=*)
evaluate_slurm_partition=${1#*=}
shift
;;
-h | --help)
_exit_with_help_msg "" 0
;;
*)
_exit_with_help_msg "[ERROR] Unknown option: $1" 1
;;
esac
done

if [[ -z "${container_image}" ]]; then
_exit_with_help_msg "[ERROR] -ci or --container-image is required." 1
fi

if [[ -z "${cache_host_dir}" ]]; then
_exit_with_help_msg "[ERROR] -chd or --cache-host-dir is required." 1
fi

if [[ -z "${slurm_account}" ]]; then
_exit_with_help_msg "[ERROR] -sa or --slurm-account is required." 1
fi

if [[ -z "${benchmark_slurm_partition}" ]]; then
_exit_with_help_msg "[ERROR] -bsp or --benchmark-slurm-partition is required." 1
fi

if [[ -z "${evaluate_slurm_partition}" ]]; then
_exit_with_help_msg "[ERROR] -esp or --evaluate-slurm-partition is required." 1
fi

cache_container_dir=/root/.cache
output_container_dir=/outputs

mkdir -p "${output_host_dir}"

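# Submit the benchmarking job and capture its job ID (sbatch --parsable) so that the
# evaluation job can be made to depend on it.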
benchmark_job_id=$(
CACHE_HOST_DIR="${cache_host_dir}" \
CACHE_CONTAINER_DIR="${cache_container_dir}" \
OUTPUT_HOST_DIR="${output_host_dir}" \
OUTPUT_CONTAINER_DIR="${output_container_dir}" \
CONTAINER_IMAGE="${container_image}" \
SCENARIO="${scenario}" \
MODE="${mode}" \
SERVER_EXPECTED_QPS="${server_expected_qps}" \
TENSOR_PARALLEL_SIZE="${tensor_parallel_size}" \
MODEL_REPO_ID="${model_repo_id}" \
sbatch --parsable \
--account="${slurm_account}" \
--partition="${benchmark_slurm_partition}" \
--gres=gpu:"${tensor_parallel_size}" \
benchmark.sh
)

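# The evaluation consumes mlperf_log_accuracy.json, which is produced by accuracy runs,
# so the evaluation job is submitted only in accuracy_only mode, and it starts only
# after the benchmark job succeeds (afterok dependency).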
if [[ "${mode}" == "accuracy_only" ]]; then
CACHE_HOST_DIR="${cache_host_dir}" \
CACHE_CONTAINER_DIR="${cache_container_dir}" \
OUTPUT_HOST_DIR="${output_host_dir}" \
OUTPUT_CONTAINER_DIR="${output_container_dir}" \
CONTAINER_IMAGE="${container_image}" \
BENCHMARK_JOB_ID="${benchmark_job_id}" \
NVIDIA_VISIBLE_DEVICES=void \
sbatch \
--dependency=afterok:"${benchmark_job_id}" \
--account="${slurm_account}" \
--partition="${evaluate_slurm_partition}" \
evaluate.sh
fi