A simple solution for benchmarking vLLM, SGLang, and TensorRT-LLM on Modal. ⏱️
Install Stopwatch from the repository root:

```bash
pip install -e .
```
To run a single benchmark, you can use the `provision-and-benchmark` command, which will provision an LLM server, benchmark it, and save the results to a local file. For example, to run a synchronous (one request after another) benchmark with vLLM and save the results to `results.json`:
```bash
LLM_SERVER_TYPE=vllm
MODEL=meta-llama/Llama-3.1-8B-Instruct
OUTPUT_PATH=results.json

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH
```
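The exact layout of the results file depends on your Stopwatch version, but since it is plain JSON, you can get a quick look at its top-level structure with a tool like `jq`:

```bash
# Peek at the top-level structure of the benchmark results
# (the schema varies by Stopwatch version).
jq 'keys' results.json
```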
Or, to run a fixed-rate (e.g. 5 requests per second) multi-GPU benchmark with SGLang:
```bash
GPU_COUNT=4
GPU_TYPE=H100
LLM_SERVER_TYPE=sglang
RATE_TYPE=constant
REQUESTS_PER_SECOND=5

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE \
    --output-path $OUTPUT_PATH \
    --gpu "$GPU_TYPE:$GPU_COUNT" \
    --rate-type $RATE_TYPE \
    --rate $REQUESTS_PER_SECOND \
    --llm-server-config "{\"extra_args\": [\"--tp-size\", \"$GPU_COUNT\"]}"
```
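The escaped JSON passed to `--llm-server-config` can be awkward to edit by hand. Assuming a POSIX shell, an equivalent and easier-to-read way to build the same string is:

```bash
# Builds the same JSON as the escaped string above; with GPU_COUNT=4 it
# expands to {"extra_args": ["--tp-size", "4"]}. --tp-size is SGLang's
# tensor-parallel degree, so the model is sharded across all four GPUs.
LLM_SERVER_CONFIG='{"extra_args": ["--tp-size", "'$GPU_COUNT'"]}'
```

You can then pass `--llm-server-config "$LLM_SERVER_CONFIG"` in the command above.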
Or, to run a throughput (as many requests as the server can handle) test with TensorRT-LLM:
```bash
LLM_SERVER_TYPE=tensorrt-llm
RATE_TYPE=throughput

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --rate-type $RATE_TYPE
```
To profile a server with the PyTorch profiler, use the following command (only vLLM and SGLang are currently supported):
```bash
LLM_SERVER_TYPE=vllm
MODEL=meta-llama/Llama-3.1-8B-Instruct
NUM_REQUESTS=10
OUTPUT_PATH=trace.json.gz

stopwatch profile $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --num-requests $NUM_REQUESTS
```
Once profiling is done, the trace will be saved to `trace.json.gz`, which you can open and visualize at https://ui.perfetto.dev. Keep in mind that generated traces can get very large, so it is best to send only a few requests while profiling.
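Since large traces can also be slow to load in the Perfetto UI, it's worth checking the file's size before uploading it:

```bash
# Check the compressed trace size; if it is very large, consider
# re-profiling with fewer requests before opening it in Perfetto.
ls -lh trace.json.gz
```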
Before committing any changes, make sure they don't break any core functionality in Stopwatch by running the test suite:

```bash
pytest
```
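While iterating, pytest's standard selection flags can narrow the run; the keyword below is a hypothetical example, not an actual test name:

```bash
# Run only tests whose names match a keyword, stopping at the first failure.
pytest -k "benchmark" -x
```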
To make sure your code changes comply with our linting rules, run `ruff`:

```bash
ruff check
```
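Ruff can also apply fixes automatically for many of the violations it reports:

```bash
# Apply automatic fixes for lint violations that have safe fixes available.
ruff check --fix
```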
We welcome contributions, including those that add tuned benchmarks to our collection. See the CONTRIBUTING file and the Getting Started document for more details on contributing to Stopwatch.
Stopwatch is available under the MIT license. See the LICENSE file for more details.