
Commit 13fa718

Merge pull request #269 from runpod-workers/feat/allow-engine-args-env
feat: allow all AsyncEngineArgs as env vars
2 parents 407dbd7 + 8a9365b commit 13fa718

4 files changed: +222 -116 lines changed


.runpod/README.md

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,8 @@ All behaviour is controlled through environment variables:
 | `OPENAI_SERVED_MODEL_NAME_OVERRIDE` | Override served model name in API | | String |
 | `MAX_CONCURRENCY` | Maximum concurrent requests | 300 | Integer |

+**Pass any vLLM engine arg** not listed above by setting an env var with the **UPPERCASED** field name (e.g. `MAX_MODEL_LEN=4096`, `ENABLE_CHUNKED_PREFILL=true`). The worker auto-discovers all `AsyncEngineArgs` fields from env. See the [vLLM engine args docs](https://docs.vllm.ai/en/latest/configuration/engine_args) for all available options.
+
 For complete configuration options, see the [full configuration documentation](https://github.com/runpod-workers/worker-vllm/blob/main/docs/configuration.md).

 ## API Usage

README.md

Lines changed: 10 additions & 0 deletions
@@ -59,6 +59,16 @@ Configure worker-vllm using environment variables:
 | `OPENAI_SERVED_MODEL_NAME_OVERRIDE` | Override served model name in API | | String |
 | `MAX_CONCURRENCY` | Maximum concurrent requests | 30 | Integer |

+**Pass any vLLM engine arg** not listed above by setting an environment variable with the **UPPERCASED** field name (same names vLLM uses). The worker auto-discovers all `AsyncEngineArgs` fields from env. For example:
+
+| Environment Variable      | vLLM Engine Arg          | Example Value |
+| ------------------------- | ------------------------ | ------------- |
+| `MAX_MODEL_LEN`           | `max_model_len`          | `4096`        |
+| `ENFORCE_EAGER`           | `enforce_eager`          | `true`        |
+| `ENABLE_CHUNKED_PREFILL`  | `enable_chunked_prefill` | `true`        |
+
+Any env var whose name matches a valid `AsyncEngineArgs` field (uppercased) is applied automatically. Backward-compat aliases: `MODEL_NAME`, `TOKENIZER_NAME`, `MAX_CONTEXT_LEN_TO_CAPTURE`. This lets you configure any vLLM option without waiting for explicit worker support.
+
 For the complete list of all available environment variables, examples, and detailed descriptions: **[Configuration](docs/configuration.md)**

 ## Option 2: Build Docker Image with Model Inside
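
Reviewer's sketch (not part of the commit): the name matching and alias handling described in the README.md change above could look roughly like the following. This is a minimal illustration under stated assumptions, not the worker's actual code. `StandInEngineArgs` is a hypothetical stand-in for vLLM's `AsyncEngineArgs` with only a handful of fields, `discover_raw_overrides` is a made-up helper name, and the rule that an exact uppercased field name wins over an alias is an assumption. The alias names themselves are the ones listed in the docs/configuration.md change in this commit.

```python
import dataclasses
from typing import Mapping, Optional


@dataclasses.dataclass
class StandInEngineArgs:
    """Hypothetical stand-in for vLLM's AsyncEngineArgs (a few fields only)."""
    model: str = "facebook/opt-125m"
    tokenizer: Optional[str] = None
    revision: Optional[str] = None
    max_model_len: Optional[int] = None
    enforce_eager: bool = False
    max_seq_len_to_capture: int = 8192


# Backward-compat aliases as listed in docs/configuration.md in this commit.
ALIASES = {
    "MODEL_NAME": "model",
    "TOKENIZER_NAME": "tokenizer",
    "MAX_CONTEXT_LEN_TO_CAPTURE": "max_seq_len_to_capture",
    "MODEL_REVISION": "revision",
}


def discover_raw_overrides(env: Mapping[str, str],
                           args_cls=StandInEngineArgs) -> dict:
    """Return {field_name: raw_string} for every env var that maps to a field.

    Env vars that match no field and no alias are ignored.
    """
    valid = {f.name for f in dataclasses.fields(args_cls)}
    overrides = {}

    # Apply aliases first so an exact uppercased name can override them
    # (this precedence is an assumption made for the sketch).
    for alias, field_name in ALIASES.items():
        if alias in env and field_name in valid:
            overrides[field_name] = env[alias]

    # Exact match: the uppercased dataclass field name, with no prefix.
    for field_name in valid:
        key = field_name.upper()
        if key in env:
            overrides[field_name] = env[key]

    return overrides
```

Passing `os.environ` as `env` would scan the real process environment; taking a plain mapping keeps the function easy to test. The returned values are still raw strings, so they would need to be cast to each field's type before being handed to the engine.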

docs/configuration.md

Lines changed: 23 additions & 0 deletions
@@ -156,6 +156,29 @@ The way this works is that the first request will have a batch size of `DEFAULT_
 | `DISABLE_LOGGING_REQUEST` | False | `bool` | Disable logging requests. |
 | `MAX_LOG_LEN` | None | `int` | Max number of prompt characters or prompt ID numbers being printed in log. |

+## UPPERCASED env vars: Pass any engine arg
+
+Any vLLM `AsyncEngineArgs` field can be set via an environment variable using the **UPPERCASED** field name (the same names vLLM uses). The worker auto-discovers all fields from env; no prefix is needed.
+
+**Format:** `<FIELD_NAME_UPPERCASED>=<value>` (e.g. `MAX_MODEL_LEN=4096`)
+
+**Examples:**
+
+| Environment Variable     | vLLM Engine Arg          | Value Example |
+| ------------------------ | ------------------------ | ------------- |
+| `MAX_MODEL_LEN`          | `max_model_len`          | `4096`        |
+| `ENFORCE_EAGER`          | `enforce_eager`          | `true`        |
+| `ENABLE_CHUNKED_PREFILL` | `enable_chunked_prefill` | `true`        |
+| `NUM_SCHEDULER_STEPS`    | `num_scheduler_steps`    | `8`           |
+| `TOKENIZER_POOL_SIZE`    | `tokenizer_pool_size`    | `4`           |
+
+**Backward-compat aliases:** `MODEL_NAME` → `model`, `TOKENIZER_NAME` → `tokenizer`, `MAX_CONTEXT_LEN_TO_CAPTURE` → `max_seq_len_to_capture`, `MODEL_REVISION` → `revision`.
+
+**Notes:**
+- Only valid `AsyncEngineArgs` fields are applied. Unknown keys are silently ignored.
+- Values are automatically cast to the correct type (`int`, `float`, `bool`, `str`, or JSON for `dict`/`list`/`tuple`).
+- For a full list of available engine args, see the [vLLM AsyncEngineArgs documentation](https://docs.vllm.ai/en/latest/configuration/engine_args/).
+
 ## Docker Build Arguments

 These variables are used when building custom Docker images with models baked in:
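
Reviewer's sketch (not part of the commit): the type casting mentioned in the notes added to docs/configuration.md (`int`, `float`, `bool`, `str`, or JSON for `dict`/`list`/`tuple`) could be done along these lines. Everything here is illustrative and assumed: `StandInEngineArgs` is a hypothetical stand-in for `AsyncEngineArgs`, `cast_value` and `engine_kwargs_from_env` are made-up helper names, and the accepted boolean spellings are a guess, since the docs only show `true`.

```python
import dataclasses
import json
import typing
from typing import Any, Mapping, Optional


@dataclasses.dataclass
class StandInEngineArgs:
    """Hypothetical stand-in for vLLM's AsyncEngineArgs (a few fields only)."""
    max_model_len: Optional[int] = None
    gpu_memory_utilization: float = 0.9
    enforce_eager: bool = False
    dtype: str = "auto"
    hf_overrides: Optional[dict] = None


# Assumed boolean spellings; the real worker may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _unwrap_optional(tp: Any) -> Any:
    """Turn Optional[X] into X; leave every other annotation unchanged."""
    if typing.get_origin(tp) is typing.Union:
        non_none = [a for a in typing.get_args(tp) if a is not type(None)]
        if len(non_none) == 1:
            return non_none[0]
    return tp


def cast_value(raw: str, tp: Any) -> Any:
    """Cast a raw env string to the annotated field type."""
    tp = _unwrap_optional(tp)
    base = typing.get_origin(tp) or tp  # e.g. list[int] -> list
    if base is bool:
        lowered = raw.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if base is int:
        return int(raw)
    if base is float:
        return float(raw)
    if base in (dict, list, tuple):
        parsed = json.loads(raw)  # containers are written as JSON
        return tuple(parsed) if base is tuple else parsed
    return raw  # str and anything unrecognised stay as strings


def engine_kwargs_from_env(env: Mapping[str, str],
                           args_cls=StandInEngineArgs) -> dict:
    """Collect typed kwargs for every field whose uppercased name is in env."""
    hints = typing.get_type_hints(args_cls)
    kwargs = {}
    for f in dataclasses.fields(args_cls):
        raw = env.get(f.name.upper())
        if raw is not None:
            kwargs[f.name] = cast_value(raw, hints[f.name])
    return kwargs
```

With `os.environ` passed as `env`, the result can be splatted into the dataclass, e.g. `StandInEngineArgs(**engine_kwargs_from_env(os.environ))`; fields without a matching env var keep their defaults. The bool check comes before the int check on purpose, so that a value like `true` is never handed to `int()`.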
