Add CLI options for backend args (like headers and verify) (vllm-project#230)

kdelee · Harshith-umesh · commit fec23c061e22 · 2025-08-01T13:49:02.000-04:00
This PR adds the ability to configure custom request headers and control
SSL certificate verification when running benchmarks.

* The OpenAIHTTPBackend now supports passing custom headers and a verify
flag to disable SSL verification.
* Headers are now merged with the following precedence: CLI arguments
(--backend-args), scenario file arguments, environment variables, and
then default values.
* Headers can be removed by setting their value to null in the
--backend-args JSON string.
* The --backend-args help text has been updated with an example of how
to use these new features.
* New documentation has been added for the CLI, configuration options,
and supported data formats.
* Unit tests have been added to verify the new header and SSL
verification logic, as well as the CLI argument parsing.
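The header precedence described above can be sketched in plain Python. `merge_headers` is a hypothetical helper, not part of guidellm. It mirrors the merge in `OpenAIHTTPBackend.__init__` from the diff: `defaults` stands for the headers derived from the API key, bearer token, organization, and project, and `cli_headers` stands for whatever `headers` dict reaches the backend after `--backend-args` and the scenario file have been resolved.

```python
from typing import Optional


def merge_headers(
    defaults: dict[str, str],
    env_headers: Optional[dict[str, Optional[str]]],
    cli_headers: Optional[dict[str, Optional[str]]],
) -> dict[str, str]:
    """Merge header layers; later layers win, and None values delete a header."""
    merged: dict[str, Optional[str]] = dict(defaults)
    merged.update(env_headers or {})  # e.g. GUIDELLM__OPENAI__HEADERS
    merged.update(cli_headers or {})  # e.g. "headers" from --backend-args
    # A None value (JSON null) removes the header entirely.
    return {k: v for k, v in merged.items() if v is not None}


defaults = {"Authorization": "Bearer from-api-key", "OpenAI-Project": "proj-67890"}
env = {"X-Team": "perf"}
cli = {"Authorization": None, "Custom-Header": "Custom-Value"}

print(merge_headers(defaults, env, cli))
# {'OpenAI-Project': 'proj-67890', 'X-Team': 'perf', 'Custom-Header': 'Custom-Value'}
```

Because the filter runs after every layer has been applied, a `null` from `--backend-args` can remove a header that came from a default or from `GUIDELLM__OPENAI__HEADERS`.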

This provides a way to benchmark targets that require custom
authentication, other headers, or use self-signed SSL certificates.

Signed-off-by: Elijah DeLee <kdelee@redhat.com>
diff --git a/docs/guides/cli.md b/docs/guides/cli.md
@@ -1 +1,36 @@
-# Coming Soon
+# CLI Reference
+
+This page provides a reference for the `guidellm` command-line interface. For more advanced configuration, including environment variables and `.env` files, see the [Configuration Guide](./configuration.md).
+
+## `guidellm benchmark run`
+
+This command is the primary entrypoint for running benchmarks. It has many options that can be specified on the command line or in a scenario file.
+
+### Scenario Configuration
+
+| Option                      | Description                                                                                                                                     |
+| --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
+| `--scenario <PATH or NAME>` | The name of a builtin scenario or path to a scenario configuration file. Options specified on the command line will override the scenario file. |
+
+### Target and Backend Configuration
+
+These options configure how `guidellm` connects to the system under test.
+
+| Option                  | Description                                                                                                                                                                                                   |
+| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `--target <URL>`        | **Required.** The endpoint of the target system, e.g., `http://localhost:8080`. Can also be set with the `GUIDELLM__OPENAI__BASE_URL` environment variable.                                                   |
+| `--backend-type <TYPE>` | The type of backend to use. Defaults to `openai_http`.                                                                                                                                                        |
+| `--backend-args <JSON>` | A JSON string for backend-specific arguments. For example: `--backend-args '{"headers": {"Authorization": "Bearer my-token"}, "verify": false}'` to pass custom headers and disable certificate verification. |
+| `--model <NAME>`        | The ID of the model to benchmark within the backend.                                                                                                                                                          |
+
+### Data and Request Configuration
+
+These options define the data to be used for benchmarking and how requests will be generated.
+
+| Option                    | Description                                                                                                                                                                              |
+| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `--data <SOURCE>`         | The data source. This can be a HuggingFace dataset ID, a path to a local data file, or a synthetic data configuration. See the [Data Formats Guide](./data_formats.md) for more details. |
+| `--rate-type <TYPE>`      | The type of request generation strategy to use (e.g., `constant`, `poisson`, `sweep`).                                                                                                   |
+| `--rate <NUMBER>`         | The rate of requests per second for `constant` or `poisson` strategies, or the number of steps for a `sweep`.                                                                            |
+| `--max-requests <NUMBER>` | The maximum number of requests to run for each benchmark.                                                                                                                                |
+| `--max-seconds <NUMBER>`  | The maximum number of seconds to run each benchmark for.                                                                                                                                 |
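The `--backend-args` example above is a JSON object whose keys become keyword arguments for the backend. Assuming the string is decoded with Python's standard `json` module, JSON `false` arrives as `False` and `null` as `None`, which is what the new `verify` flag and header removal rely on. `make_backend_kwargs` and `fake_backend` are hypothetical names used only for this sketch.

```python
import json
from typing import Optional


def make_backend_kwargs(backend_args: str) -> dict:
    """Decode a --backend-args JSON string into keyword arguments.

    JSON `false` becomes Python False and JSON `null` becomes None, so the
    backend receives verify=False and None-valued headers it can drop.
    """
    kwargs = json.loads(backend_args)
    if not isinstance(kwargs, dict):
        raise ValueError("--backend-args must be a JSON object")
    return kwargs


def fake_backend(headers: Optional[dict] = None, verify: Optional[bool] = None) -> dict:
    # Hypothetical stand-in with the two parameters this PR adds to
    # OpenAIHTTPBackend.__init__; it just echoes what it was given.
    return {"headers": headers, "verify": verify}


raw = '{"headers": {"Authorization": "Bearer my-token"}, "verify": false}'
print(fake_backend(**make_backend_kwargs(raw)))
# {'headers': {'Authorization': 'Bearer my-token'}, 'verify': False}
```

Wrapping the JSON in single quotes, as in the table, keeps the shell from interpreting the inner double quotes.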
diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md
@@ -1 +1,59 @@
-# Coming Soon
+# Configuration
+
+The `guidellm` application can be configured using command-line arguments, environment variables, or a `.env` file. This page details the file-based and environment variable configuration options.
+
+## Configuration Methods
+
+Settings are loaded with the following priority (highest priority first):
+
+1. Command-line arguments.
+2. Environment variables.
+3. Values in a `.env` file in the directory where the command is run.
+4. Default values.
+
+## Environment Variable Format
+
+All settings can be configured using environment variables. The variables must be prefixed with `GUIDELLM__`, and nested settings are separated by a double underscore `__`.
+
+For example, to set the `api_key` for the `openai` backend, you would use the following environment variable:
+
+```bash
+export GUIDELLM__OPENAI__API_KEY="your-api-key"
+```
+
+### Target and Backend Configuration
+
+You can configure the connection to the target system using environment variables. This is an alternative to using the `--target` and `--backend-args` command-line options.
+
+| Environment Variable                  | Description                                                                                                                | Example                                                                   |
+| ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
+| `GUIDELLM__OPENAI__BASE_URL`          | The endpoint of the target system. Equivalent to the `--target` CLI option.                                                | `export GUIDELLM__OPENAI__BASE_URL="http://localhost:8080"`               |
+| `GUIDELLM__OPENAI__API_KEY`           | The API key to use for bearer token authentication.                                                                        | `export GUIDELLM__OPENAI__API_KEY="your-secret-api-key"`                  |
+| `GUIDELLM__OPENAI__BEARER_TOKEN`      | The full bearer token to use for authentication.                                                                           | `export GUIDELLM__OPENAI__BEARER_TOKEN="Bearer your-secret-token"`        |
+| `GUIDELLM__OPENAI__HEADERS`           | A JSON string representing a dictionary of headers to send to the target. These headers will override any default headers. | `export GUIDELLM__OPENAI__HEADERS='{"Authorization": "Bearer my-token"}'` |
+| `GUIDELLM__OPENAI__ORGANIZATION`      | The OpenAI organization to use for requests.                                                                               | `export GUIDELLM__OPENAI__ORGANIZATION="org-12345"`                       |
+| `GUIDELLM__OPENAI__PROJECT`           | The OpenAI project to use for requests.                                                                                    | `export GUIDELLM__OPENAI__PROJECT="proj-67890"`                           |
+| `GUIDELLM__OPENAI__VERIFY`            | Set to `false` or `0` to disable certificate verification.                                                                 | `export GUIDELLM__OPENAI__VERIFY=false`                                   |
+| `GUIDELLM__OPENAI__MAX_OUTPUT_TOKENS` | The default maximum number of tokens to request for completions.                                                           | `export GUIDELLM__OPENAI__MAX_OUTPUT_TOKENS=2048`                         |
+
+### General HTTP Settings
+
+These settings control the behavior of the underlying HTTP client.
+
+| Environment Variable                 | Description                                                                     |
+| ------------------------------------ | ------------------------------------------------------------------------------- |
+| `GUIDELLM__REQUEST_TIMEOUT`          | The timeout in seconds for HTTP requests. Defaults to 300.                      |
+| `GUIDELLM__REQUEST_HTTP2`            | Set to `true` or `1` to enable HTTP/2 support. Defaults to true.                |
+| `GUIDELLM__REQUEST_FOLLOW_REDIRECTS` | Set to `true` or `1` to allow the client to follow redirects. Defaults to true. |
+
+### Using a `.env` file
+
+You can also place these variables in a `.env` file in the directory where you run `guidellm`:
+
+```dotenv
+# .env file
+GUIDELLM__OPENAI__BASE_URL="http://localhost:8080"
+GUIDELLM__OPENAI__API_KEY="your-api-key"
+GUIDELLM__OPENAI__HEADERS='{"Authorization": "Bearer my-token"}'
+GUIDELLM__OPENAI__VERIFY=false
+```
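To illustrate the naming scheme, the sketch below groups `GUIDELLM__`-prefixed variables into nested settings by splitting on `__`. `nested_settings` is a hypothetical standard-library approximation, not guidellm's loader: the real settings are validated against typed fields, which is what lets `GUIDELLM__OPENAI__VERIFY=0` become `False`. Here values are only JSON-decoded, so `0` would stay an integer.

```python
import json
from collections.abc import Mapping
from typing import Any

PREFIX = "GUIDELLM__"


def nested_settings(environ: Mapping[str, str]) -> dict[str, Any]:
    """Group GUIDELLM__-prefixed variables into nested dicts split on '__'."""
    settings: dict[str, Any] = {}
    for name, raw in environ.items():
        if not name.startswith(PREFIX):
            continue  # ignore unrelated variables such as HOME
        *parents, leaf = name[len(PREFIX):].lower().split("__")
        node = settings
        for part in parents:
            node = node.setdefault(part, {})
        try:
            # JSON decoding turns "false" into False, "300" into 300, and a
            # JSON object string into a dict; anything else stays a string.
            node[leaf] = json.loads(raw)
        except json.JSONDecodeError:
            node[leaf] = raw
    return settings


env = {
    "GUIDELLM__OPENAI__BASE_URL": "http://localhost:8080",
    "GUIDELLM__OPENAI__HEADERS": '{"Authorization": "Bearer my-token"}',
    "GUIDELLM__OPENAI__VERIFY": "false",
    "GUIDELLM__REQUEST_TIMEOUT": "300",
    "HOME": "/home/user",
}
print(nested_settings(env))
```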
diff --git a/docs/guides/data_formats.md b/docs/guides/data_formats.md
@@ -0,0 +1,67 @@
+# Data Formats
+
+The `--data` argument for the `guidellm benchmark run` command accepts several different formats for specifying the data to be used for benchmarking.
+
+## Local Data Files
+
+You can provide a path to a local data file in one of the following formats:
+
+- **CSV (.csv)**: A comma-separated values file. The loader will attempt to find a column with a common name for the prompt (e.g., `prompt`, `text`, `instruction`).
+- **JSON (.json)**: A JSON file. The structure should be a list of objects, where each object represents a row of data.
+- **JSON Lines (.jsonl)**: A file where each line is a valid JSON object.
+- **Text (.txt)**: A plain text file, where each line is treated as a separate prompt.
+
+If the prompt column cannot be automatically determined, you can specify it using the `--data-args` option:
+
+```bash
+--data-args '{"text_column": "my_custom_prompt_column"}'
+```
+
+## Synthetic Data
+
+You can generate synthetic data on the fly by providing a configuration string or file.
+
+### Configuration Options
+
+| Parameter             | Description                                                                                                     |
+| --------------------- | --------------------------------------------------------------------------------------------------------------- |
+| `prompt_tokens`       | **Required.** The average number of tokens for the generated prompts.                                           |
+| `output_tokens`       | **Required.** The average number of tokens for the generated outputs.                                           |
+| `samples`             | The total number of samples to generate. Defaults to 1000.                                                      |
+| `source`              | The source text to use for generating the synthetic data. Defaults to a built-in copy of "Pride and Prejudice". |
+| `prompt_tokens_stdev` | The standard deviation of the tokens generated for prompts.                                                     |
+| `prompt_tokens_min`   | The minimum number of text tokens generated for prompts.                                                        |
+| `prompt_tokens_max`   | The maximum number of text tokens generated for prompts.                                                        |
+| `output_tokens_stdev` | The standard deviation of the tokens generated for outputs.                                                     |
+| `output_tokens_min`   | The minimum number of text tokens generated for outputs.                                                        |
+| `output_tokens_max`   | The maximum number of text tokens generated for outputs.                                                        |
+
+### Configuration Formats
+
+You can provide the synthetic data configuration in one of three ways:
+
+1. **Key-Value String:**
+
+   ```bash
+   --data "prompt_tokens=256,output_tokens=128,samples=500"
+   ```
+
+2. **JSON String:**
+
+   ```bash
+   --data '{"prompt_tokens": 256, "output_tokens": 128, "samples": 500}'
+   ```
+
+3. **YAML or Config File:** Create a file (e.g., `my_config.yaml`):
+
+   ```yaml
+   prompt_tokens: 256
+   output_tokens: 128
+   samples: 500
+   ```
+
+   And use it with the `--data` argument:
+
+   ```bash
+   --data my_config.yaml
+   ```
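The key-value and JSON forms above describe the same configuration. The hypothetical `parse_synthetic_config` below (not guidellm's actual parser) shows one way to normalize both into a dict and enforce the two required parameters from the table. Loading the YAML form would additionally need a YAML parser, which is omitted here.

```python
import json
from typing import Any

REQUIRED = ("prompt_tokens", "output_tokens")


def parse_synthetic_config(spec: str) -> dict[str, Any]:
    """Parse a synthetic-data spec given as JSON or as comma-separated key=value pairs."""
    spec = spec.strip()
    if spec.startswith("{"):
        config: dict[str, Any] = json.loads(spec)
    else:
        config = {}
        for pair in spec.split(","):
            key, _, value = pair.partition("=")
            value = value.strip()
            # Digit-only values become ints; other values (e.g. a source) stay strings.
            config[key.strip()] = int(value) if value.isdigit() else value
    missing = [k for k in REQUIRED if k not in config]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    return config


kv = parse_synthetic_config("prompt_tokens=256,output_tokens=128,samples=500")
js = parse_synthetic_config('{"prompt_tokens": 256, "output_tokens": 128, "samples": 500}')
print(kv == js)  # True
```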
diff --git a/src/guidellm/__main__.py b/src/guidellm/__main__.py
@@ -103,7 +103,9 @@ def benchmark():
     default=GenerativeTextScenario.get_default("backend_args"),
     help=(
         "A JSON string containing any arguments to pass to the backend as a "
-        "dict with **kwargs."
+        "dict with **kwargs. Headers can be removed by setting their value to "
+        "null. For example: "
+        """'{"headers": {"Authorization": null, "Custom-Header": "Custom-Value"}}'"""
     ),
 )
 @click.option(
diff --git a/src/guidellm/backend/openai.py b/src/guidellm/backend/openai.py
@@ -95,6 +95,8 @@ def __init__(
         extra_query: Optional[dict] = None,
         extra_body: Optional[dict] = None,
         remove_from_body: Optional[list[str]] = None,
+        headers: Optional[dict] = None,
+        verify: Optional[bool] = None,
     ):
         super().__init__(type_="openai_http")
         self._target = target or settings.openai.base_url
@@ -111,20 +113,40 @@ def __init__(
 
         self._model = model
 
+        # Start with default headers based on other params
+        default_headers: dict[str, str] = {}
         api_key = api_key or settings.openai.api_key
-        self.authorization = (
-            f"Bearer {api_key}" if api_key else settings.openai.bearer_token
-        )
+        bearer_token = settings.openai.bearer_token
+        if api_key:
+            default_headers["Authorization"] = f"Bearer {api_key}"
+        elif bearer_token:
+            default_headers["Authorization"] = bearer_token
 
         self.organization = organization or settings.openai.organization
+        if self.organization:
+            default_headers["OpenAI-Organization"] = self.organization
+
         self.project = project or settings.openai.project
+        if self.project:
+            default_headers["OpenAI-Project"] = self.project
+
+        # User-provided headers from kwargs or settings override defaults
+        merged_headers = default_headers.copy()
+        merged_headers.update(settings.openai.headers or {})
+        if headers:
+            merged_headers.update(headers)
+
+        # Remove headers with None values for backward compatibility and convenience
+        self.headers = {k: v for k, v in merged_headers.items() if v is not None}
+
         self.timeout = timeout if timeout is not None else settings.request_timeout
         self.http2 = http2 if http2 is not None else settings.request_http2
         self.follow_redirects = (
             follow_redirects
             if follow_redirects is not None
             else settings.request_follow_redirects
         )
+        self.verify = verify if verify is not None else settings.openai.verify
         self.max_output_tokens = (
             max_output_tokens
             if max_output_tokens is not None
@@ -161,9 +183,7 @@ def info(self) -> dict[str, Any]:
             "timeout": self.timeout,
             "http2": self.http2,
             "follow_redirects": self.follow_redirects,
-            "authorization": bool(self.authorization),
-            "organization": self.organization,
-            "project": self.project,
+            "headers": self.headers,
             "text_completions_path": TEXT_COMPLETIONS_PATH,
             "chat_completions_path": CHAT_COMPLETIONS_PATH,
         }
@@ -384,6 +404,7 @@ def _get_async_client(self) -> httpx.AsyncClient:
                 http2=self.http2,
                 timeout=self.timeout,
                 follow_redirects=self.follow_redirects,
+                verify=self.verify,
             )
             self._async_client = client
         else:
@@ -395,16 +416,7 @@ def _headers(self) -> dict[str, str]:
         headers = {
             "Content-Type": "application/json",
         }
-
-        if self.authorization:
-            headers["Authorization"] = self.authorization
-
-        if self.organization:
-            headers["OpenAI-Organization"] = self.organization
-
-        if self.project:
-            headers["OpenAI-Project"] = self.project
-
+        headers.update(self.headers)
         return headers
 
     def _params(self, endpoint_type: EndpointType) -> dict[str, str]:
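The new `verify` attribute is handed directly to `httpx.AsyncClient`, whose `verify` parameter accepts a boolean or an `ssl.SSLContext`. As a rough standard-library analogue of `verify=False` (illustrative only, not how httpx builds its context), the sketch below creates a client TLS context that accepts any server certificate, including a self-signed one. `unverified_context` is a hypothetical helper.

```python
import ssl


def unverified_context() -> ssl.SSLContext:
    """Build a client TLS context that skips certificate and hostname checks.

    Roughly what verify=False means: the server's certificate (e.g. a
    self-signed one) is accepted without validation. Use only for testing.
    """
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE;
    # setting them in the other order raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


ctx = unverified_context()
print(ctx.verify_mode == ssl.CERT_NONE, ctx.check_hostname)  # True False
```

Disabling verification also removes protection against man-in-the-middle attacks, so it is best limited to test targets. Keeping verification on and trusting the self-signed certificate's CA, for example by passing the CA file's path as `cafile` to `ssl.create_default_context`, is the stricter option.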
diff --git a/src/guidellm/config.py b/src/guidellm/config.py
@@ -81,10 +81,12 @@ class OpenAISettings(BaseModel):
 
     api_key: Optional[str] = None
     bearer_token: Optional[str] = None
+    headers: Optional[dict[str, str]] = None
     organization: Optional[str] = None
     project: Optional[str] = None
     base_url: str = "http://localhost:8000"
     max_output_tokens: int = 16384
+    verify: bool = True
 
 
 class ReportGenerationSettings(BaseModel):
diff --git a/tests/unit/backend/test_openai_backend.py b/tests/unit/backend/test_openai_backend.py
@@ -11,7 +11,7 @@ def test_openai_http_backend_default_initialization():
     backend = OpenAIHTTPBackend()
     assert backend.target == settings.openai.base_url
     assert backend.model is None
-    assert backend.authorization == settings.openai.bearer_token
+    assert backend.headers.get("Authorization") == settings.openai.bearer_token
     assert backend.organization == settings.openai.organization
     assert backend.project == settings.openai.project
     assert backend.timeout == settings.request_timeout
@@ -37,7 +37,7 @@ def test_openai_http_backend_intialization():
     )
     assert backend.target == "http://test-target"
     assert backend.model == "test-model"
-    assert backend.authorization == "Bearer test-key"
+    assert backend.headers.get("Authorization") == "Bearer test-key"
     assert backend.organization == "test-org"
     assert backend.project == "test-proj"
     assert backend.timeout == 10
diff --git a/tests/unit/backend/test_openai_backend_custom_configs.py b/tests/unit/backend/test_openai_backend_custom_configs.py
diff --git a/tests/unit/test_config.py b/tests/unit/test_config.py
diff --git a/tests/unit/test_main.py b/tests/unit/test_main.py