3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -35,3 +35,6 @@ src/aiperf/plugin/enums.pyi
# Cursor - ignore user settings
.cursor/*
!.cursor/rules/

# Superpowers
docs/superpowers/*
Comment on lines +38 to +40
⚠️ Potential issue | 🟡 Minor

Clarify the relevance of this change to the PR.

This addition ignores the docs/superpowers/* directory, but the PR objectives focus on adding AWS SigV4 request signing support. It's unclear why this unrelated gitignore entry is included in this PR. Consider moving unrelated changes to a separate PR to keep the scope focused.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 38 - 40, The new .gitignore entry
"docs/superpowers/*" is unrelated to the SigV4 signing work; either remove that
line from this PR or revert it and place it in a separate, focused PR that
documents why docs/superpowers should be ignored. Locate the ".gitignore" change
that adds "docs/superpowers/*", revert that single entry from this branch (or
move it into a new branch/PR) and keep this PR limited to the AWS SigV4 request
signing changes only.

1 change: 1 addition & 0 deletions README.md
@@ -120,6 +120,7 @@ Log File: /home/user/Code/aiperf/artifacts/granite4:350m-openai-chat-concurrency
- [User Interface](docs/tutorials/ui-types.md) - Dashboard, simple, or headless
- [Hugging Face TGI](docs/tutorials/huggingface-tgi.md) - Profile Hugging Face TGI models
- [OpenAI Text Endpoints](docs/tutorials/openai-text-endpoints.md) - Profile OpenAI-compatible text APIs
- [AWS SigV4 Authentication](docs/tutorials/aws-sigv4-auth.md) - Benchmark AWS endpoints (API Gateway, SageMaker)

### Load Control and Timing
- [Request Rate with Max Concurrency](docs/tutorials/request-rate-concurrency.md) - Dual request control
19 changes: 18 additions & 1 deletion docs/cli-options.md
@@ -187,6 +187,23 @@ Transport connection reuse strategy. 'pooled' (default): connections are pooled
For video generation endpoints, download the video content after generation completes. When enabled, request latency includes the video download time. When disabled (default), only generation time is measured.
<br/>_Flag (no value required)_

#### `--auth-type` `<str>`

Request signing method for authentication. When set, the selected request_signer plugin signs every HTTP request. Replaces Bearer token auth (--api-key is ignored when --auth-type is set).
<br/>_Choices: [`sigv4`]_

#### `--aws-region` `<str>`

AWS region for SigV4 request signing (e.g., us-east-1, eu-west-1). Required when --auth-type is sigv4.

#### `--aws-service` `<str>`

AWS service name for SigV4 request signing (e.g., execute-api, sagemaker). Required when --auth-type is sigv4.

#### `--aws-profile` `<str>`

AWS profile name from ~/.aws/credentials for credential lookup. When not set, uses the default boto credential chain (env vars, config file, IAM role, IRSA, SSO).

### Input

#### `--extra-inputs` `<list>`
@@ -1059,7 +1076,7 @@ Explore AIPerf plugins: aiperf plugins [category] [type]
#### `--category` `<str>`

Category to explore.
<br/>_Choices: [`accuracy_benchmark`, `accuracy_grader`, `api_router`, `arrival_pattern`, `communication`, `communication_client`, `console_exporter`, `custom_dataset_loader`, `data_exporter`, `dataset_backing_store`, `dataset_client_store`, `dataset_composer`, `dataset_sampler`, `endpoint`, `gpu_telemetry_collector`, `plot`, `public_dataset_loader`, `ramp`, `record_processor`, `results_processor`, `service`, `service_manager`, `timing_strategy`, `transport`, `ui`, `url_selection_strategy`, `zmq_proxy`]_
<br/>_Choices: [`accuracy_benchmark`, `accuracy_grader`, `api_router`, `arrival_pattern`, `communication`, `communication_client`, `console_exporter`, `custom_dataset_loader`, `data_exporter`, `dataset_backing_store`, `dataset_client_store`, `dataset_composer`, `dataset_sampler`, `endpoint`, `gpu_telemetry_collector`, `plot`, `public_dataset_loader`, `ramp`, `record_processor`, `request_signer`, `results_processor`, `service`, `service_manager`, `timing_strategy`, `transport`, `ui`, `url_selection_strategy`, `zmq_proxy`]_

#### `--name` `<str>`

279 changes: 279 additions & 0 deletions docs/tutorials/aws-sigv4-auth.md
@@ -0,0 +1,279 @@
---
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
sidebar-title: AWS SigV4 Authentication
---

# Benchmarking AWS Endpoints

This guide walks you through benchmarking inference endpoints protected by AWS IAM authentication. AIPerf signs every request with your AWS credentials automatically -- you just need to tell it your AWS region and service name.

## What's Supported

SigV4 signing works with AWS endpoints that speak the OpenAI API format. Here's what works today:

| Scenario | Non-Streaming | Streaming | Notes |
|----------|:---:|:---:|-------|
| API Gateway + vLLM/TGI/NIM | Yes | Yes | Full support -- standard HTTP + SSE |
| SageMaker + vLLM/LMI container | Yes | No | Non-streaming only. SageMaker uses proprietary event framing instead of SSE. |
| Bedrock Converse / InvokeModel | No | No | Different request/response schema -- not OpenAI-compatible |

## Before You Start

1. Install the AWS extra (this pulls in `botocore` for credential handling):

```bash
uv pip install aiperf[aws]
```

2. Make sure your AWS credentials are working:

```bash
aws sts get-caller-identity
```

If that prints your account and role info, you're good to go. If not, see [Setting Up Credentials](#setting-up-credentials).

## Quick Start

The key flags are `--auth-type sigv4`, `--aws-region`, and `--aws-service`. Add these to any `aiperf profile` command and AIPerf will sign every request automatically.

### API Gateway with IAM Auth

Your API Gateway fronts an OpenAI-compatible server and has IAM authorization enabled. Both streaming and non-streaming work:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 100
```

If your API Gateway maps a custom path to the backend, use `--endpoint` to set it:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com \
--endpoint /prod/inference/v1/chat/completions \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 100
```

### SageMaker with vLLM or LMI (Non-Streaming)

SageMaker endpoints running vLLM or DJL LMI containers accept OpenAI-format request bodies through the `/invocations` path. The response body is passed through unchanged, so non-streaming works. Use `--endpoint` to set the SageMaker invocation path:

```bash
aiperf profile \
--model my-model \
--url https://runtime.sagemaker.us-east-1.amazonaws.com \
--endpoint /endpoints/my-endpoint/invocations \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service sagemaker \
--request-count 100
```

Streaming is not supported for SageMaker endpoints because SageMaker uses a proprietary event stream format instead of SSE. Do not pass `--streaming` with SageMaker.

## Figuring Out Your Region and Service Name

The `--aws-region` should match the region in your endpoint URL:

```text
https://abc123.execute-api.us-east-1.amazonaws.com/...
^^^^^^^^^
this is your --aws-region
```
Comment on lines +94 to +98
⚠️ Potential issue | 🟡 Minor

Add a language to the fenced code block.

The block on Line 94 is missing a language specifier (markdownlint MD040).

Proposed fix
-```
+```text
 https://abc123.execute-api.us-east-1.amazonaws.com/...
                            ^^^^^^^^^
                            this is your --aws-region
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 94-94: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/tutorials/aws-sigv4-auth.md` around lines 94 - 98, The fenced code block
that contains the URL "https://abc123.execute-api.us-east-1.amazonaws.com/..."
is missing a language tag which triggers markdownlint MD040; update that fence
(the triple-backtick that opens the block) to include a language specifier such
as "text" (e.g., change ``` to ```text) so the snippet is properly labeled and
lint-clean.


The `--aws-service` depends on which AWS service handles your traffic:

| If your traffic goes through... | Use `--aws-service` |
|--------------------------------|---------------------|
| API Gateway | `execute-api` |
| SageMaker Runtime | `sagemaker` |

A common gotcha: the service name isn't always what you'd guess. For example, it's `sagemaker`, not `sagemaker-runtime`. If you get a "SignatureDoesNotMatch" error, the service name is the first thing to double-check.
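For scripting, both values can be pulled straight out of the hostname. This is a small sketch assuming the standard `{id}.{service}.{region}.amazonaws.com` host layout -- custom domain names in front of API Gateway won't match this pattern:

```shell
url="https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1"

# Strip scheme and path, keeping only the hostname.
host=$(echo "$url" | sed -E 's#https?://([^/]+).*#\1#')

# Host layout: {id}.{service}.{region}.amazonaws.com
service=$(echo "$host" | cut -d. -f2)
region=$(echo "$host" | cut -d. -f3)

echo "--aws-service $service --aws-region $region"
# prints: --aws-service execute-api --aws-region us-east-1
```

If your endpoint sits behind a custom domain, look up the underlying service and region in the AWS console instead.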

## Setting Up Credentials

If `aws sts get-caller-identity` already works, you can skip this section -- AIPerf will pick up the same credentials automatically.

### Environment Variables (simplest)

Good for quick local testing:

```bash
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="wJal..."
export AWS_SESSION_TOKEN="FwoG..." # only if using temporary credentials

aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 100
```

### Named Profiles (multiple accounts)

If you work with more than one AWS account, you probably already have profiles set up in `~/.aws/credentials`. Point AIPerf at the right one with `--aws-profile`:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-west-2.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-west-2 \
--aws-service execute-api \
--aws-profile staging \
--request-count 100
```

Without `--aws-profile`, AIPerf uses whichever credentials the AWS CLI would use by default (environment variables first, then `[default]` profile, then IAM roles).

### SSO

If your team uses AWS IAM Identity Center (SSO), log in first, then pass the profile:

```bash
aws sso login --profile my-sso-profile

aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--aws-profile my-sso-profile \
--request-count 100
```

### Kubernetes (EKS)

On EKS, credentials are typically injected into your pod automatically via IRSA or Pod Identity. You don't need `--aws-profile` -- just make sure your pod's service account has the right IAM role attached:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 1000
```

One thing to watch: if your pod has `AWS_ACCESS_KEY_ID` set as an environment variable (e.g., from a Kubernetes Secret), that takes priority over IRSA/Pod Identity. If you're hitting the wrong account, check for stale env vars.

## Long-Running Benchmarks

AIPerf refreshes AWS credentials automatically before each request. This means temporary credentials (from SSO, assumed roles, or IRSA) won't expire mid-benchmark. If you're running a long benchmark with thousands of requests, you don't need to do anything special.

The one exception: if your SSO session itself expires (they typically last 8-12 hours), you'll need to re-run `aws sso login` and restart the benchmark.

## Examples

### High-Throughput API Gateway with Warmup

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-rate 50 \
--request-count 1000 \
--warmup-request-count 20
```

### Multiple API Gateway Endpoints

Distribute load across two endpoints in the same region:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--url https://def456.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 500
```

### SageMaker with Custom Dataset

```bash
aiperf profile \
--model my-model \
--url https://runtime.sagemaker.us-west-2.amazonaws.com \
--endpoint /endpoints/my-endpoint/invocations \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-west-2 \
--aws-service sagemaker \
--dataset prompts.jsonl \
--dataset-type single_turn
```

## Troubleshooting

### "SignatureDoesNotMatch"

This is the most common error. Check these in order:

1. **Is `--aws-region` correct?** It must match the region in the URL.
2. **Is `--aws-service` correct?** See the [service name table](#figuring-out-your-region-and-service-name) above. The names aren't always obvious.
3. **Is your system clock accurate?** AWS rejects signatures that are more than 5 minutes off. Docker containers and VMs are especially prone to clock drift. Run `date -u` and compare to actual UTC.

### "The security token included in the request is expired"

Your temporary credentials have expired. Re-authenticate:

```bash
# For SSO
aws sso login --profile my-profile

# For assumed roles, this usually resolves itself --
# botocore refreshes automatically if the source credentials are still valid
```

### "No AWS credentials found"

AIPerf can't find any credentials. Verify with:

```bash
aws sts get-caller-identity
```

If that also fails, you need to set up credentials -- see [Setting Up Credentials](#setting-up-credentials).

### "SigV4 auth requires botocore"

Install the AWS extra:

```bash
uv pip install aiperf[aws]
```
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -70,6 +70,9 @@ aiperf = "aiperf.cli:app"
aiperf = "aiperf.plugin:plugins.yaml"

[project.optional-dependencies]
aws = [
"botocore>=1.34.0",
]
dev = [
"black>=25.1.0",
"httpx>=0.27.0",
5 changes: 5 additions & 0 deletions src/aiperf/auth/__init__.py
@@ -0,0 +1,5 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
from aiperf.auth.base_signer import RequestSignerProtocol, SignedRequest

__all__ = ["RequestSignerProtocol", "SignedRequest"]
42 changes: 42 additions & 0 deletions src/aiperf/auth/base_signer.py
@@ -0,0 +1,42 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable

from aiperf.common.models.model_endpoint_info import ModelEndpointInfo
from aiperf.common.protocols import AIPerfLifecycleProtocol


@dataclass(slots=True)
class SignedRequest:
"""Result of signing a request.

Most signers only set headers. url and body are optionally set by signers
that modify the request URL (presigned URLs) or body (encryption).
"""

headers: dict[str, str]
url: str | None = None
body: bytes | None = None


@runtime_checkable
class RequestSignerProtocol(AIPerfLifecycleProtocol, Protocol):
"""Protocol for request signers that add authentication signatures.

Signers are created once per transport and called for every request.
The sign() method is async to support signers that need I/O for
credential/token refresh (OAuth2, GCP IAM, etc.).
"""

def __init__(self, model_endpoint: ModelEndpointInfo, **kwargs) -> None: ...

async def sign(
self,
method: str,
url: str,
headers: dict[str, str],
body: bytes | None,
) -> SignedRequest: ...