3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -35,3 +35,6 @@ src/aiperf/plugin/enums.pyi
# Cursor - ignore user settings
.cursor/*
!.cursor/rules/

# Superpowers
docs/superpowers/*
Comment on lines +38 to +40
⚠️ Potential issue | 🟡 Minor

Clarify the relevance of this change to the PR.

This addition ignores the docs/superpowers/* directory, but the PR objectives focus on adding AWS SigV4 request signing support. It's unclear why this unrelated gitignore entry is included in this PR. Consider moving unrelated changes to a separate PR to keep the scope focused.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 38 - 40, The new .gitignore entry
"docs/superpowers/*" is unrelated to the SigV4 signing work; either remove that
line from this PR or revert it and place it in a separate, focused PR that
documents why docs/superpowers should be ignored. Locate the ".gitignore" change
that adds "docs/superpowers/*", revert that single entry from this branch (or
move it into a new branch/PR) and keep this PR limited to the AWS SigV4 request
signing changes only.

1 change: 1 addition & 0 deletions README.md
@@ -120,6 +120,7 @@ Log File: /home/user/Code/aiperf/artifacts/granite4:350m-openai-chat-concurrency
- [User Interface](docs/tutorials/ui-types.md) - Dashboard, simple, or headless
- [Hugging Face TGI](docs/tutorials/huggingface-tgi.md) - Profile Hugging Face TGI models
- [OpenAI Text Endpoints](docs/tutorials/openai-text-endpoints.md) - Profile OpenAI-compatible text APIs
- [AWS SigV4 Authentication](docs/tutorials/aws-sigv4-auth.md) - Benchmark AWS endpoints (API Gateway, SageMaker)

### Load Control and Timing
- [Request Rate with Max Concurrency](docs/tutorials/request-rate-concurrency.md) - Dual request control
19 changes: 18 additions & 1 deletion docs/cli-options.md
@@ -187,6 +187,23 @@ Transport connection reuse strategy. 'pooled' (default): connections are pooled
For video generation endpoints, download the video content after generation completes. When enabled, request latency includes the video download time. When disabled (default), only generation time is measured.
<br/>_Flag (no value required)_

#### `--auth-type` `<str>`

Request signing method for authentication. When set, the selected request_signer plugin signs every HTTP request. Replaces Bearer token auth (--api-key is ignored when --auth-type is set).
<br/>_Choices: [`sigv4`]_

#### `--aws-region` `<str>`

AWS region for SigV4 request signing (e.g., us-east-1, eu-west-1). Required when --auth-type is sigv4.

#### `--aws-service` `<str>`

AWS service name for SigV4 request signing (e.g., execute-api, sagemaker). Required when --auth-type is sigv4.

#### `--aws-profile` `<str>`

AWS profile name from ~/.aws/credentials for credential lookup. When not set, uses the default boto credential chain (env vars, config file, IAM role, IRSA, SSO).

### Input

#### `--extra-inputs` `<list>`
@@ -1059,7 +1076,7 @@ Explore AIPerf plugins: aiperf plugins [category] [type]
#### `--category` `<str>`

Category to explore.
<br/>_Choices: [`accuracy_benchmark`, `accuracy_grader`, `api_router`, `arrival_pattern`, `communication`, `communication_client`, `console_exporter`, `custom_dataset_loader`, `data_exporter`, `dataset_backing_store`, `dataset_client_store`, `dataset_composer`, `dataset_sampler`, `endpoint`, `gpu_telemetry_collector`, `plot`, `public_dataset_loader`, `ramp`, `record_processor`, `results_processor`, `service`, `service_manager`, `timing_strategy`, `transport`, `ui`, `url_selection_strategy`, `zmq_proxy`]_
<br/>_Choices: [`accuracy_benchmark`, `accuracy_grader`, `api_router`, `arrival_pattern`, `communication`, `communication_client`, `console_exporter`, `custom_dataset_loader`, `data_exporter`, `dataset_backing_store`, `dataset_client_store`, `dataset_composer`, `dataset_sampler`, `endpoint`, `gpu_telemetry_collector`, `plot`, `public_dataset_loader`, `ramp`, `record_processor`, `request_signer`, `results_processor`, `service`, `service_manager`, `timing_strategy`, `transport`, `ui`, `url_selection_strategy`, `zmq_proxy`]_

#### `--name` `<str>`

279 changes: 279 additions & 0 deletions docs/tutorials/aws-sigv4-auth.md
@@ -0,0 +1,279 @@
---
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
sidebar-title: AWS SigV4 Authentication
---

# Benchmarking AWS Endpoints

This guide walks you through benchmarking inference endpoints protected by AWS IAM authentication. AIPerf signs every request with your AWS credentials automatically -- you just need to tell it your AWS region and service name.

## What's Supported

SigV4 signing works with AWS endpoints that speak the OpenAI API format. Here's what works today:

| Scenario | Non-Streaming | Streaming | Notes |
|----------|:---:|:---:|-------|
| API Gateway + vLLM/TGI/NIM | Yes | Yes | Full support -- standard HTTP + SSE |
| SageMaker + vLLM/LMI container | Yes | No | Non-streaming only. SageMaker uses proprietary event framing instead of SSE. |
| Bedrock Converse / InvokeModel | No | No | Different request/response schema -- not OpenAI-compatible |

## Before You Start

1. Install the AWS extra (this pulls in `botocore` for credential handling):

```bash
uv pip install aiperf[aws]
```

2. Make sure your AWS credentials are working:

```bash
aws sts get-caller-identity
```

If that prints your account and role info, you're good to go. If not, see [Setting Up Credentials](#setting-up-credentials).

## Quick Start

The key flags are `--auth-type sigv4`, `--aws-region`, and `--aws-service`. Add these to any `aiperf profile` command and AIPerf will sign every request automatically.

### API Gateway with IAM Auth

Your API Gateway fronts an OpenAI-compatible server and has IAM authorization enabled. Both streaming and non-streaming work:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 100
```

If your API Gateway maps a custom path to the backend, use `--endpoint` to set it:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com \
--endpoint /prod/inference/v1/chat/completions \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 100
```

### SageMaker with vLLM or LMI (Non-Streaming)

SageMaker endpoints running vLLM or DJL LMI containers accept OpenAI-format request bodies through the `/invocations` path. The response body is passed through unchanged, so non-streaming works. Use `--endpoint` to set the SageMaker invocation path:

```bash
aiperf profile \
--model my-model \
--url https://runtime.sagemaker.us-east-1.amazonaws.com \
--endpoint /endpoints/my-endpoint/invocations \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service sagemaker \
--request-count 100
```

Streaming is not supported for SageMaker endpoints because SageMaker uses a proprietary event stream format instead of SSE. Do not pass `--streaming` with SageMaker.

## Figuring Out Your Region and Service Name

The `--aws-region` should match the region in your endpoint URL:

```text
https://abc123.execute-api.us-east-1.amazonaws.com/...
^^^^^^^^^
this is your --aws-region
```
Comment on lines +94 to +98
⚠️ Potential issue | 🟡 Minor

Add a language to the fenced code block.

The block on Line 94 is missing a language specifier (markdownlint MD040).

Proposed fix
-```
+```text
 https://abc123.execute-api.us-east-1.amazonaws.com/...
                            ^^^^^^^^^
                            this is your --aws-region
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 94-94: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/tutorials/aws-sigv4-auth.md` around lines 94 - 98, The fenced code block
that contains the URL "https://abc123.execute-api.us-east-1.amazonaws.com/..."
is missing a language tag which triggers markdownlint MD040; update that fence
(the triple-backtick that opens the block) to include a language specifier such
as "text" (e.g., change ``` to ```text) so the snippet is properly labeled and
lint-clean.


The `--aws-service` depends on which AWS service handles your traffic:

| If your traffic goes through... | Use `--aws-service` |
|--------------------------------|---------------------|
| API Gateway | `execute-api` |
| SageMaker Runtime | `sagemaker` |

A common gotcha: the service name isn't always what you'd guess. For example, it's `sagemaker`, not `sagemaker-runtime`. If you get a "SignatureDoesNotMatch" error, the service name is the first thing to double-check.
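For scripting, both values can be pulled straight out of the hostname. This is a small sketch assuming the standard `{id}.{service}.{region}.amazonaws.com` host layout -- custom domain names in front of API Gateway won't match this pattern:

```shell
url="https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1"

# Strip scheme and path, keeping only the hostname.
host=$(echo "$url" | sed -E 's#https?://([^/]+).*#\1#')

# Host layout: {id}.{service}.{region}.amazonaws.com
service=$(echo "$host" | cut -d. -f2)
region=$(echo "$host" | cut -d. -f3)

echo "--aws-service $service --aws-region $region"
# prints: --aws-service execute-api --aws-region us-east-1
```

If your endpoint sits behind a custom domain, look up the underlying service and region in the AWS console instead.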

## Setting Up Credentials

If `aws sts get-caller-identity` already works, you can skip this section -- AIPerf will pick up the same credentials automatically.

### Environment Variables (simplest)

Good for quick local testing:

```bash
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="wJal..."
export AWS_SESSION_TOKEN="FwoG..." # only if using temporary credentials

aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 100
```

### Named Profiles (multiple accounts)

If you work with more than one AWS account, you probably already have profiles set up in `~/.aws/credentials`. Point AIPerf at the right one with `--aws-profile`:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-west-2.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-west-2 \
--aws-service execute-api \
--aws-profile staging \
--request-count 100
```

Without `--aws-profile`, AIPerf uses whichever credentials the AWS CLI would use by default (environment variables first, then `[default]` profile, then IAM roles).

### SSO

If your team uses AWS IAM Identity Center (SSO), log in first, then pass the profile:

```bash
aws sso login --profile my-sso-profile

aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--aws-profile my-sso-profile \
--request-count 100
```

### Kubernetes (EKS)

On EKS, credentials are typically injected into your pod automatically via IRSA or Pod Identity. You don't need `--aws-profile` -- just make sure your pod's service account has the right IAM role attached:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 1000
```

One thing to watch: if your pod has `AWS_ACCESS_KEY_ID` set as an environment variable (e.g., from a Kubernetes Secret), that takes priority over IRSA/Pod Identity. If you're hitting the wrong account, check for stale env vars.

## Long-Running Benchmarks

AIPerf refreshes AWS credentials automatically before each request. This means temporary credentials (from SSO, assumed roles, or IRSA) won't expire mid-benchmark. If you're running a long benchmark with thousands of requests, you don't need to do anything special.

The one exception: if your SSO session itself expires (they typically last 8-12 hours), you'll need to re-run `aws sso login` and restart the benchmark.

## Examples

### High-Throughput API Gateway with Warmup

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-rate 50 \
--request-count 1000 \
--warmup-request-count 20
```

### Multiple API Gateway Endpoints

Distribute load across two endpoints in the same region:

```bash
aiperf profile \
--model my-model \
--url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \
--url https://def456.execute-api.us-east-1.amazonaws.com/prod/v1 \
--endpoint-type chat \
--streaming \
--auth-type sigv4 \
--aws-region us-east-1 \
--aws-service execute-api \
--request-count 500
```

### SageMaker with Custom Dataset

```bash
aiperf profile \
--model my-model \
--url https://runtime.sagemaker.us-west-2.amazonaws.com \
--endpoint /endpoints/my-endpoint/invocations \
--endpoint-type chat \
--auth-type sigv4 \
--aws-region us-west-2 \
--aws-service sagemaker \
--dataset prompts.jsonl \
--dataset-type single_turn
```

## Troubleshooting

### "SignatureDoesNotMatch"

This is the most common error. Check these in order:

1. **Is `--aws-region` correct?** It must match the region in the URL.
2. **Is `--aws-service` correct?** See the [service name table](#figuring-out-your-region-and-service-name) above. The names aren't always obvious.
3. **Is your system clock accurate?** AWS rejects signatures that are more than 5 minutes off. Docker containers and VMs are especially prone to clock drift. Run `date -u` and compare to actual UTC.

### "The security token included in the request is expired"

Your temporary credentials have expired. Re-authenticate:

```bash
# For SSO
aws sso login --profile my-profile

# For assumed roles, this usually resolves itself --
# botocore refreshes automatically if the source credentials are still valid
```

### "No AWS credentials found"

AIPerf can't find any credentials. Verify with:

```bash
aws sts get-caller-identity
```

If that also fails, you need to set up credentials -- see [Setting Up Credentials](#setting-up-credentials).

### "SigV4 auth requires botocore"

Install the AWS extra:

```bash
uv pip install aiperf[aws]
```
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -70,6 +70,9 @@ aiperf = "aiperf.cli:app"
aiperf = "aiperf.plugin:plugins.yaml"

[project.optional-dependencies]
aws = [
"botocore>=1.34.0",
]
dev = [
"black>=25.1.0",
"httpx>=0.27.0",
5 changes: 5 additions & 0 deletions src/aiperf/auth/__init__.py
@@ -0,0 +1,5 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
from aiperf.auth.base_signer import RequestSignerProtocol, SignedRequest

__all__ = ["RequestSignerProtocol", "SignedRequest"]
42 changes: 42 additions & 0 deletions src/aiperf/auth/base_signer.py
@@ -0,0 +1,42 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable

from aiperf.common.models.model_endpoint_info import ModelEndpointInfo
from aiperf.common.protocols import AIPerfLifecycleProtocol


@dataclass(slots=True)
class SignedRequest:
"""Result of signing a request.

Most signers only set headers. url and body are optionally set by signers
that modify the request URL (presigned URLs) or body (encryption).
"""

headers: dict[str, str]
url: str | None = None
body: bytes | None = None


@runtime_checkable
class RequestSignerProtocol(AIPerfLifecycleProtocol, Protocol):
"""Protocol for request signers that add authentication signatures.

Signers are created once per transport and called for every request.
The sign() method is async to support signers that need I/O for
credential/token refresh (OAuth2, GCP IAM, etc.).
"""

def __init__(self, model_endpoint: ModelEndpointInfo, **kwargs) -> None: ...

async def sign(
self,
method: str,
url: str,
headers: dict[str, str],
body: bytes | None,
) -> SignedRequest: ...