feat: add AWS SigV4 request signing support #771
Open: ajcasagrande wants to merge 1 commit into main from ajc/sigv4-support
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,279 @@ | ||
| --- | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| sidebar-title: AWS SigV4 Authentication | ||
| --- | ||
|
|
||
| # Benchmarking AWS Endpoints | ||
|
|
||
| This guide walks you through benchmarking inference endpoints protected by AWS IAM authentication. AIPerf signs every request with your AWS credentials automatically -- you just need to tell it your AWS region and service name. | ||
|
|
||
| ## What's Supported | ||
|
|
||
| SigV4 signing works with AWS endpoints that speak the OpenAI API format. Here's what works today: | ||
|
|
||
| | Scenario | Non-Streaming | Streaming | Notes | | ||
| |----------|:---:|:---:|-------| | ||
| | API Gateway + vLLM/TGI/NIM | Yes | Yes | Full support -- standard HTTP + SSE | | ||
| | SageMaker + vLLM/LMI container | Yes | No | Non-streaming only. SageMaker uses proprietary event framing instead of SSE. | | ||
| | Bedrock Converse / InvokeModel | No | No | Different request/response schema -- not OpenAI-compatible | | ||
|
|
||
| ## Before You Start | ||
|
|
||
| 1. Install the AWS extra (this pulls in `botocore` for credential handling): | ||
|
|
||
| ```bash | ||
| uv pip install "aiperf[aws]" | ||
| ``` | ||
|
|
||
| 2. Make sure your AWS credentials are working: | ||
|
|
||
| ```bash | ||
| aws sts get-caller-identity | ||
| ``` | ||
|
|
||
| If that prints your account and role info, you're good to go. If not, see [Setting Up Credentials](#setting-up-credentials). | ||
|
|
||
| ## Quick Start | ||
|
|
||
| The key flags are `--auth-type sigv4`, `--aws-region`, and `--aws-service`. Add these to any `aiperf profile` command and AIPerf will sign every request automatically. | ||
|
|
||
| ### API Gateway with IAM Auth | ||
|
|
||
| Your API Gateway fronts an OpenAI-compatible server and has IAM authorization enabled. Both streaming and non-streaming work: | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \ | ||
| --endpoint-type chat \ | ||
| --streaming \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service execute-api \ | ||
| --request-count 100 | ||
| ``` | ||
|
|
||
| If your API Gateway maps a custom path to the backend, use `--endpoint` to set it: | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-east-1.amazonaws.com \ | ||
| --endpoint /prod/inference/v1/chat/completions \ | ||
| --endpoint-type chat \ | ||
| --streaming \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service execute-api \ | ||
| --request-count 100 | ||
| ``` | ||
|
|
||
| ### SageMaker with vLLM or LMI (Non-Streaming) | ||
|
|
||
| SageMaker endpoints running vLLM or DJL LMI containers accept OpenAI-format request bodies through the `/invocations` path. The response body is passed through unchanged, so non-streaming works. Use `--endpoint` to set the SageMaker invocation path: | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://runtime.sagemaker.us-east-1.amazonaws.com \ | ||
| --endpoint /endpoints/my-endpoint/invocations \ | ||
| --endpoint-type chat \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service sagemaker \ | ||
| --request-count 100 | ||
| ``` | ||
|
|
||
| Streaming is not supported for SageMaker endpoints because SageMaker uses a proprietary event stream format instead of SSE. Do not pass `--streaming` with SageMaker. | ||
|
|
||
| ## Figuring Out Your Region and Service Name | ||
|
|
||
| The `--aws-region` should match the region in your endpoint URL: | ||
|
|
||
| ``` | ||
| https://abc123.execute-api.us-east-1.amazonaws.com/... | ||
| ^^^^^^^^^ | ||
| this is your --aws-region | ||
| ``` | ||
|
Comment on lines +94 to +98

Add a language to the fenced code block. The block on line 94 is missing a language specifier (markdownlint MD040).

Proposed fix: change the opening fence on line 94 from a bare fence to one tagged `text`; the three lines inside the block stay as they are.

Tools: markdownlint-cli2 (0.21.0) [warning] 94-94: Fenced code blocks should have a language specified (MD040, fenced-code-language)
||
|
|
||
| The `--aws-service` depends on which AWS service handles your traffic: | ||
|
|
||
| | If your traffic goes through... | Use `--aws-service` | | ||
| |--------------------------------|---------------------| | ||
| | API Gateway | `execute-api` | | ||
| | SageMaker Runtime | `sagemaker` | | ||
|
|
||
| A common gotcha: the service name isn't always what you'd guess. For example, it's `sagemaker` not `sagemaker-runtime`. If you get a "SignatureDoesNotMatch" error, the service name is the first thing to double-check. | ||
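As a rough illustration of the two rules above, the region and service name can often be read straight off a standard AWS hostname. The helper below is a hypothetical sketch and not part of AIPerf; the name `guess_region_and_service` is made up for this example, and it assumes hostnames shaped like `<prefix>.<service>.<region>.amazonaws.com`, which matches the API Gateway and SageMaker Runtime URLs used in this guide. Custom domains will not parse.

```python
from urllib.parse import urlsplit


def guess_region_and_service(url: str) -> tuple[str, str]:
    """Guess (region, service) from a standard AWS endpoint hostname.

    Hypothetical helper for illustration only. It understands hostnames of
    the form <prefix>.<service>.<region>.amazonaws.com and nothing else.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    if len(labels) < 4 or labels[-2:] != ["amazonaws", "com"]:
        raise ValueError(f"not a recognized AWS endpoint hostname: {host!r}")
    # Counting from the right: ... <service> <region> amazonaws com
    region, service = labels[-3], labels[-4]
    return region, service


if __name__ == "__main__":
    print(guess_region_and_service(
        "https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1"
    ))
    print(guess_region_and_service(
        "https://runtime.sagemaker.us-west-2.amazonaws.com"
    ))
```

For the SageMaker URL the guessed service is `sagemaker`, which agrees with the table above. Treat the result as a starting point to confirm, not as an authoritative mapping for every AWS service.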
|
|
||
| ## Setting Up Credentials | ||
|
|
||
| If `aws sts get-caller-identity` already works, you can skip this section -- AIPerf will pick up the same credentials automatically. | ||
|
|
||
| ### Environment Variables (simplest) | ||
|
|
||
| Good for quick local testing: | ||
|
|
||
| ```bash | ||
| export AWS_ACCESS_KEY_ID="AKIA..." | ||
| export AWS_SECRET_ACCESS_KEY="wJal..." | ||
| export AWS_SESSION_TOKEN="FwoG..." # only if using temporary credentials | ||
|
|
||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \ | ||
| --endpoint-type chat \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service execute-api \ | ||
| --request-count 100 | ||
| ``` | ||
|
|
||
| ### Named Profiles (multiple accounts) | ||
|
|
||
| If you work with more than one AWS account, you probably already have profiles set up in `~/.aws/credentials`. Point AIPerf at the right one with `--aws-profile`: | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-west-2.amazonaws.com/prod/v1 \ | ||
| --endpoint-type chat \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-west-2 \ | ||
| --aws-service execute-api \ | ||
| --aws-profile staging \ | ||
| --request-count 100 | ||
| ``` | ||
|
|
||
| Without `--aws-profile`, AIPerf uses whichever credentials the AWS CLI would use by default (environment variables first, then `[default]` profile, then IAM roles). | ||
|
|
||
| ### SSO | ||
|
|
||
| If your team uses AWS IAM Identity Center (SSO), log in first, then pass the profile: | ||
|
|
||
| ```bash | ||
| aws sso login --profile my-sso-profile | ||
|
|
||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \ | ||
| --endpoint-type chat \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service execute-api \ | ||
| --aws-profile my-sso-profile \ | ||
| --request-count 100 | ||
| ``` | ||
|
|
||
| ### Kubernetes (EKS) | ||
|
|
||
| On EKS, credentials are typically injected into your pod automatically via IRSA or Pod Identity. You don't need `--aws-profile` -- just make sure your pod's service account has the right IAM role attached: | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \ | ||
| --endpoint-type chat \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service execute-api \ | ||
| --request-count 1000 | ||
| ``` | ||
|
|
||
| One thing to watch: if your pod has `AWS_ACCESS_KEY_ID` set as an environment variable (e.g., from a Kubernetes Secret), that takes priority over IRSA/Pod Identity. If you're hitting the wrong account, check for stale env vars. | ||
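One way to spot this situation before starting a run is to look for static key variables sitting next to the variables that EKS injects. The sketch below is hypothetical and not part of AIPerf. It assumes IRSA sets `AWS_WEB_IDENTITY_TOKEN_FILE` and Pod Identity sets `AWS_CONTAINER_CREDENTIALS_FULL_URI` in the pod environment, which is the usual behavior but is worth confirming for your cluster.

```python
import os
from collections.abc import Mapping

# Static credential variables that can shadow injected credentials.
STATIC_KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
# Assumed markers of IRSA and Pod Identity injection, respectively.
INJECTED_VARS = ("AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_CONTAINER_CREDENTIALS_FULL_URI")


def find_shadowing_static_keys(env: Mapping[str, str]) -> list[str]:
    """Return static key variables that are set alongside injected credentials.

    An empty list means either no injected credentials were detected or no
    static keys are present, so nothing is being shadowed.
    """
    has_injected = any(env.get(name) for name in INJECTED_VARS)
    if not has_injected:
        return []
    return [name for name in STATIC_KEY_VARS if env.get(name)]


if __name__ == "__main__":
    for name in find_shadowing_static_keys(os.environ):
        print(f"warning: {name} is set and may override the pod's IAM role")
```

Running this inside the pod (for example via `kubectl exec`) prints one warning per stale variable and nothing when the environment is clean.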
|
|
||
| ## Long-Running Benchmarks | ||
|
|
||
| AIPerf refreshes AWS credentials automatically before each request. This means temporary credentials (from SSO, assumed roles, or IRSA) won't expire mid-benchmark. If you're running a long benchmark with thousands of requests, you don't need to do anything special. | ||
|
|
||
| The one exception: if your SSO session itself expires (they typically last 8-12 hours), you'll need to re-run `aws sso login` and restart the benchmark. | ||
|
|
||
| ## Examples | ||
|
|
||
| ### High-Throughput API Gateway with Warmup | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \ | ||
| --endpoint-type chat \ | ||
| --streaming \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service execute-api \ | ||
| --request-rate 50 \ | ||
| --request-count 1000 \ | ||
| --warmup-request-count 20 | ||
| ``` | ||
|
|
||
| ### Multiple API Gateway Endpoints | ||
|
|
||
| Distribute load across two endpoints in the same region: | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://abc123.execute-api.us-east-1.amazonaws.com/prod/v1 \ | ||
| --url https://def456.execute-api.us-east-1.amazonaws.com/prod/v1 \ | ||
| --endpoint-type chat \ | ||
| --streaming \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-east-1 \ | ||
| --aws-service execute-api \ | ||
| --request-count 500 | ||
| ``` | ||
|
|
||
| ### SageMaker with Custom Dataset | ||
|
|
||
| ```bash | ||
| aiperf profile \ | ||
| --model my-model \ | ||
| --url https://runtime.sagemaker.us-west-2.amazonaws.com \ | ||
| --endpoint /endpoints/my-endpoint/invocations \ | ||
| --endpoint-type chat \ | ||
| --auth-type sigv4 \ | ||
| --aws-region us-west-2 \ | ||
| --aws-service sagemaker \ | ||
| --dataset prompts.jsonl \ | ||
| --dataset-type single_turn | ||
| ``` | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### "SignatureDoesNotMatch" | ||
|
|
||
| This is the most common error. Check these in order: | ||
|
|
||
| 1. **Is `--aws-region` correct?** It must match the region in the URL. | ||
| 2. **Is `--aws-service` correct?** See the [service name table](#figuring-out-your-region-and-service-name) above. The names aren't always obvious. | ||
| 3. **Is your system clock accurate?** AWS rejects requests whose signing timestamp is more than about 5 minutes away from its own clock. Docker containers and VMs are especially prone to clock drift. Run `date -u` and compare it against a trusted UTC source. | ||
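To see why a wrong region or service name produces a signature mismatch, it helps to look at how SigV4 derives its signing key. The sketch below follows the key-derivation chain from the public SigV4 specification using only the standard library. It is an illustration, not the code AIPerf runs (the guide says AIPerf relies on `botocore` for this), and the secret key and date are placeholder values.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(
    secret_key: str, date_stamp: str, region: str, service: str
) -> bytes:
    """Derive the SigV4 signing key; date_stamp is YYYYMMDD in UTC."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


if __name__ == "__main__":
    # Placeholder secret and date, for illustration only.
    right = derive_signing_key("EXAMPLE-SECRET", "20260101", "us-east-1", "sagemaker")
    wrong = derive_signing_key(
        "EXAMPLE-SECRET", "20260101", "us-east-1", "sagemaker-runtime"
    )
    print("keys match:", right == wrong)
```

Because the region and service are mixed into the key itself, any difference between what the client signs with and what AWS expects yields a different key, and therefore a different signature. The date in the chain is also why clock problems surface as signing errors.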
|
|
||
| ### "The security token included in the request is expired" | ||
|
|
||
| Your temporary credentials have expired. Re-authenticate: | ||
|
|
||
| ```bash | ||
| # For SSO | ||
| aws sso login --profile my-profile | ||
|
|
||
| # For assumed roles, this usually resolves itself -- | ||
| # botocore refreshes automatically if the source credentials are still valid | ||
| ``` | ||
|
|
||
| ### "No AWS credentials found" | ||
|
|
||
| AIPerf can't find any credentials. Verify with: | ||
|
|
||
| ```bash | ||
| aws sts get-caller-identity | ||
| ``` | ||
|
|
||
| If that also fails, you need to set up credentials -- see [Setting Up Credentials](#setting-up-credentials). | ||
|
|
||
| ### "SigV4 auth requires botocore" | ||
|
|
||
| Install the AWS extra: | ||
|
|
||
| ```bash | ||
| uv pip install "aiperf[aws]" | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| from aiperf.auth.base_signer import RequestSignerProtocol, SignedRequest | ||
|
|
||
| __all__ = ["RequestSignerProtocol", "SignedRequest"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| from __future__ import annotations | ||
|
|
||
| from dataclasses import dataclass | ||
| from typing import Protocol, runtime_checkable | ||
|
|
||
| from aiperf.common.models.model_endpoint_info import ModelEndpointInfo | ||
| from aiperf.common.protocols import AIPerfLifecycleProtocol | ||
|
|
||
|
|
||
| @dataclass(slots=True) | ||
| class SignedRequest: | ||
| """Result of signing a request. | ||
|
|
||
| Most signers only set headers. url and body are optionally set by signers | ||
| that modify the request URL (presigned URLs) or body (encryption). | ||
| """ | ||
|
|
||
| headers: dict[str, str] | ||
| url: str | None = None | ||
| body: bytes | None = None | ||
|
|
||
|
|
||
| @runtime_checkable | ||
| class RequestSignerProtocol(AIPerfLifecycleProtocol, Protocol): | ||
| """Protocol for request signers that add authentication signatures. | ||
|
|
||
| Signers are created once per transport and called for every request. | ||
| The sign() method is async to support signers that need I/O for | ||
| credential/token refresh (OAuth2, GCP IAM, etc.). | ||
| """ | ||
|
|
||
| def __init__(self, model_endpoint: ModelEndpointInfo, **kwargs) -> None: ... | ||
|
|
||
| async def sign( | ||
| self, | ||
| method: str, | ||
| url: str, | ||
| headers: dict[str, str], | ||
| body: bytes | None, | ||
| ) -> SignedRequest: ... |
Clarify the relevance of this change to the PR.

This addition ignores the `docs/superpowers/*` directory, but the PR objectives focus on adding AWS SigV4 request signing support. It's unclear why this unrelated gitignore entry is included in this PR. Consider moving unrelated changes to a separate PR to keep the scope focused.