feat: add OpenAI-compatible Bedrock provider #3748
Conversation
Use this one as a reference #3707
Please report the results of tests/integration/inference/test_openai_completion.py and the other OpenAI-related tests.
Also, why has uv.lock changed?
- nothing from models.py is used, please remove it
- is the /v1/embeddings endpoint available? if not, add a NotImplementedError stub (a sketch of such a stub follows this list)
- is the /v1/completions endpoint available? if not, add a stub there as well
- great find wrt telemetry and stream usage; after this PR we should consider adding that nugget to the mixin for all providers
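For illustration, a minimal sketch of what such stubs could look like. The method names (`openai_embeddings`, `openai_completion`) and the `OpenAIMixin` import path are assumptions about the mixin's conventions, not code taken from this PR.

```python
# Hypothetical sketch, not the PR's actual code: stub out endpoints that
# Bedrock's OpenAI-compatible API does not serve. Method names and the
# OpenAIMixin import path are assumptions.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class BedrockInferenceAdapter(OpenAIMixin):
    async def openai_embeddings(self, *args, **kwargs):
        # /v1/embeddings is not exposed by Bedrock's OpenAI-compatible endpoint.
        raise NotImplementedError("Bedrock's OpenAI-compatible API does not support /v1/embeddings")

    async def openai_completion(self, *args, **kwargs):
        # Same for the legacy /v1/completions endpoint.
        raise NotImplementedError("Bedrock's OpenAI-compatible API does not support /v1/completions")
```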
@leseb I addressed the missing comments. I thought I had addressed these before; sorry, on a deeper look I found the logger category comment still needed addressing. Thanks.
No worries, please rebase.
Implements AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock.

Changes:
- Add BedrockInferenceAdapter using OpenAIMixin base
- Configure region-specific endpoint URLs
- Add NotImplementedError stubs for unsupported endpoints
- Implement authentication error handling with helpful messages
- Remove unused models.py file
- Add comprehensive unit tests (12 total)
- Add provider registry configuration
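As context for the "region-specific endpoint URLs" item above, a hedged sketch of how such a URL could be derived. The helper name is hypothetical; the URL shape follows AWS's documented pattern for Bedrock's OpenAI-compatible runtime, not necessarily the exact string used in this PR.

```python
# Illustrative only: deriving a region-specific OpenAI-compatible Bedrock
# base URL. The helper name is an assumption, not the PR's code.
def bedrock_openai_base_url(region_name: str) -> str:
    # Bedrock's OpenAI-compatible runtime endpoint is regional.
    return f"https://bedrock-runtime.{region_name}.amazonaws.com/openai/v1"


# Example:
#   bedrock_openai_base_url("us-east-1")
#   -> "https://bedrock-runtime.us-east-1.amazonaws.com/openai/v1"
```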
- `BedrockConfig` is a `RemoteInferenceProviderConfig`, use `auth_credential` instead of a new `api_key` field, see https://github.com/llamastack/llama-stack/blob/main/src/llama_stack/providers/utils/inference/model_registry.py#L22
- you don't need to override `get_api_key`
- instead of overriding `register_model`, use `async def check_model_availability(self, model: str) -> bool: return True` (see the sketch below)
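A rough sketch of how these suggestions could fit together. The `model_registry` import path comes from the link above; the `OpenAIMixin` import path, the `config` attribute, and the `region_name` field are assumptions for illustration, not the final code in this PR.

```python
# Sketch of the reviewer's suggestions, not the PR's final code.
from llama_stack.providers.utils.inference.model_registry import RemoteInferenceProviderConfig
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin  # import path assumed


class BedrockConfig(RemoteInferenceProviderConfig):
    # No separate api_key field: auth_credential is inherited from the base config.
    region_name: str = "us-east-1"  # hypothetical field/default for illustration


class BedrockInferenceAdapter(OpenAIMixin):
    config: BedrockConfig  # illustrative; actual wiring depends on the mixin

    # Replaces the register_model override: accept any model ID without a lookup.
    async def check_model_availability(self, model: str) -> bool:
        return True
```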
```python
# Convert foundation model ID to inference profile ID
region_name = self.client.meta.region_name
inference_profile_id = _to_inference_profile_id(bedrock_model, region_name)

async def register_model(self, model: Model) -> Model:
```
use `async def check_model_availability(self, model: str) -> bool: return True`
```python
class BedrockConfig(RemoteInferenceProviderConfig):
    api_key: str | None = Field(
```
use `auth_credential` instead of a new `api_key` field, see https://github.com/llamastack/llama-stack/blob/main/src/llama_stack/providers/utils/inference/model_registry.py#L22
```python
sampling_params = request.sampling_params
options = get_sampling_strategy_options(sampling_params)

def get_api_key(self) -> str:
```
you don't need to override `get_api_key`
Implements AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock.
Closes: #3410
What does this PR do?
Adds AWS Bedrock as an inference provider using the OpenAI-compatible endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the standard llama-stack inference API.
The implementation uses LiteLLM's OpenAI client under the hood, so it gets all the OpenAI compatibility features. The provider handles per-request API key overrides via headers.
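For context, a hedged example of exercising the provider end to end through the stack's OpenAI-compatible surface. The base URL, port, and path are assumptions about a local llama-stack deployment; the model ID is the one used in the test plan below.

```python
# Illustrative client call; base_url and port are assumptions, not values
# taken from this PR.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # assumed llama-stack OpenAI-compatible path
    api_key="unused",  # auth is handled by the stack / provider config
)

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```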
Test Plan
Tested the following scenarios:
Bedrock OpenAI-Compatible Provider - Test Results
Model: bedrock-inference/openai.gpt-oss-20b-1:0

Test 1: Model Listing
Request:
Response:
Test 2: Non-Streaming Completion
Request:
Response:
Test 3: Streaming Completion
Request:
Response:
Test 4: Error Handling - Invalid Model
Request:
Response:
Test 5: Multi-Turn Conversation
Request 1:
Response 1:
Request 2 (with history):
Response 2:
Context retained across turns
Test 6: System Messages
Request:
Response:
Test 7: Tool Calling
Request:
Response:
Test 8: Sampling Parameters
Request:
Response:
Test 9: Authentication Error Handling
Subtest A: Invalid API Key
Request:
Response:
Subtest B: Empty API Key (Fallback to Config)
Request:
Response:
Fell back to config key
Subtest C: Malformed Token
Request:
Response: