feat: add OpenAI-compatible Bedrock provider #3748
Conversation
Use this one as a reference #3707
Please report the results of tests/integration/inference/test_openai_completion.py and the other OpenAI-related tests.
Also, why has uv.lock changed?
- nothing from models.py is used, please remove it
- is the /v1/embeddings endpoint available? if not, add a NotImplementedError stub (a sketch of such a stub follows this list)
- is the /v1/completions endpoint available? if not, add a stub there as well
- great find wrt telemetry and stream usage; after this PR we should consider adding that nugget to the mixin for all providers
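For illustration, a minimal sketch of what such stubs could look like. The method names (`openai_embeddings`, `openai_completion`) and the `OpenAIMixin` import path are assumptions about the mixin's conventions, not code taken from this PR.

```python
# Hypothetical sketch, not the PR's actual code: stub out endpoints that
# Bedrock's OpenAI-compatible API does not serve. Method names and the
# OpenAIMixin import path are assumptions.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class BedrockInferenceAdapter(OpenAIMixin):
    async def openai_embeddings(self, *args, **kwargs):
        # /v1/embeddings is not exposed by Bedrock's OpenAI-compatible endpoint.
        raise NotImplementedError("Bedrock's OpenAI-compatible API does not support /v1/embeddings")

    async def openai_completion(self, *args, **kwargs):
        # Same for the legacy /v1/completions endpoint.
        raise NotImplementedError("Bedrock's OpenAI-compatible API does not support /v1/completions")
```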
@leseb I addressed the missing comments. I thought I had addressed these before; sorry, on a deeper look I found the logger category comment still needed addressing. Thanks.
No worries, please rebase.
Implements AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock.

Changes:
- Add BedrockInferenceAdapter using OpenAIMixin base
- Configure region-specific endpoint URLs
- Add NotImplementedError stubs for unsupported endpoints
- Implement authentication error handling with helpful messages
- Remove unused models.py file
- Add comprehensive unit tests (12 total)
- Add provider registry configuration
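As context for the "region-specific endpoint URLs" item above, a hedged sketch of how such a URL could be derived. The helper name is hypothetical; the URL shape follows AWS's documented pattern for Bedrock's OpenAI-compatible runtime, not necessarily the exact string used in this PR.

```python
# Illustrative only: deriving a region-specific OpenAI-compatible Bedrock
# base URL. The helper name is an assumption, not the PR's code.
def bedrock_openai_base_url(region_name: str) -> str:
    # Bedrock's OpenAI-compatible runtime endpoint is regional.
    return f"https://bedrock-runtime.{region_name}.amazonaws.com/openai/v1"


# Example:
#   bedrock_openai_base_url("us-east-1")
#   -> "https://bedrock-runtime.us-east-1.amazonaws.com/openai/v1"
```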
- `BedrockConfig` is a `RemoteInferenceProviderConfig`, use `auth_credential` instead of a new `api_key` field, see https://github.com/llamastack/llama-stack/blob/main/src/llama_stack/providers/utils/inference/model_registry.py#L22
- you don't need to override `get_api_key`
- instead of overriding `register_model`, use `async def check_model_availability(self, model: str) -> bool: return True` (see the sketch below)
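A rough sketch of how these suggestions could fit together. The `model_registry` import path comes from the link above; the `OpenAIMixin` import path, the `config` attribute, and the `region_name` field are assumptions for illustration, not the final code in this PR.

```python
# Sketch of the reviewer's suggestions, not the PR's final code.
from llama_stack.providers.utils.inference.model_registry import RemoteInferenceProviderConfig
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin  # import path assumed


class BedrockConfig(RemoteInferenceProviderConfig):
    # No separate api_key field: auth_credential is inherited from the base config.
    region_name: str = "us-east-1"  # hypothetical field/default for illustration


class BedrockInferenceAdapter(OpenAIMixin):
    config: BedrockConfig  # illustrative; actual wiring depends on the mixin

    # Replaces the register_model override: accept any model ID without a lookup.
    async def check_model_availability(self, model: str) -> bool:
        return True
```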
```python
# Convert foundation model ID to inference profile ID
region_name = self.client.meta.region_name
inference_profile_id = _to_inference_profile_id(bedrock_model, region_name)

async def register_model(self, model: Model) -> Model:
```
use `async def check_model_availability(self, model: str) -> bool: return True`
```python
class BedrockConfig(RemoteInferenceProviderConfig):
    api_key: str | None = Field(
```
use `auth_credential` instead of a new `api_key` field, see https://github.com/llamastack/llama-stack/blob/main/src/llama_stack/providers/utils/inference/model_registry.py#L22
```python
sampling_params = request.sampling_params
options = get_sampling_strategy_options(sampling_params)

def get_api_key(self) -> str:
```
you don't need to override `get_api_key`
Implements AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock.
Closes: #3410
What does this PR do?
Adds AWS Bedrock as an inference provider using the OpenAI-compatible endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the standard llama-stack inference API.
The implementation uses LiteLLM's OpenAI client under the hood, so it gets all the OpenAI compatibility features. The provider handles per-request API key overrides via headers.
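For context, a hedged example of exercising the provider end to end through the stack's OpenAI-compatible surface. The base URL, port, and path are assumptions about a local llama-stack deployment; the model ID is the one used in the test plan below.

```python
# Illustrative client call; base_url and port are assumptions, not values
# taken from this PR.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # assumed llama-stack OpenAI-compatible path
    api_key="unused",  # auth is handled by the stack / provider config
)

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```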
Test Plan
Tested the following scenarios:
Bedrock OpenAI-Compatible Provider - Test Results
Model: bedrock-inference/openai.gpt-oss-20b-1:0

Test 1: Model Listing
Request:
Response:
Test 2: Non-Streaming Completion
Request:
Response:
Test 3: Streaming Completion
Request:
Response:
Test 4: Error Handling - Invalid Model
Request:
Response:
Test 5: Multi-Turn Conversation
Request 1:
Response 1:
Request 2 (with history):
Response 2:
Context retained across turns
Test 6: System Messages
Request:
Response:
Test 7: Tool Calling
Request:
Response:
Test 8: Sampling Parameters
Request:
Response:
Test 9: Authentication Error Handling
Subtest A: Invalid API Key
Request:
Response:
Subtest B: Empty API Key (Fallback to Config)
Request:
Response:
Fell back to config key
Subtest C: Malformed Token
Request:
Response: