fix:(embeddings-bedrock) correct extraction of provider from model_name #20295
+85
−12
Description
This pull request fixes a bug in the provider extraction logic for Bedrock
embedding models that caused failures when using regionally-scoped model names
(e.g., `"eu.cohere.embed-v4:0"`). The code naively split model names on `.` and
took the first part, which returns the region prefix instead of the actual
provider name, resulting in "Provider not supported" errors.
Bug fix:
- Support for both standard 2-part model names (`provider.model`) and regionally-scoped 3-part model names (`region.provider.model`)
- New `_get_provider()` method with proper logic to extract the provider name from different model name formats
- Replaced `self.model_name.split(".")[0]` with calls to the new `_get_provider()` method
Provider extraction improvements:
- Added `_get_provider()` method to `BedrockEmbedding` that correctly extracts the provider name from model names:
  - `provider.model` (e.g., `"amazon.titan-embed-text-v1"`) → returns first part (`"amazon"`)
  - `region.provider.model` (e.g., `"eu.cohere.embed-v4:0"`) → returns middle part (`"cohere"`)
  - Raises `ValueError` for unexpected formats
- Updated `_get_embedding`, `_get_text_embeddings`, and `_aget_embedding` to use the new `_get_provider()` method for consistency and correctness
their handling
Testing enhancements:
- `test_get_provider_two_part_format()` to verify standard model names (e.g., `"amazon.titan-embed-text-v1"`, `"cohere.embed-english-v3"`)
- `test_get_provider_three_part_format()` to verify regionally-scoped model names (e.g., `"us.amazon.titan-embed-text-v1"`, `"eu.cohere.embed-english-v3"`, `"global.amazon.titan-embed-text-v2"`)
- `test_get_provider_invalid_format()` to verify proper error handling for invalid model name formats
Code style and import order:
- Import order and code style changes in `base.py` for improved readability and consistency (formatting)
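The three tests listed under "Testing enhancements" can be pictured roughly as follows. This is only a sketch: `get_provider` is a hypothetical module-level stand-in for `BedrockEmbedding._get_provider()`, the invalid inputs (`"invalid"`, `"a.b.c.d"`) are made up for illustration, and the actual tests in the PR presumably exercise a `BedrockEmbedding` instance rather than a free function.

```python
def get_provider(model_name: str) -> str:
    """Hypothetical stand-in for BedrockEmbedding._get_provider()."""
    parts = model_name.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unexpected model name format: {model_name!r}")
    # 2 parts -> provider is index 0; 3 parts -> provider is index 1.
    return parts[len(parts) - 2]


def test_get_provider_two_part_format():
    assert get_provider("amazon.titan-embed-text-v1") == "amazon"
    assert get_provider("cohere.embed-english-v3") == "cohere"


def test_get_provider_three_part_format():
    assert get_provider("us.amazon.titan-embed-text-v1") == "amazon"
    assert get_provider("eu.cohere.embed-english-v3") == "cohere"
    assert get_provider("global.amazon.titan-embed-text-v2") == "amazon"


def test_get_provider_invalid_format():
    # Hypothetical invalid inputs: too few and too many dot-separated parts.
    for bad_name in ("invalid", "a.b.c.d"):
        try:
            get_provider(bad_name)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad_name!r}")
```

The model names in the first two tests are the ones quoted in the description above.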
Root Cause
The original code used `self.model_name.split(".")[0]` to extract the provider,
which worked for standard model names like `"amazon.titan-embed-text-v1"` but
failed for regionally-scoped names like `"eu.cohere.embed-v4:0"` because it would
return `"eu"` instead of `"cohere"`, causing the error:
    {"output": "Error processing request: Provider not supported", "metadata": {"error": "Provider not supported"}}
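The failure mode is easy to reproduce in isolation. The snippet below applies the original first-segment split to the two model names quoted above; the variable names are illustrative only and do not come from the PR.

```python
standard = "amazon.titan-embed-text-v1"
regional = "eu.cohere.embed-v4:0"

# The original extraction: always take the first dot-separated segment.
naive_standard = standard.split(".")[0]
naive_regional = regional.split(".")[0]

print(naive_standard)  # amazon  (the provider, as intended)
print(naive_regional)  # eu      (the region prefix, not the provider)
```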
Solution
The new
_get_provider()method intelligently handles both formats by counting thenumber of dot-separated parts and extracting the correct segment based on the
format.
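A minimal sketch of that logic, written as a standalone function so it runs without `llama-index` installed. The function name `get_provider` and the error message wording are assumptions; the method actually added to `BedrockEmbedding` may differ in detail.

```python
def get_provider(model_name: str) -> str:
    """Extract the provider segment from a Bedrock model name (sketch)."""
    parts = model_name.split(".")
    if len(parts) == 2:
        # provider.model, e.g. "amazon.titan-embed-text-v1"
        return parts[0]
    if len(parts) == 3:
        # region.provider.model, e.g. "eu.cohere.embed-v4:0"
        return parts[1]
    # Anything else is treated as an unexpected format.
    raise ValueError(f"Unexpected model name format: {model_name!r}")


print(get_provider("amazon.titan-embed-text-v1"))  # amazon
print(get_provider("eu.cohere.embed-v4:0"))        # cohere
```

Counting parts keeps the rule explicit and makes unknown shapes fail loudly with a `ValueError` instead of silently returning a wrong provider.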
Fixes # (issue)
New Package?
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a
detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating?
(Except for the `llama-index-core` package)
Type of Change
Please delete options that are not relevant.
not work as expected)
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of
impactful unit testing.
Suggested Checklist:
`uv run make format; uv run make lint` to appease the lint gods