Optimized Latency Inference Models with AWS #28544
Unanswered
johnpiscani asked this question in Q&A
Replies: 2 comments · 2 replies
- Any updates on this? It would be very nice.
Checked other resources
Commit to Help
Example Code
Description
AWS just released latency-optimized inference for Haiku 3.5 (https://docs.aws.amazon.com/bedrock/latest/userguide/latency-optimized-inference.html).
I want to invoke this model with latency-optimized inference from within LangChain, but I cannot figure out how to pass this parameter correctly. I have provided both the code that works using boto3 and the code I cannot get to work using LangChain. Please help me get this latency-optimized configuration working in LangChain.
Thank you!
P.S. This feature was just released as part of AWS re:Invent.
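For reference, here is a minimal sketch of what the working boto3 side of this looks like, assuming the Bedrock Converse API's `performanceConfig` parameter and a cross-region inference profile for Claude 3.5 Haiku. The model ID, region, and prompt are illustrative assumptions, not the original poster's code; the open question in this thread is how to forward the same setting through langchain_aws.

```python
# Minimal sketch of latency-optimized inference with plain boto3.
# Not the original poster's code: model ID and region are assumptions.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-2")

response = client.converse(
    # Latency-optimized Haiku 3.5 is typically invoked through a
    # cross-region inference profile ID (assumed here).
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
    # The Converse API exposes the latency-optimized setting via
    # the top-level performanceConfig field.
    performanceConfig={"latency": "optimized"},
)

print(response["output"]["message"]["content"][0]["text"])
```

Because `performanceConfig` is a top-level Converse parameter rather than a model-specific field, the question is essentially whether and how `langchain_aws` (e.g. `ChatBedrockConverse`) can pass it through to the underlying Converse call.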
System Info
Using the most up-to-date versions of langchain and langchain_aws.