Commit f49eba6

fix the missing caching params for claude 3.7 on bedrock
1 parent a75c35f commit f49eba6

7 files changed: +631 -587 lines changed

cline_docs/bedrock-cache-strategy-documentation.md

Lines changed: 7 additions & 6 deletions
@@ -554,10 +554,11 @@ const config = {
 1. The algorithm detects that all cache points are used and new messages have been added.
 2. It calculates the token count of the new messages (400 tokens).
 3. It analyzes the token distribution between existing cache points and finds the smallest gap (260 tokens).
-4. It compares the token count of new messages (400) with the smallest gap (260).
-5. Since the new messages have more tokens than the smallest gap (400 > 260), it decides to combine cache points.
-6. It identifies that the cache point at index 8 has the smallest token coverage (260 tokens).
-7. It removes this cache point and places a new one after the new user message.
+4. It calculates the required token threshold by applying a 20% increase to the smallest gap (260 \* 1.2 = 312).
+5. It compares the token count of new messages (400) with this threshold (312).
+6. Since the new messages have significantly more tokens than the threshold (400 > 312), it decides to combine cache points.
+7. It identifies that the cache point at index 8 has the smallest token coverage (260 tokens).
+8. It removes this cache point and places a new one after the new user message.

 **Output Cache Point Placements with Reallocation:**
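The reallocation decision described in the numbered steps of this hunk can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name `decideReallocation`, the `ReallocationDecision` shape, and the extra gap values (500 and 340) are hypothetical. Only the 400-token new messages, the 260-token smallest gap, and the 20% required increase come from the documentation.

```typescript
// Hypothetical result shape; the real code in the repository may differ.
interface ReallocationDecision {
  smallestGap: number
  threshold: number
  shouldCombine: boolean
}

// Required increase from the documented example (20%).
const REQUIRED_INCREASE_PERCENT = 20

// Decide whether existing cache points should be combined to free one
// for the new messages. `gapTokens` holds the token counts covered by
// the existing cache points (assumed input format).
function decideReallocation(newMessageTokens: number, gapTokens: number[]): ReallocationDecision {
  if (gapTokens.length === 0) {
    // Nothing to combine, so there is nothing to reallocate.
    return { smallestGap: 0, threshold: 0, shouldCombine: false }
  }
  const smallestGap = Math.min(...gapTokens)
  // Integer arithmetic keeps the documented 260 -> 312 example exact.
  const threshold = (smallestGap * (100 + REQUIRED_INCREASE_PERCENT)) / 100
  return { smallestGap, threshold, shouldCombine: newMessageTokens > threshold }
}

// Documented scenario: 400 new tokens against a smallest gap of 260.
const decision = decideReallocation(400, [260, 500, 340])
console.log(decision)
```

Under the old rule (400 > 260) and the new rule (400 > 312) this scenario reallocates either way. The two rules differ only when the new-message count falls between the smallest gap and the threshold, for example 300 tokens against a 260-token gap, where the new rule keeps the existing cache points.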

@@ -607,14 +608,14 @@ const config = {

 ### Key Observations

-1. **Simplified Placement Logic**: The algorithm now simply finds the last user message in each range, rather than using complex token midpoint calculations. This makes the code more maintainable while still providing effective cache point placement.
+1. **Simple Initial Placement Logic**: The last user message in the range that meets the minimum token threshold is set as a cachePoint.

 2. **User Message Boundary Requirement**: Cache points are placed exclusively after user messages, not after assistant messages. This ensures cache points are placed at natural conversation boundaries where the user has provided input.

 3. **Token Threshold Enforcement**: Each segment between cache points must meet the minimum token threshold (100 tokens in our examples) to be considered for caching. This is enforced by a guard clause that checks if the total tokens covered by a placement meets the minimum threshold.

 4. **Adaptive Placement for Growing Conversations**: As the conversation grows, the strategy adapts by preserving previous cache points when possible and only reallocating them when beneficial.

-5. **Token Comparison Optimization**: When all cache points are used and new messages are added, the algorithm compares the token count of new messages with the smallest combined gap between existing cache points. Cache points are only combined if the new messages have more tokens than the smallest gap, ensuring that reallocation is only done when it results in a net positive effect on caching efficiency.
+5. **Token Comparison Optimization with Required Increase**: When all cache points are used and new messages are added, the algorithm compares the token count of new messages with the smallest combined token count of contiguous existing cache points, applying a required percentage increase (20%) to ensure reallocation is worth it. Cache points are only combined if the new messages have significantly more tokens than this threshold, ensuring that reallocation is only done when it results in a substantial net positive effect on caching efficiency.

 This adaptive approach ensures that as conversations grow, the caching strategy continues to optimize token usage and response times by strategically placing cache points at the most effective positions, while avoiding inefficient reallocations that could result in a net negative effect on caching performance.
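Observations 1 to 3 (last user message in the range, user-message boundary, minimum token guard) might look roughly like the sketch below. Everything here is illustrative: `ChatMessage`, `findInitialCachePoint`, and the sample token counts are hypothetical names and values. It also assumes that "tokens covered by a placement" means the tokens from the start of the range through the chosen user message, which the documentation does not spell out.

```typescript
type Role = "user" | "assistant"

// Hypothetical message shape with a precomputed token estimate.
interface ChatMessage {
  role: Role
  tokens: number
}

// Returns the index of the last user message in [start, end] if the tokens
// covered up to that message meet `minTokens`; otherwise returns -1.
function findInitialCachePoint(messages: ChatMessage[], start: number, end: number, minTokens: number): number {
  const from = Math.max(start, 0)
  const to = Math.min(end, messages.length - 1)

  // Scan backwards so the first hit is the last user message in the range.
  let lastUserIndex = -1
  for (let i = to; i >= from; i--) {
    const message = messages[i]
    if (message !== undefined && message.role === "user") {
      lastUserIndex = i
      break
    }
  }
  if (lastUserIndex === -1) return -1

  // Guard clause: skip the placement when it covers too few tokens.
  const covered = messages.slice(from, lastUserIndex + 1).reduce((sum, m) => sum + m.tokens, 0)
  return covered >= minTokens ? lastUserIndex : -1
}

// Sample conversation with made-up token counts.
const conversation: ChatMessage[] = [
  { role: "user", tokens: 50 },
  { role: "assistant", tokens: 80 },
  { role: "user", tokens: 40 },
  { role: "assistant", tokens: 30 },
]
console.log(findInitialCachePoint(conversation, 0, 3, 100))
```

With the 100-token minimum used in the documentation's examples, the sample conversation yields index 2: the last user message, covering 50 + 80 + 40 = 170 tokens.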

src/api/providers/__tests__/bedrock.test.ts

Lines changed: 1 addition & 3 deletions
@@ -507,9 +507,7 @@ describe("AwsBedrockHandler", () => {
 				send: mockSend,
 			} as unknown as BedrockRuntimeClient

-			await expect(handler.completePrompt("Test prompt")).rejects.toThrow(
-				"Bedrock completion error: AWS Bedrock error",
-			)
+			await expect(handler.completePrompt("Test prompt")).rejects.toThrow(/^Bedrock completion error:/)
 		})

 		it("should handle invalid response format", async () => {
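The test change above loosens the assertion from a fixed message to a prefix pattern. Jest's `toThrow` checks the thrown error's message against a regular expression when given one, so the new assertion passes for any message that starts with `Bedrock completion error:`. The standalone sketch below shows the same matching rule without Jest. The helper `matchesCompletionError` and the sample messages other than the original one are hypothetical.

```typescript
// Same pattern as in the updated test: anchored at the start of the message.
const completionErrorPattern = /^Bedrock completion error:/

// Hypothetical helper mirroring what the regex assertion accepts.
function matchesCompletionError(error: unknown): boolean {
  return error instanceof Error && completionErrorPattern.test(error.message)
}

const samples = [
  new Error("Bedrock completion error: AWS Bedrock error"), // original exact message
  new Error("Bedrock completion error: some other detail"), // different suffix still matches
  new Error("Wrapped: Bedrock completion error: AWS Bedrock error"), // prefix not at start
]

for (const sample of samples) {
  console.log(matchesCompletionError(sample), sample.message)
}
```

The trade-off is that the test no longer pins the detail after the prefix, which makes it robust to changes in how the underlying error text is formatted.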
