Sync with upstream #24

Open
lionello wants to merge 29 commits into DefangLabs:defang from aws-samples:main
Conversation

@lionello
Member

No description provided.

UniMa007 and others added 5 commits May 27, 2025 21:52
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#131)

---------

Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
@lionello lionello requested a review from nullfunc July 11, 2025 14:22
heisenbergye and others added 23 commits July 21, 2025 16:44
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs

* Fix rebase issue

---------

Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
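
The `defaultdict(set)` change described in this commit can be illustrated with a minimal sketch. The record shape (`arn`, `model_ids`) and the sample values are hypothetical and are not taken from the repository; the point is only that a set per `model_id` keeps every Application Inference Profile instead of letting a later one overwrite an earlier one.

```python
from collections import defaultdict


def group_profiles_by_model(profiles):
    """Map each model_id to the set of all profile ARNs that reference it."""
    by_model = defaultdict(set)
    for profile in profiles:
        for model_id in profile["model_ids"]:
            # A plain dict assignment here would keep only the last profile.
            by_model[model_id].add(profile["arn"])
    return by_model


# Hypothetical sample data: two profiles pointing at the same model.
sample_profiles = [
    {"arn": "arn:example:profile/a", "model_ids": ["anthropic.claude-x"]},
    {"arn": "arn:example:profile/b", "model_ids": ["anthropic.claude-x"]},
]
mapping = group_profiles_by_model(sample_profiles)
```

With this structure, emitting all AIPs for a model is a matter of iterating over `mapping[model_id]`.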
* chore: update requirements to fix vulnerability

* Update Python base image to version 3.13-slim
#180)

This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
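
As a rough illustration of the `pop("topP", None)` fix and the temperature/topP conflict, the sketch below drops `topP` for a conflicting model without raising when the key is absent. The function name, the marker tuple, and the rule "drop `topP` only when `temperature` is also set" are assumptions made for this example, not the code in `src/api/models/bedrock.py`.

```python
# Hypothetical marker list; the real detection logic may differ.
CONFLICT_MARKERS = ("claude-sonnet-4-5",)


def strip_conflicting_top_p(model_id, inference_config):
    """Return a copy of the config with topP removed for conflicting models."""
    config = dict(inference_config)  # do not mutate the caller's dict
    is_conflicting = any(marker in model_id for marker in CONFLICT_MARKERS)
    if is_conflicting and "temperature" in config:
        # pop with a default never raises KeyError, unlike pop("topP").
        config.pop("topP", None)
    return config


sonnet = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
cleaned = strip_conflicting_top_p(sonnet, {"temperature": 0.5, "topP": 0.9})
```

Calling the function with a config that has no `topP` key is safe, which is the behavior the `pop("topP", None)` change is meant to guarantee.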
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
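
Two of the hardening items above (configurable CORS and replacing assertions with explicit errors) might look roughly like the following. This is a sketch under assumptions: the comma-separated format, the wildcard fallback, and both function names are hypothetical and not confirmed by the commit.

```python
import logging

logger = logging.getLogger("cors-sketch")


def parse_allowed_origins(raw):
    """Turn an ALLOWED_ORIGINS-style string into a list of origins.

    Assumes a comma-separated value; falls back to "*" with a warning.
    """
    origins = [part.strip() for part in (raw or "").split(",") if part.strip()]
    if not origins or "*" in origins:
        logger.warning("CORS allows all origins; set ALLOWED_ORIGINS to restrict it")
        return ["*"]
    return origins


def require_positive_int(value, name):
    """Validate input with explicit exceptions instead of assert statements."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


origins = parse_allowed_origins("https://a.example, https://b.example")
```

Explicit `TypeError`/`ValueError` checks keep working when Python runs with `-O`, which strips `assert` statements.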
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
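
One way the ENV default, the per-request override, and the model pattern check could combine is sketched below. The precedence (request over environment), the boolean values under `extra_body.prompt_caching`, and the substring match are assumptions for illustration; the substring match is also coarser than "Claude 3+" and does not check model versions.

```python
# Assumed patterns, based on the model prefixes listed in this commit message.
CACHING_MODEL_PATTERNS = ("anthropic.claude-", "amazon.nova-")


def supports_prompt_caching(model_id):
    """Substring match so prefixed profile IDs (e.g. global.*) also match."""
    return any(pattern in model_id for pattern in CACHING_MODEL_PATTERNS)


def caching_enabled(model_id, extra_body, env, section):
    """Decide whether to cache `section` ("system" or "messages")."""
    if not supports_prompt_caching(model_id):
        return False  # never send cache hints to unsupported models
    per_request = (extra_body or {}).get("prompt_caching")
    if isinstance(per_request, dict) and section in per_request:
        return bool(per_request[section])  # per-request setting wins
    return env.get("ENABLE_PROMPT_CACHING", "false").strip().lower() == "true"


request_body = {"prompt_caching": {"system": True}}
decision = caching_enabled("amazon.nova-pro-v1:0", request_body, {}, "system")
```

In real use, `env` would typically be `os.environ`; it is passed in here so the decision can be tested without touching the process environment.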
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
Added handling for message and content block deltas, including safety checks for open thinking tags.

This results in working reasoning output and makes GPT-OSS 20b/120b usable in frontends that expect closing thinking tags.
The healthcheck in Dockerfile_ecs used a hardcoded port instead of the ENV setting. This was fixed.
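
The safety check for open thinking tags can be pictured as a small state machine over stream events. The event shape (`("reasoning", text)` or `("text", text)`), the tag strings, and the function name are hypothetical; this sketch only shows the idea of never leaving a thinking tag unclosed.

```python
def render_stream(events, open_tag="<think>", close_tag="</think>"):
    """Yield output chunks, wrapping reasoning deltas in thinking tags."""
    thinking_open = False
    for kind, text in events:
        if kind == "reasoning":
            if not thinking_open:
                yield open_tag
                thinking_open = True
            yield text
        else:
            if thinking_open:
                # Regular content started: close the tag before emitting it.
                yield close_tag
                thinking_open = False
            yield text
    if thinking_open:
        # Stream ended mid-reasoning: still emit the closing tag.
        yield close_tag


rendered = "".join(render_stream([("reasoning", "a"), ("reasoning", "b"), ("text", "c")]))
```

The final check after the loop is the safety net: a stream that stops during reasoning still produces balanced tags.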
…requiring the user to cd manually (#202)

* fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually

* Add docker-compose to support running locally
…lity

Docker BuildKit (especially with docker-container driver) may create
OCI image manifests with attestations that AWS Lambda does not support.
Lambda requires Docker V2 Schema 2 format without multi-manifest index.

This fix ensures the build script generates Lambda-compatible images
regardless of the user's Docker/BuildKit configuration.

Fixes #206
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1

CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP
Range headers that trigger quadratic-time processing in FileResponse
Range parsing, causing CPU exhaustion and DoS.

Fixes #215
Co-authored-by: Hooman Yar <yarhooma@amazon.com>