Sync with upstream by lionello · Pull Request #24 · DefangLabs/openai-access-gateway

lionello · 2025-06-27T17:21:37Z

No description provided.

Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>

Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>

* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs
* Fix rebase issue

---------

Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
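The `defaultdict(set)` change above can be illustrated with a minimal sketch. The record shape (`arn`, `model_ids`) and the sample ARNs are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
from collections import defaultdict


def group_profiles_by_model(profiles):
    """Map each model_id to the set of ALL profile ARNs that reference it.

    With a plain dict and assignment, a later profile would overwrite an
    earlier one for the same model_id; a defaultdict(set) keeps them all.
    """
    by_model = defaultdict(set)
    for profile in profiles:
        for model_id in profile["model_ids"]:
            by_model[model_id].add(profile["arn"])
    return by_model


# Hypothetical sample data: two application inference profiles for one model.
profiles = [
    {"arn": "arn:example:aip/team-a", "model_ids": ["anthropic.claude-x"]},
    {"arn": "arn:example:aip/team-b", "model_ids": ["anthropic.claude-x"]},
]
grouped = group_profiles_by_model(profiles)
```

Both ARNs end up under the same `model_id`, so every profile can be emitted as its own model entry.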

#180)

This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
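A rough sketch of the temperature/topP fix described above. The helper name, the `CONFLICT_MARKERS` tuple, and the choice to drop `topP` only when `temperature` is also set are assumptions made for illustration; the actual detection logic lives in src/api/models/bedrock.py and may differ.

```python
# Hypothetical marker list; not the repository's actual detection logic.
CONFLICT_MARKERS = ("claude-sonnet-4-5",)


def drop_conflicting_top_p(model_id, inference_config):
    """Return a copy of the config without topP for models that reject
    temperature and topP together (assumed behaviour, see lead-in)."""
    config = dict(inference_config)
    if "temperature" in config and any(m in model_id for m in CONFLICT_MARKERS):
        # pop with a default never raises, even when topP is absent;
        # a bare pop("topP") would raise KeyError in that case.
        config.pop("topP", None)
    return config


SONNET_45 = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
both = drop_conflicting_top_p(SONNET_45, {"temperature": 0.5, "topP": 0.9})
only_temperature = drop_conflicting_top_p(SONNET_45, {"temperature": 0.5})
```

The second call is the case the `pop("topP", None)` change protects: no `topP` key is present, and nothing is raised.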

- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
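Two of the items above (configurable CORS and explicit exceptions instead of assertions) could look roughly like this. The comma-separated format, the `"*"` fallback, and the function name are assumptions; the commit message does not specify them.

```python
import os


def parse_allowed_origins(raw):
    """Parse an ALLOWED_ORIGINS-style value into a list of origins.

    Raises TypeError/ValueError rather than using assert, because
    assertions are stripped when Python runs with -O.
    """
    if not isinstance(raw, str):
        raise TypeError(f"expected str, got {type(raw).__name__}")
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    if not origins:
        raise ValueError("ALLOWED_ORIGINS must list at least one origin")
    return origins


# Assumed fallback: allow every origin when the variable is unset.
allowed_origins = parse_allowed_origins(os.environ.get("ALLOWED_ORIGINS", "*"))
example = parse_allowed_origins("https://a.example, https://b.example")
```

A wildcard default is convenient for local runs, which is presumably why the commit pairs the setting with a security warning.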

Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
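One way the three controls above (env default, per-request override, model pattern check) might combine is sketched below. The function name, the precedence of the request over the environment, and the leading `*` in the patterns (to admit prefixed IDs such as `global.anthropic.claude-…`) are assumptions; the sketch also does not check the "Claude 3+" version bound.

```python
import os
from fnmatch import fnmatch

# Patterns adapted from the commit message; the leading "*" is an assumption.
CACHEABLE_PATTERNS = ("*anthropic.claude-*", "*amazon.nova-*")


def caching_flags(model_id, extra_body=None, env=None):
    """Decide whether to cache system prompts and/or messages for a request."""
    env = os.environ if env is None else env
    if not any(fnmatch(model_id, pattern) for pattern in CACHEABLE_PATTERNS):
        # Unsupported model: never enable caching, whatever was requested.
        return {"system": False, "messages": False}
    default = env.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
    requested = (extra_body or {}).get("prompt_caching", {})
    return {
        "system": bool(requested.get("system", default)),
        "messages": bool(requested.get("messages", default)),
    }


flags = caching_flags(
    "anthropic.claude-3-sonnet",
    {"prompt_caching": {"system": True}},
    env={},
)
```

Here the request turns on system-prompt caching while message caching falls back to the environment default of false.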

- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection

…lity

Docker BuildKit (especially with docker-container driver) may create OCI image manifests with attestations that AWS Lambda does not support. Lambda requires Docker V2 Schema 2 format without multi-manifest index. This fix ensures the build script generates Lambda-compatible images regardless of the user's Docker/BuildKit configuration.

Fixes #206

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

Replace ALB + Lambda architecture with API Gateway REST API + Lambda using response streaming for SSE support.

This provides:
- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow

Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1

CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP Range headers that trigger quadratic-time processing in FileResponse Range parsing, causing CPU exhaustion and DoS.

Fixes #215

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

UniMa007 and others added 5 commits May 27, 2025 21:52

Add Titan Embeddings G2 (#94)

aed5730

add titan G1 embeddings (#152)

844efec

feat: add support to include application inference profiles as models (…

0183608

…#131) --------- Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>

fix: properly handle tool_use messages in conversation

76a3614

lionello requested a review from nullfunc July 11, 2025 14:22

nullfunc approved these changes Jul 11, 2025

View reviewed changes

heisenbergye and others added 23 commits July 21, 2025 16:44

feat: support Claude 4 Interleaved thinking (beta) (#164)

3f1b56a

Add pagination to list_inference_profiles calls (#173)

a2110ff

Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
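A minimal sketch of token-based pagination for `list_inference_profiles`. The `nextToken` and `inferenceProfileSummaries` field names follow the Bedrock API as I understand it; `FakeBedrockClient` is a hypothetical stand-in so the example runs without AWS credentials, and the helper name is not from the repository.

```python
def iter_inference_profiles(client, **kwargs):
    """Yield every profile summary, following nextToken until exhausted."""
    token = None
    while True:
        params = dict(kwargs)
        if token:
            params["nextToken"] = token
        page = client.list_inference_profiles(**params)
        yield from page.get("inferenceProfileSummaries", [])
        token = page.get("nextToken")
        if not token:
            break


class FakeBedrockClient:
    """Hypothetical two-page client used only to exercise the loop offline."""

    _pages = {None: (["p1", "p2"], "t1"), "t1": (["p3"], None)}

    def list_inference_profiles(self, nextToken=None, **_ignored):
        ids, next_token = self._pages[nextToken]
        response = {
            "inferenceProfileSummaries": [{"inferenceProfileId": i} for i in ids]
        }
        if next_token:
            response["nextToken"] = next_token
        return response


profile_ids = [
    p["inferenceProfileId"] for p in iter_inference_profiles(FakeBedrockClient())
]
```

Without the loop, only the first page would be seen and accounts with many profiles would have some silently missing from the model list.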

chore: update requirements to fix vulnerability (#177)

bdfa57c

* chore: update requirements to fix vulnerability
* Update Python base image to version 3.13-slim

docs: update deployment instructions and enhance ECR push script

e3ee9a7

chore: cleanup useless files

371d11d

Support <think> tags (#117)

8177876

fix: ECS container /health endpoint does not require API_KEY Bearer T…

7756532

…oken (#184)

🐳 preload tiktoken encoding in Dockerfile_ecs (#193)

18b68bd

fix: Fix invalid cache_creation_tokens metric key (#195)

7e03ab0

Fixed <think> </think> tags for GPT-OSS in bedrock.py (#200)

ce4cfab

Added handling for message and content block deltas, including safety checks for open thinking tags. Results in working reasoning and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.
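The "safety check for open thinking tags" could be as simple as the sketch below. It is an illustration of the idea only, with a hypothetical helper name; the actual change in bedrock.py works on streamed message and content block deltas and may be structured differently.

```python
def close_open_think_tag(text):
    """Append a closing tag when a <think> block was opened but never closed."""
    if text.count("<think>") > text.count("</think>"):
        return text + "</think>"
    return text


unterminated = close_open_think_tag("<think>weighing options")
balanced = close_open_think_tag("<think>done</think>answer")
```

Frontends that render reasoning by matching tag pairs can then parse the output even when the model stops mid-thought.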

Fix healthcheck in Dockerfile_ecs (#199)

b3c1c82

The healthcheck in Dockerfile_ecs used a hardcoded port instead of the ENV setting. This was fixed.

fix: Allow the push-to-ecr.sh script to run from anywhere instead of …

37374e7

…requiring the user to cd manually (#202)
* fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually
* Add docker-compose to support running locally

feat: add claude-opus-4-5 to TEMPERATURE_TOPP_CONFLICT_MODELS set (#208)

0411454

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

Add support for 'developer' role in chat messages (#209)

1a7f55b

fix: support continue response for claude opus 4.6 (#219)

a150f7b

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

lionello wants to merge 29 commits into DefangLabs:defang from aws-samples:main

20 participants