---
title: "v1.77.7-rc - Performance Optimizations & Claude Sonnet 4.5"
slug: "v1-77-7"
date: 2025-10-04T10:00:00
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg
  - name: Ishaan Jaff
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
  - name: Alexsander Hamir
    title: Backend Performance Engineer
    url: https://www.linkedin.com/in/alexsander-baptista/
    image_url: https://media.licdn.com/dms/image/v2/D5603AQGXnziu4kqNCQ/profile-displayphoto-crop_800_800/B56ZkxEcuOKEAI-/0/1757464874550?e=1762387200&v=beta&t=9SNXLsWhx8OnYPAMQ9fqAr02oevDYEAL2vMYg2f9ieg
  - name: Achintya Srivastava
    title: Fullstack Engineer
    url: https://www.linkedin.com/in/achintya-rajan/
    image_url: https://media.licdn.com/dms/image/v2/D5603AQGdkEeyJTdljw/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1716271140869?e=1762387200&v=beta&t=9gOoLPeqR2E5z3KSX61EUj3HVZXmgo87vhVuSHeffjc

hide_table_of_contents: false
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

``` showLineNumbers title="docker run litellm"
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.7.rc.1
```

</TabItem>

<TabItem value="pip" label="Pip">

``` showLineNumbers title="pip install litellm"
pip install litellm==1.77.7.rc.1
```

</TabItem>
</Tabs>

---

## Key Highlights

- **Major Performance Improvements** - Router optimizations cut P99 latency by 62.5%, and cache eviction drops from O(n log n) to O(log n)
- **Claude Sonnet 4.5** - Support for Anthropic's new Claude Sonnet 4.5 model family with 200K context and tiered pricing above 200K tokens
- **MCP Gateway Enhancements** - Fine-grained tool control, server permissions, and forwardable headers
- **AMD Lemonade & Nvidia NIM** - New provider support for AMD Lemonade and Nvidia NIM Rerank
- **GitLab Prompt Management** - GitLab-based prompt management integration

## New Models / Updated Models

#### New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
| -------- | ----- | -------------- | ------------------- | -------------------- | -------- |
| Anthropic | `claude-sonnet-4-5` | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Anthropic | `claude-sonnet-4-5-20250929` | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | `eu.anthropic.claude-sonnet-4-5-20250929-v1:0` | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Azure AI | `azure_ai/grok-4` | 131K | $5.50 | $27.50 | Chat, reasoning, function calling, web search |
| Azure AI | `azure_ai/grok-4-fast-reasoning` | 131K | $5.80 | $2,900.00 | Chat, reasoning, function calling, web search |
| Azure AI | `azure_ai/grok-4-fast-non-reasoning` | 131K | $5.00 | $2,500.00 | Chat, function calling, web search |
| Azure AI | `azure_ai/grok-code-fast-1` | 131K | $3.50 | $17.50 | Chat, function calling, web search |
| Groq | `groq/moonshotai/kimi-k2-instruct-0905` | Context varies | Pricing varies | Pricing varies | Chat, function calling |
| Ollama | Ollama Cloud models | Varies | Free | Free | Self-hosted models via Ollama Cloud |
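
Once deployed, the Sonnet 4.5 family is callable like any other LiteLLM model. A minimal sketch using the Python SDK, assuming `ANTHROPIC_API_KEY` is set in your environment:

```python
import litellm

# Call Anthropic's new Claude Sonnet 4.5 through LiteLLM.
response = litellm.completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Summarize this release in one sentence."}],
)
print(response.choices[0].message.content)

# Cost is computed from the new pricing entries added in this release.
print(response._hidden_params.get("response_cost"))
```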

#### Features

- **[Anthropic](../../docs/providers/anthropic)**
  - Add new claude-sonnet-4-5 model family with tiered pricing above 200K tokens - [PR #15041](https://github.com/BerriAI/litellm/pull/15041)
  - Add anthropic/claude-sonnet-4-5 to model price json with prompt caching support - [PR #15049](https://github.com/BerriAI/litellm/pull/15049)
  - Add 200K prices for Sonnet 4.5 - [PR #15140](https://github.com/BerriAI/litellm/pull/15140)
  - Add cost tracking for /v1/messages in streaming response - [PR #15102](https://github.com/BerriAI/litellm/pull/15102)
  - Add /v1/messages/count_tokens to Anthropic routes for non-admin user access - [PR #15034](https://github.com/BerriAI/litellm/pull/15034) (see the sketch after this list)
- **[Gemini](../../docs/providers/gemini)**
  - Add full support for native Gemini API translation - [PR #15029](https://github.com/BerriAI/litellm/pull/15029)
  - Add Gemini generateContent passthrough cost tracking - [PR #15014](https://github.com/BerriAI/litellm/pull/15014)
  - Add streamGenerateContent cost tracking in passthrough - [PR #15199](https://github.com/BerriAI/litellm/pull/15199)
  - Ignore `type` param for Gemini tools - [PR #15022](https://github.com/BerriAI/litellm/pull/15022)
- **[Vertex AI](../../docs/providers/vertex)**
  - Add LiteLLM Overhead metric for Vertex AI - [PR #15040](https://github.com/BerriAI/litellm/pull/15040)
  - Add cost tracking for Vertex AI Passthrough `/predict` endpoint - [PR #15019](https://github.com/BerriAI/litellm/pull/15019)
  - Add cost tracking for Vertex AI Live API WebSocket Passthrough - [PR #14956](https://github.com/BerriAI/litellm/pull/14956)
  - Support Google Maps grounding in Vertex AI - [PR #15179](https://github.com/BerriAI/litellm/pull/15179)
- **[Azure](../../docs/providers/azure)**
  - Add azure_ai grok-4 model family - [PR #15137](https://github.com/BerriAI/litellm/pull/15137)
  - Use the `extra_query` parameter for GET requests in Azure Batch - [PR #14997](https://github.com/BerriAI/litellm/pull/14997)
  - Use `extra_query` for downloading results (Batch API) - [PR #15025](https://github.com/BerriAI/litellm/pull/15025)
  - Add support for Azure AD token-based authorization - [PR #14813](https://github.com/BerriAI/litellm/pull/14813)
- **[Ollama](../../docs/providers/ollama)**
  - Add Ollama Cloud models - [PR #15008](https://github.com/BerriAI/litellm/pull/15008)
- **[Groq](../../docs/providers/groq)**
  - Add groq/moonshotai/kimi-k2-instruct-0905 - [PR #15079](https://github.com/BerriAI/litellm/pull/15079)
- **[OpenAI](../../docs/providers/openai)**
  - Add support for GPT-5 Codex models - [PR #14841](https://github.com/BerriAI/litellm/pull/14841)
- **[DeepInfra](../../docs/providers/deepinfra)**
  - Update DeepInfra model data refresh with latest pricing - [PR #14939](https://github.com/BerriAI/litellm/pull/14939)
- **[Bedrock](../../docs/providers/bedrock)**
  - Add JP Cross-Region Inference - [PR #15188](https://github.com/BerriAI/litellm/pull/15188)
  - Add "eu.anthropic.claude-sonnet-4-5-20250929-v1:0" - [PR #15181](https://github.com/BerriAI/litellm/pull/15181)
  - Add TwelveLabs Bedrock Async Invoke support - [PR #14871](https://github.com/BerriAI/litellm/pull/14871)
- **[Nvidia NIM](../../docs/providers/nvidia_nim)**
  - Add Nvidia NIM Rerank support - [PR #15152](https://github.com/BerriAI/litellm/pull/15152)
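
As an example of the new Anthropic route above, a token-count request can go straight to the proxy. A minimal sketch, assuming the request body mirrors Anthropic's count_tokens API and that `sk-1234` is a virtual key on a locally running proxy:

```python
import requests

# Count tokens for a /v1/messages payload via the LiteLLM proxy (PR #15034).
resp = requests.post(
    "http://localhost:4000/v1/messages/count_tokens",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "Hello, world"}],
    },
)
print(resp.json())  # expected shape: {"input_tokens": ...}
```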

#### Bug Fixes

- **[VLLM](../../docs/providers/vllm)**
  - Fix `response_format` bug in hosted vllm audio_transcription - [PR #15010](https://github.com/BerriAI/litellm/pull/15010)
  - Fix passthrough of `atranscription` kwargs to the upstream provider - [PR #15005](https://github.com/BerriAI/litellm/pull/15005)
- **[OCI](../../docs/providers/oci)**
  - Fix OCI Generative AI integration when using the Proxy - [PR #15072](https://github.com/BerriAI/litellm/pull/15072)
- **General**
  - Fix Authorization header to use correct "Bearer" capitalization - [PR #14764](https://github.com/BerriAI/litellm/pull/14764)
  - Fix incorrect `max_input_tokens` value for gpt-5-chat-latest - [PR #15116](https://github.com/BerriAI/litellm/pull/15116)
  - Fix missing HTTPException import - [PR #15111](https://github.com/BerriAI/litellm/pull/15111)
  - Fix `model_group` not always present in litellm_params and metadata - [PR #15108](https://github.com/BerriAI/litellm/pull/15108)
  - Update request handling for original exceptions - [PR #15013](https://github.com/BerriAI/litellm/pull/15013)
  - Remove invalid Vertex `-latest` models - [PR #15043](https://github.com/BerriAI/litellm/pull/15043)

#### New Provider Support

- **[AMD Lemonade](../../docs/providers/lemonade)**
  - Add AMD Lemonade provider support - [PR #14840](https://github.com/BerriAI/litellm/pull/14840)
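
Lemonade exposes an OpenAI-compatible server, so usage should look like any other LiteLLM provider. A hedged sketch; the model name and local endpoint below are illustrative placeholders, and the `lemonade/` prefix is an assumption from the docs path:

```python
import litellm

# Call a model served by a local AMD Lemonade instance.
# Model name and api_base are placeholders, not a spec.
response = litellm.completion(
    model="lemonade/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello from Lemonade"}],
    api_base="http://localhost:8000/api/v1",
)
print(response.choices[0].message.content)
```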

---

## LLM API Endpoints

#### Features

- **[Responses API](../../docs/response_api)**
  - Return cost for Responses API streaming requests - [PR #15053](https://github.com/BerriAI/litellm/pull/15053) (see the sketch after this list)

- **General**
  - Preserve whitespace characters in model response streams - [PR #15160](https://github.com/BerriAI/litellm/pull/15160)
  - Add provider name to payload specification - [PR #15130](https://github.com/BerriAI/litellm/pull/15130)
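
A quick sketch of the streaming path this applies to; once the stream completes, the computed cost flows into spend logs and logging callbacks the same way it does for non-streaming calls:

```python
import litellm

# Stream a Responses API request. With PR #15053, LiteLLM also
# computes a response cost for the streaming case once the stream ends.
stream = litellm.responses(
    model="openai/gpt-4o-mini",
    input="Write a haiku about rate limits.",
    stream=True,
)
for event in stream:
    print(event)
```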

---

## Management Endpoints / UI

#### Features

- **Virtual Keys**
  - Fix Session Token cookie infinite logout loop - [PR #15146](https://github.com/BerriAI/litellm/pull/15146)
  - Ensure LLM API keys can access passthrough routes - [PR #15115](https://github.com/BerriAI/litellm/pull/15115)

- **Models + Endpoints**
  - Ensure OCI secret fields are not shared on /models and /v1/models endpoints - [PR #15085](https://github.com/BerriAI/litellm/pull/15085)
  - Add Snowflake on UI - [PR #15083](https://github.com/BerriAI/litellm/pull/15083)
  - Make UI theme settings publicly accessible for custom branding - [PR #15074](https://github.com/BerriAI/litellm/pull/15074)

- **Admin Settings**
  - Ensure OTEL settings are saved in DB after being set on UI - [PR #15118](https://github.com/BerriAI/litellm/pull/15118)
  - Top API key tags - [PR #15151](https://github.com/BerriAI/litellm/pull/15151), [PR #15156](https://github.com/BerriAI/litellm/pull/15156)

#### Bugs

- **Dashboard** - Fix LiteLLM model name fallback in dashboard overview - [PR #14998](https://github.com/BerriAI/litellm/pull/14998)
- **Passthrough API** - Ensure query params are forwarded from the origin URL to the downstream request - [PR #15087](https://github.com/BerriAI/litellm/pull/15087)

---

## Logging / Guardrail / Prompt Management Integrations

#### Features

- **[OpenTelemetry](../../docs/observability/otel)**
  - Use `generation_name` for span naming in logging method - [PR #14799](https://github.com/BerriAI/litellm/pull/14799)
- **[Langfuse](../../docs/proxy/logging#langfuse)**
  - Handle non-serializable objects in Langfuse logging - [PR #15148](https://github.com/BerriAI/litellm/pull/15148)
  - Set `usage_details.total` in Langfuse integration - [PR #15015](https://github.com/BerriAI/litellm/pull/15015)

#### Guardrails

- **[Javelin](../../docs/proxy/guardrails)**
  - Add Javelin standalone guardrails integration for LiteLLM Proxy - [PR #14983](https://github.com/BerriAI/litellm/pull/14983)
  - Add logging for important status fields in guardrails - [PR #15090](https://github.com/BerriAI/litellm/pull/15090)
  - Don't run post-call guardrail if no text is returned from Bedrock - [PR #15106](https://github.com/BerriAI/litellm/pull/15106)

#### Prompt Management

- **[GitLab](../../docs/proxy/prompt_management)**
  - GitLab-based prompt manager - [PR #14988](https://github.com/BerriAI/litellm/pull/14988)

---

## Spend Tracking, Budgets and Rate Limiting

- **Cost Tracking**
  - Proxy: end-user cost tracking in the Responses API - [PR #15124](https://github.com/BerriAI/litellm/pull/15124)
- **Parallel Request Limiter v3**
  - Use well-known Redis cluster hashing algorithm - [PR #15052](https://github.com/BerriAI/litellm/pull/15052)
  - Fixes to dynamic rate limiter v3 - add saturation detection - [PR #15119](https://github.com/BerriAI/litellm/pull/15119)
  - Dynamic Rate Limiter v3 - fixes for detecting saturation + fixes for post-saturation behavior - [PR #15192](https://github.com/BerriAI/litellm/pull/15192)
- **Teams**
  - Add model-specific tpm/rpm limits to teams on LiteLLM - [PR #15044](https://github.com/BerriAI/litellm/pull/15044) (see the sketch after this list)
- **Configuration**
  - Add max requests env var - [PR #15007](https://github.com/BerriAI/litellm/pull/15007)
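
A hedged sketch of setting team-level model limits via the management API; the `model_tpm_limit` / `model_rpm_limit` field names below follow the key-level shape and are an assumption, not a spec:

```python
import requests

# Create a team with per-model tpm/rpm limits (PR #15044).
# Field names mirror the key-level model_tpm_limit / model_rpm_limit
# shape; treat them as assumptions for this RC.
resp = requests.post(
    "http://localhost:4000/team/new",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "team_alias": "search-team",
        "models": ["gpt-4o-mini", "claude-sonnet-4-5"],
        "model_tpm_limit": {"gpt-4o-mini": 100000},
        "model_rpm_limit": {"gpt-4o-mini": 100},
    },
)
print(resp.json())
```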

---

## MCP Gateway

- **Server Configuration**
  - Specify forwardable headers, and allowed/disallowed tools for MCP servers - [PR #15002](https://github.com/BerriAI/litellm/pull/15002) (see the sketch after this list)
  - Enforce server permissions on call tools - [PR #15044](https://github.com/BerriAI/litellm/pull/15044)
  - MCP Gateway fine-grained tools addition - [PR #15153](https://github.com/BerriAI/litellm/pull/15153)
- **Bug Fixes**
  - Remove server-name prefix from MCP tools tests - [PR #14986](https://github.com/BerriAI/litellm/pull/14986)
  - Resolve regression with duplicate Mcp-Protocol-Version header - [PR #15050](https://github.com/BerriAI/litellm/pull/15050)
  - Fix test_mcp_server.py - [PR #15183](https://github.com/BerriAI/litellm/pull/15183)
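
For context on where these controls bite, here is a hedged sketch of calling gateway-registered MCP tools from the OpenAI SDK. The `/mcp` endpoint path and `x-litellm-api-key` header follow LiteLLM's MCP docs, but treat the exact names as assumptions for this RC:

```python
from openai import OpenAI

# Point the OpenAI SDK at a locally running LiteLLM proxy.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.responses.create(
    model="gpt-4o-mini",
    tools=[
        {
            "type": "mcp",
            "server_label": "litellm",
            "server_url": "http://localhost:4000/mcp",
            "require_approval": "never",
            # The gateway only exposes tools allowed by the server's
            # config; disallowed tools are filtered out (PR #15002, #15153).
            "headers": {"x-litellm-api-key": "sk-1234"},
        }
    ],
    input="Which tools are available to me?",
)
print(response.output_text)
```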

---

## Performance / Loadbalancing / Reliability improvements

- **Router Optimizations**
  - **62.5% P99 latency improvement** - Remove router inefficiencies (from O(M*N) to O(1)) - [PR #15046](https://github.com/BerriAI/litellm/pull/15046)
  - Remove `hasattr` checks in Router - [PR #15082](https://github.com/BerriAI/litellm/pull/15082)
  - Remove double lookups - [PR #15084](https://github.com/BerriAI/litellm/pull/15084)
  - Optimize `_filter_cooldown_deployments` from O(n×m + k×n) to O(n) - [PR #15091](https://github.com/BerriAI/litellm/pull/15091)
  - Optimize unhealthy deployment filtering in retry path (O(n*m) → O(n+m)) - [PR #15110](https://github.com/BerriAI/litellm/pull/15110)
- **Cache Optimizations**
  - Reduce complexity of `InMemoryCache.evict_cache` from O(n*log(n)) to O(log(n)) - [PR #15000](https://github.com/BerriAI/litellm/pull/15000) (see the sketch after this list)
  - Avoid expensive operations when the cache isn't available - [PR #15182](https://github.com/BerriAI/litellm/pull/15182)
- **Metrics & Monitoring**
  - LiteLLM overhead metric tracking - add support for tracking LiteLLM overhead on cache hits - [PR #15045](https://github.com/BerriAI/litellm/pull/15045)
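
To illustrate the class of change behind the `evict_cache` improvement: scanning and sorting all expiry times is O(n log n) per eviction pass, while a min-heap keyed on expiry pops each expired entry in O(log n). A minimal illustrative sketch, not LiteLLM's actual implementation:

```python
import heapq
import time

class TTLCache:
    """Toy TTL cache: a min-heap on expiry makes each eviction O(log n)."""

    def __init__(self):
        self.store = {}        # key -> (value, expiry)
        self.expiry_heap = []  # (expiry, key); stale entries skipped lazily

    def set(self, key, value, ttl_seconds: float) -> None:
        expiry = time.monotonic() + ttl_seconds
        self.store[key] = (value, expiry)
        heapq.heappush(self.expiry_heap, (expiry, key))  # O(log n)

    def evict_expired(self) -> None:
        now = time.monotonic()
        while self.expiry_heap and self.expiry_heap[0][0] <= now:
            expiry, key = heapq.heappop(self.expiry_heap)  # O(log n) per pop
            # Skip heap entries invalidated by a later set() on the same key.
            if key in self.store and self.store[key][1] == expiry:
                del self.store[key]
```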

---

## Documentation Updates

- **Provider Documentation**
  - Update LiteLLM docs from latest release - [PR #15004](https://github.com/BerriAI/litellm/pull/15004)
  - Add missing `api_key` parameter - [PR #15058](https://github.com/BerriAI/litellm/pull/15058)
- **General Documentation**
  - Use `docker compose` instead of `docker-compose` - [PR #15024](https://github.com/BerriAI/litellm/pull/15024)
  - Add railtracks to projects that are using litellm - [PR #15144](https://github.com/BerriAI/litellm/pull/15144)
  - Perf: last week's improvements - [PR #15193](https://github.com/BerriAI/litellm/pull/15193)
  - Sync models GitHub documentation with Loom video and cross-reference - [PR #15191](https://github.com/BerriAI/litellm/pull/15191)

---

## Security Fixes

- **JWT Token Security** - Don't log JWT SSO token on `.info()` log - [PR #15145](https://github.com/BerriAI/litellm/pull/15145)

---

## New Contributors

* @herve-ves made their first contribution in [PR #14998](https://github.com/BerriAI/litellm/pull/14998)
* @wenxi-onyx made their first contribution in [PR #15008](https://github.com/BerriAI/litellm/pull/15008)
* @jpetrucciani made their first contribution in [PR #15005](https://github.com/BerriAI/litellm/pull/15005)
* @abhijitjavelin made their first contribution in [PR #14983](https://github.com/BerriAI/litellm/pull/14983)
* @ZeroClover made their first contribution in [PR #15039](https://github.com/BerriAI/litellm/pull/15039)
* @cedarm made their first contribution in [PR #15043](https://github.com/BerriAI/litellm/pull/15043)
* @Isydmr made their first contribution in [PR #15025](https://github.com/BerriAI/litellm/pull/15025)
* @serializer made their first contribution in [PR #15013](https://github.com/BerriAI/litellm/pull/15013)
* @eddierichter-amd made their first contribution in [PR #14840](https://github.com/BerriAI/litellm/pull/14840)
* @malags made their first contribution in [PR #15000](https://github.com/BerriAI/litellm/pull/15000)
* @henryhwang made their first contribution in [PR #15029](https://github.com/BerriAI/litellm/pull/15029)
* @plafleur made their first contribution in [PR #15111](https://github.com/BerriAI/litellm/pull/15111)
* @tyler-liner made their first contribution in [PR #14799](https://github.com/BerriAI/litellm/pull/14799)
* @Amir-R25 made their first contribution in [PR #15144](https://github.com/BerriAI/litellm/pull/15144)
* @georg-wolflein made their first contribution in [PR #15124](https://github.com/BerriAI/litellm/pull/15124)
* @niharm made their first contribution in [PR #15140](https://github.com/BerriAI/litellm/pull/15140)
* @anthony-liner made their first contribution in [PR #15015](https://github.com/BerriAI/litellm/pull/15015)
* @rishiganesh2002 made their first contribution in [PR #15153](https://github.com/BerriAI/litellm/pull/15153)
* @danielaskdd made their first contribution in [PR #15160](https://github.com/BerriAI/litellm/pull/15160)
* @JVenberg made their first contribution in [PR #15146](https://github.com/BerriAI/litellm/pull/15146)
* @speglich made their first contribution in [PR #15072](https://github.com/BerriAI/litellm/pull/15072)
* @daily-kim made their first contribution in [PR #14764](https://github.com/BerriAI/litellm/pull/14764)

---

## **[Full Changelog](https://github.com/BerriAI/litellm/compare/v1.77.5.rc.4...v1.77.7.rc.1)**