---
title: "v1.77.7-rc - Performance Optimizations & Claude Sonnet 4.5"
slug: "v1-77-7"
date: 2025-10-04T10:00:00
authors:
  - name: Krrish Dholakia
    title: CEO, LiteLLM
    url: https://www.linkedin.com/in/krish-d/
    image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg
  - name: Ishaan Jaff
    title: CTO, LiteLLM
    url: https://www.linkedin.com/in/reffajnaahsi/
    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
  - name: Alexsander Hamir
    title: Backend Performance Engineer
    url: https://www.linkedin.com/in/alexsander-baptista/
    image_url: https://media.licdn.com/dms/image/v2/D5603AQGXnziu4kqNCQ/profile-displayphoto-crop_800_800/B56ZkxEcuOKEAI-/0/1757464874550?e=1762387200&v=beta&t=9SNXLsWhx8OnYPAMQ9fqAr02oevDYEAL2vMYg2f9ieg
  - name: Achintya Srivastava
    title: Fullstack Engineer
    url: https://www.linkedin.com/in/achintya-rajan/
    image_url: https://media.licdn.com/dms/image/v2/D5603AQGdkEeyJTdljw/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1716271140869?e=1762387200&v=beta&t=9gOoLPeqR2E5z3KSX61EUj3HVZXmgo87vhVuSHeffjc

hide_table_of_contents: false
---

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Deploy this version

<Tabs>
<TabItem value="docker" label="Docker">

``` showLineNumbers title="docker run litellm"
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.7.rc.1
```

</TabItem>

<TabItem value="pip" label="Pip">

``` showLineNumbers title="pip install litellm"
pip install litellm==1.77.7.rc.1
```

</TabItem>
</Tabs>

---

## Key Highlights

- **Major Performance Improvements** - Router optimizations cut P99 latency by 62.5%, and cache eviction drops from O(n log n) to O(log n)
- **Claude Sonnet 4.5** - Support for Anthropic's new Claude Sonnet 4.5 model family with 200K context and tiered pricing above 200K tokens
- **MCP Gateway Enhancements** - Fine-grained tool control, server permissions, and forwardable headers
- **AMD Lemonade & Nvidia NIM** - New provider support for AMD Lemonade and Nvidia NIM Rerank
- **GitLab Prompt Management** - GitLab-based prompt management integration

## New Models / Updated Models

#### New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
| -------- | ----- | -------------- | ------------------- | -------------------- | -------- |
| Anthropic | `claude-sonnet-4-5` | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Anthropic | `claude-sonnet-4-5-20250929` | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | `eu.anthropic.claude-sonnet-4-5-20250929-v1:0` | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Azure AI | `azure_ai/grok-4` | 131K | $5.50 | $27.50 | Chat, reasoning, function calling, web search |
| Azure AI | `azure_ai/grok-4-fast-reasoning` | 131K | $5.80 | $2,900.00 | Chat, reasoning, function calling, web search |
| Azure AI | `azure_ai/grok-4-fast-non-reasoning` | 131K | $5.00 | $2,500.00 | Chat, function calling, web search |
| Azure AI | `azure_ai/grok-code-fast-1` | 131K | $3.50 | $17.50 | Chat, function calling, web search |
| Groq | `groq/moonshotai/kimi-k2-instruct-0905` | Context varies | Pricing varies | Pricing varies | Chat, function calling |
| Ollama | Ollama Cloud models | Varies | Free | Free | Self-hosted models via Ollama Cloud |
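
Once deployed, the Sonnet 4.5 family is callable like any other LiteLLM model. A minimal sketch using the Python SDK, assuming `ANTHROPIC_API_KEY` is set in your environment:

```python
import litellm

# Call Anthropic's new Claude Sonnet 4.5 through LiteLLM.
response = litellm.completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Summarize this release in one sentence."}],
)
print(response.choices[0].message.content)

# Cost is computed from the new pricing entries added in this release.
print(response._hidden_params.get("response_cost"))
```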

#### Features

- **[Anthropic](../../docs/providers/anthropic)**
  - Add new claude-sonnet-4-5 model family with tiered pricing above 200K tokens - [PR #15041](https://github.com/BerriAI/litellm/pull/15041)
  - Add anthropic/claude-sonnet-4-5 to model price json with prompt caching support - [PR #15049](https://github.com/BerriAI/litellm/pull/15049)
  - Add 200K prices for Sonnet 4.5 - [PR #15140](https://github.com/BerriAI/litellm/pull/15140)
  - Add cost tracking for /v1/messages in streaming response - [PR #15102](https://github.com/BerriAI/litellm/pull/15102)
  - Add /v1/messages/count_tokens to Anthropic routes for non-admin user access - [PR #15034](https://github.com/BerriAI/litellm/pull/15034) (see the sketch after this list)
- **[Gemini](../../docs/providers/gemini)**
  - Add full support for native Gemini API translation - [PR #15029](https://github.com/BerriAI/litellm/pull/15029)
  - Add Gemini generateContent passthrough cost tracking - [PR #15014](https://github.com/BerriAI/litellm/pull/15014)
  - Add streamGenerateContent cost tracking in passthrough - [PR #15199](https://github.com/BerriAI/litellm/pull/15199)
  - Ignore `type` param for Gemini tools - [PR #15022](https://github.com/BerriAI/litellm/pull/15022)
- **[Vertex AI](../../docs/providers/vertex)**
  - Add LiteLLM Overhead metric for Vertex AI - [PR #15040](https://github.com/BerriAI/litellm/pull/15040)
  - Add cost tracking for Vertex AI Passthrough `/predict` endpoint - [PR #15019](https://github.com/BerriAI/litellm/pull/15019)
  - Add cost tracking for Vertex AI Live API WebSocket Passthrough - [PR #14956](https://github.com/BerriAI/litellm/pull/14956)
  - Support Google Maps grounding in Vertex AI - [PR #15179](https://github.com/BerriAI/litellm/pull/15179)
- **[Azure](../../docs/providers/azure)**
  - Add azure_ai grok-4 model family - [PR #15137](https://github.com/BerriAI/litellm/pull/15137)
  - Use the `extra_query` parameter for GET requests in Azure Batch - [PR #14997](https://github.com/BerriAI/litellm/pull/14997)
  - Use `extra_query` for downloading results (Batch API) - [PR #15025](https://github.com/BerriAI/litellm/pull/15025)
  - Add support for Azure AD token-based authorization - [PR #14813](https://github.com/BerriAI/litellm/pull/14813)
- **[Ollama](../../docs/providers/ollama)**
  - Add Ollama Cloud models - [PR #15008](https://github.com/BerriAI/litellm/pull/15008)
- **[Groq](../../docs/providers/groq)**
  - Add groq/moonshotai/kimi-k2-instruct-0905 - [PR #15079](https://github.com/BerriAI/litellm/pull/15079)
- **[OpenAI](../../docs/providers/openai)**
  - Add support for GPT-5 Codex models - [PR #14841](https://github.com/BerriAI/litellm/pull/14841)
- **[DeepInfra](../../docs/providers/deepinfra)**
  - Update DeepInfra model data refresh with latest pricing - [PR #14939](https://github.com/BerriAI/litellm/pull/14939)
- **[Bedrock](../../docs/providers/bedrock)**
  - Add JP Cross-Region Inference - [PR #15188](https://github.com/BerriAI/litellm/pull/15188)
  - Add "eu.anthropic.claude-sonnet-4-5-20250929-v1:0" - [PR #15181](https://github.com/BerriAI/litellm/pull/15181)
  - Add TwelveLabs Bedrock Async Invoke support - [PR #14871](https://github.com/BerriAI/litellm/pull/14871)
- **[Nvidia NIM](../../docs/providers/nvidia_nim)**
  - Add Nvidia NIM Rerank support - [PR #15152](https://github.com/BerriAI/litellm/pull/15152)
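
As an example of the new Anthropic route above, a token-count request can go straight to the proxy. A minimal sketch, assuming the request body mirrors Anthropic's count_tokens API and that `sk-1234` is a virtual key on a locally running proxy:

```python
import requests

# Count tokens for a /v1/messages payload via the LiteLLM proxy (PR #15034).
resp = requests.post(
    "http://localhost:4000/v1/messages/count_tokens",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "Hello, world"}],
    },
)
print(resp.json())  # expected shape: {"input_tokens": ...}
```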

#### Bug Fixes

- **[VLLM](../../docs/providers/vllm)**
  - Fix `response_format` bug in hosted vllm audio_transcription - [PR #15010](https://github.com/BerriAI/litellm/pull/15010)
  - Fix passthrough of `atranscription` kwargs to the upstream provider - [PR #15005](https://github.com/BerriAI/litellm/pull/15005)
- **[OCI](../../docs/providers/oci)**
  - Fix OCI Generative AI integration when using the Proxy - [PR #15072](https://github.com/BerriAI/litellm/pull/15072)
- **General**
  - Fix Authorization header to use correct "Bearer" capitalization - [PR #14764](https://github.com/BerriAI/litellm/pull/14764)
  - Fix incorrect `max_input_tokens` value for gpt-5-chat-latest - [PR #15116](https://github.com/BerriAI/litellm/pull/15116)
  - Fix missing HTTPException import - [PR #15111](https://github.com/BerriAI/litellm/pull/15111)
  - Fix `model_group` not always present in litellm_params and metadata - [PR #15108](https://github.com/BerriAI/litellm/pull/15108)
  - Update request handling for original exceptions - [PR #15013](https://github.com/BerriAI/litellm/pull/15013)
  - Remove invalid Vertex `-latest` models - [PR #15043](https://github.com/BerriAI/litellm/pull/15043)

#### New Provider Support

- **[AMD Lemonade](../../docs/providers/lemonade)**
  - Add AMD Lemonade provider support - [PR #14840](https://github.com/BerriAI/litellm/pull/14840)
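
Lemonade exposes an OpenAI-compatible server, so usage should look like any other LiteLLM provider. A hedged sketch; the model name and local endpoint below are illustrative placeholders, and the `lemonade/` prefix is an assumption from the docs path:

```python
import litellm

# Call a model served by a local AMD Lemonade instance.
# Model name and api_base are placeholders, not a spec.
response = litellm.completion(
    model="lemonade/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello from Lemonade"}],
    api_base="http://localhost:8000/api/v1",
)
print(response.choices[0].message.content)
```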

---

## LLM API Endpoints

#### Features

- **[Responses API](../../docs/response_api)**
  - Return cost for Responses API streaming requests - [PR #15053](https://github.com/BerriAI/litellm/pull/15053) (see the sketch after this list)

- **General**
  - Preserve whitespace characters in model response streams - [PR #15160](https://github.com/BerriAI/litellm/pull/15160)
  - Add provider name to payload specification - [PR #15130](https://github.com/BerriAI/litellm/pull/15130)
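
A quick sketch of the streaming path this applies to; once the stream completes, the computed cost flows into spend logs and logging callbacks the same way it does for non-streaming calls:

```python
import litellm

# Stream a Responses API request. With PR #15053, LiteLLM also
# computes a response cost for the streaming case once the stream ends.
stream = litellm.responses(
    model="openai/gpt-4o-mini",
    input="Write a haiku about rate limits.",
    stream=True,
)
for event in stream:
    print(event)
```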

---

## Management Endpoints / UI

#### Features

- **Virtual Keys**
  - Fix Session Token cookie infinite logout loop - [PR #15146](https://github.com/BerriAI/litellm/pull/15146)
  - Ensure LLM API keys can access passthrough routes - [PR #15115](https://github.com/BerriAI/litellm/pull/15115)

- **Models + Endpoints**
  - Ensure OCI secret fields are not shared on /models and /v1/models endpoints - [PR #15085](https://github.com/BerriAI/litellm/pull/15085)
  - Add Snowflake on UI - [PR #15083](https://github.com/BerriAI/litellm/pull/15083)
  - Make UI theme settings publicly accessible for custom branding - [PR #15074](https://github.com/BerriAI/litellm/pull/15074)

- **Admin Settings**
  - Ensure OTEL settings are saved in DB after being set on UI - [PR #15118](https://github.com/BerriAI/litellm/pull/15118)
  - Top API key tags - [PR #15151](https://github.com/BerriAI/litellm/pull/15151), [PR #15156](https://github.com/BerriAI/litellm/pull/15156)

#### Bugs

- **Dashboard** - Fix LiteLLM model name fallback in dashboard overview - [PR #14998](https://github.com/BerriAI/litellm/pull/14998)
- **Passthrough API** - Ensure query params are forwarded from the origin URL to the downstream request - [PR #15087](https://github.com/BerriAI/litellm/pull/15087)

---

## Logging / Guardrail / Prompt Management Integrations

#### Features

- **[OpenTelemetry](../../docs/observability/otel)**
  - Use `generation_name` for span naming in logging method - [PR #14799](https://github.com/BerriAI/litellm/pull/14799)
- **[Langfuse](../../docs/proxy/logging#langfuse)**
  - Handle non-serializable objects in Langfuse logging - [PR #15148](https://github.com/BerriAI/litellm/pull/15148)
  - Set `usage_details.total` in Langfuse integration - [PR #15015](https://github.com/BerriAI/litellm/pull/15015)

#### Guardrails

- **[Javelin](../../docs/proxy/guardrails)**
  - Add Javelin standalone guardrails integration for LiteLLM Proxy - [PR #14983](https://github.com/BerriAI/litellm/pull/14983)
  - Add logging for important status fields in guardrails - [PR #15090](https://github.com/BerriAI/litellm/pull/15090)
  - Don't run post-call guardrail if no text is returned from Bedrock - [PR #15106](https://github.com/BerriAI/litellm/pull/15106)

#### Prompt Management

- **[GitLab](../../docs/proxy/prompt_management)**
  - GitLab-based prompt manager - [PR #14988](https://github.com/BerriAI/litellm/pull/14988)

---

## Spend Tracking, Budgets and Rate Limiting

- **Cost Tracking**
  - Proxy: end-user cost tracking in the Responses API - [PR #15124](https://github.com/BerriAI/litellm/pull/15124)
- **Parallel Request Limiter v3**
  - Use well-known Redis cluster hashing algorithm - [PR #15052](https://github.com/BerriAI/litellm/pull/15052)
  - Fixes to dynamic rate limiter v3 - add saturation detection - [PR #15119](https://github.com/BerriAI/litellm/pull/15119)
  - Dynamic Rate Limiter v3 - fixes for detecting saturation + fixes for post-saturation behavior - [PR #15192](https://github.com/BerriAI/litellm/pull/15192)
- **Teams**
  - Add model-specific tpm/rpm limits to teams on LiteLLM - [PR #15044](https://github.com/BerriAI/litellm/pull/15044) (see the sketch after this list)
- **Configuration**
  - Add max requests env var - [PR #15007](https://github.com/BerriAI/litellm/pull/15007)
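
A hedged sketch of setting team-level model limits via the management API; the `model_tpm_limit` / `model_rpm_limit` field names below follow the key-level shape and are an assumption, not a spec:

```python
import requests

# Create a team with per-model tpm/rpm limits (PR #15044).
# Field names mirror the key-level model_tpm_limit / model_rpm_limit
# shape; treat them as assumptions for this RC.
resp = requests.post(
    "http://localhost:4000/team/new",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "team_alias": "search-team",
        "models": ["gpt-4o-mini", "claude-sonnet-4-5"],
        "model_tpm_limit": {"gpt-4o-mini": 100000},
        "model_rpm_limit": {"gpt-4o-mini": 100},
    },
)
print(resp.json())
```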

---

## MCP Gateway

- **Server Configuration**
  - Specify forwardable headers, and allowed/disallowed tools for MCP servers - [PR #15002](https://github.com/BerriAI/litellm/pull/15002) (see the sketch after this list)
  - Enforce server permissions on call tools - [PR #15044](https://github.com/BerriAI/litellm/pull/15044)
  - MCP Gateway fine-grained tools addition - [PR #15153](https://github.com/BerriAI/litellm/pull/15153)
- **Bug Fixes**
  - Remove server-name prefix from MCP tools tests - [PR #14986](https://github.com/BerriAI/litellm/pull/14986)
  - Resolve regression with duplicate Mcp-Protocol-Version header - [PR #15050](https://github.com/BerriAI/litellm/pull/15050)
  - Fix test_mcp_server.py - [PR #15183](https://github.com/BerriAI/litellm/pull/15183)
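
For context on where these controls bite, here is a hedged sketch of calling gateway-registered MCP tools from the OpenAI SDK. The `/mcp` endpoint path and `x-litellm-api-key` header follow LiteLLM's MCP docs, but treat the exact names as assumptions for this RC:

```python
from openai import OpenAI

# Point the OpenAI SDK at a locally running LiteLLM proxy.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.responses.create(
    model="gpt-4o-mini",
    tools=[
        {
            "type": "mcp",
            "server_label": "litellm",
            "server_url": "http://localhost:4000/mcp",
            "require_approval": "never",
            # The gateway only exposes tools allowed by the server's
            # config; disallowed tools are filtered out (PR #15002, #15153).
            "headers": {"x-litellm-api-key": "sk-1234"},
        }
    ],
    input="Which tools are available to me?",
)
print(response.output_text)
```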

---

## Performance / Loadbalancing / Reliability improvements

- **Router Optimizations**
  - **62.5% P99 latency improvement** - Remove router inefficiencies (from O(M*N) to O(1)) - [PR #15046](https://github.com/BerriAI/litellm/pull/15046)
  - Remove `hasattr` checks in Router - [PR #15082](https://github.com/BerriAI/litellm/pull/15082)
  - Remove double lookups - [PR #15084](https://github.com/BerriAI/litellm/pull/15084)
  - Optimize `_filter_cooldown_deployments` from O(n×m + k×n) to O(n) - [PR #15091](https://github.com/BerriAI/litellm/pull/15091)
  - Optimize unhealthy deployment filtering in retry path (O(n*m) → O(n+m)) - [PR #15110](https://github.com/BerriAI/litellm/pull/15110)
- **Cache Optimizations**
  - Reduce complexity of `InMemoryCache.evict_cache` from O(n*log(n)) to O(log(n)) - [PR #15000](https://github.com/BerriAI/litellm/pull/15000) (see the sketch after this list)
  - Avoid expensive operations when the cache isn't available - [PR #15182](https://github.com/BerriAI/litellm/pull/15182)
- **Metrics & Monitoring**
  - LiteLLM overhead metric tracking - add support for tracking LiteLLM overhead on cache hits - [PR #15045](https://github.com/BerriAI/litellm/pull/15045)
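
To illustrate the class of change behind the `evict_cache` improvement: scanning and sorting all expiry times is O(n log n) per eviction pass, while a min-heap keyed on expiry pops each expired entry in O(log n). A minimal illustrative sketch, not LiteLLM's actual implementation:

```python
import heapq
import time

class TTLCache:
    """Toy TTL cache: a min-heap on expiry makes each eviction O(log n)."""

    def __init__(self):
        self.store = {}        # key -> (value, expiry)
        self.expiry_heap = []  # (expiry, key); stale entries skipped lazily

    def set(self, key, value, ttl_seconds: float) -> None:
        expiry = time.monotonic() + ttl_seconds
        self.store[key] = (value, expiry)
        heapq.heappush(self.expiry_heap, (expiry, key))  # O(log n)

    def evict_expired(self) -> None:
        now = time.monotonic()
        while self.expiry_heap and self.expiry_heap[0][0] <= now:
            expiry, key = heapq.heappop(self.expiry_heap)  # O(log n) per pop
            # Skip heap entries invalidated by a later set() on the same key.
            if key in self.store and self.store[key][1] == expiry:
                del self.store[key]
```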

---

## Documentation Updates

- **Provider Documentation**
  - Update LiteLLM docs from latest release - [PR #15004](https://github.com/BerriAI/litellm/pull/15004)
  - Add missing `api_key` parameter - [PR #15058](https://github.com/BerriAI/litellm/pull/15058)
- **General Documentation**
  - Use `docker compose` instead of `docker-compose` - [PR #15024](https://github.com/BerriAI/litellm/pull/15024)
  - Add railtracks to projects that are using litellm - [PR #15144](https://github.com/BerriAI/litellm/pull/15144)
  - Perf: last week's improvements - [PR #15193](https://github.com/BerriAI/litellm/pull/15193)
  - Sync models GitHub documentation with Loom video and cross-reference - [PR #15191](https://github.com/BerriAI/litellm/pull/15191)

---

## Security Fixes

- **JWT Token Security** - Don't log JWT SSO token on `.info()` log - [PR #15145](https://github.com/BerriAI/litellm/pull/15145)

---

## New Contributors

* @herve-ves made their first contribution in [PR #14998](https://github.com/BerriAI/litellm/pull/14998)
* @wenxi-onyx made their first contribution in [PR #15008](https://github.com/BerriAI/litellm/pull/15008)
* @jpetrucciani made their first contribution in [PR #15005](https://github.com/BerriAI/litellm/pull/15005)
* @abhijitjavelin made their first contribution in [PR #14983](https://github.com/BerriAI/litellm/pull/14983)
* @ZeroClover made their first contribution in [PR #15039](https://github.com/BerriAI/litellm/pull/15039)
* @cedarm made their first contribution in [PR #15043](https://github.com/BerriAI/litellm/pull/15043)
* @Isydmr made their first contribution in [PR #15025](https://github.com/BerriAI/litellm/pull/15025)
* @serializer made their first contribution in [PR #15013](https://github.com/BerriAI/litellm/pull/15013)
* @eddierichter-amd made their first contribution in [PR #14840](https://github.com/BerriAI/litellm/pull/14840)
* @malags made their first contribution in [PR #15000](https://github.com/BerriAI/litellm/pull/15000)
* @henryhwang made their first contribution in [PR #15029](https://github.com/BerriAI/litellm/pull/15029)
* @plafleur made their first contribution in [PR #15111](https://github.com/BerriAI/litellm/pull/15111)
* @tyler-liner made their first contribution in [PR #14799](https://github.com/BerriAI/litellm/pull/14799)
* @Amir-R25 made their first contribution in [PR #15144](https://github.com/BerriAI/litellm/pull/15144)
* @georg-wolflein made their first contribution in [PR #15124](https://github.com/BerriAI/litellm/pull/15124)
* @niharm made their first contribution in [PR #15140](https://github.com/BerriAI/litellm/pull/15140)
* @anthony-liner made their first contribution in [PR #15015](https://github.com/BerriAI/litellm/pull/15015)
* @rishiganesh2002 made their first contribution in [PR #15153](https://github.com/BerriAI/litellm/pull/15153)
* @danielaskdd made their first contribution in [PR #15160](https://github.com/BerriAI/litellm/pull/15160)
* @JVenberg made their first contribution in [PR #15146](https://github.com/BerriAI/litellm/pull/15146)
* @speglich made their first contribution in [PR #15072](https://github.com/BerriAI/litellm/pull/15072)
* @daily-kim made their first contribution in [PR #14764](https://github.com/BerriAI/litellm/pull/14764)

---

## **[Full Changelog](https://github.com/BerriAI/litellm/compare/v1.77.5.rc.4...v1.77.7.rc.1)**