---
title: "April"
---

**April in review ◀️✨**

We kicked off April with an announcement: we make 95% of LLM costs vanish overnight. It was just bait, and some of you bit (≧ᗜ≦)

While we can’t make bills disappear with a snap, we’ve delivered some powerful upgrades this month that will help you build and ship robust, reliable GenAI apps, faster!

This month, we introduced updates to the platform and gateway around governance, security and guardrails, new integrations, and all the latest models! Along with this, we’re working on something bigger: [the missing piece](https://x.com/PortkeyAI/status/1912491547653701891) in the AI agents stack!

Here’s what we shipped last month:

## Summary

| Area | Key Updates |
| :-- | :-- |
| Platform | • Prompt CRUD APIs<br/>• Export logs to your internal stack<br/>• Budget limits and rate limits on workspaces<br/>• n8n integration<br/>• OpenAI Codex CLI integration<br/>• New retry setting to determine wait times<br/>• Milvus for semantic cache<br/>• Plugins moved to org-level Settings<br/>• Virtual Key exhaustion alerts now include the workspace<br/>• Workspace control setup option |
| Gateway & Providers | • OpenAI embeddings latency improvement (~200 ms)<br/>• Responses API for OpenAI & Azure OpenAI<br/>• Bedrock prompt caching via unified API<br/>• Virtual keys for self-hosted models<br/>• Tool calling support for Groq, OpenRouter, and Ollama<br/>• New providers: Dashscope, Recraft AI, Replicate, Azure AI Foundry<br/>• Enhanced parameter support: OpenRouter, Vertex AI, Perplexity, Bedrock<br/>• Claude’s `anthropic_beta` parameter for the Computer Use beta |
| Technical Improvements | • Unified caching/logging of thinking responses<br/>• Strict metadata logging: Workspace > API Key > Request<br/>• Prompt render endpoint available on the Gateway URL<br/>• API key default config now locked from overrides |
| New Models & Integrations | • GPT-4.1<br/>• Gemini 2.5 Pro and Flash<br/>• Llama 4 via Fireworks, Together, Groq<br/>• o1-pro<br/>• gpt-image-1<br/>• Qwen 3<br/>• Audio models via Groq |
| Guardrails | • Azure AI Content Safety integration<br/>• Exa Online Search as a guardrail |

---

## Platform

**Prompt CRUD APIs**

Prompt CRUD APIs give you the control to scale by enabling you to:

- Programmatically create, update, and delete prompts
- Manage prompts in bulk or version-control them
- Integrate prompt updates into your own tools and workflows
- Automate updates for A/B testing and rapid experimentation

Read more about this [here](https://portkey.ai/docs/api-reference/admin-api/control-plane/prompts/create-prompt).
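
Here’s a minimal sketch of creating a prompt programmatically over plain HTTP. The header name follows Portkey’s convention, while the payload fields shown (`name`, `collection_id`, `string`, `parameters`) are illustrative, so verify them against the API reference linked above:

```python
import requests

PORTKEY_API_KEY = "YOUR_PORTKEY_API_KEY"  # needs admin scope for prompt CRUD

# Create a prompt via the admin API (payload fields are illustrative;
# check the linked API reference for the exact schema)
resp = requests.post(
    "https://api.portkey.ai/v1/prompts",
    headers={"x-portkey-api-key": PORTKEY_API_KEY},
    json={
        "name": "ticket-triage",                # hypothetical prompt name
        "collection_id": "YOUR_COLLECTION_ID",  # collection the prompt lives in
        "string": "Categorize this ticket: {{ticket_text}}",
        "parameters": {"max_tokens": 256},
    },
)
resp.raise_for_status()
print(resp.json())
```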

**Export logs to your internal stack**

Enterprises can now push analytics logs to any OTEL-compliant store through Portkey to centralize monitoring, maintain compliance, and ensure efficient operations.
See how it's done [here](https://portkey.ai/docs/product/enterprise-offering/components#analytics-store).

**Budget limits and rate limits on workspaces**

Configure budget and rate limits at the workspace level to:

- Allocate specific budgets to different departments, teams, or projects
- Prevent individual workspaces from consuming disproportionate resources
- Ensure equitable API access and complete visibility

**n8n integration**

Add enterprise-grade controls to your n8n workflows with:

- **Unified AI Gateway**: Connect to 1600+ models with full API key management, not just OpenAI or Anthropic.
- **Centralized observability**: Track 40+ metrics and request logs in real time.
- **Governance**: Monitor spend, set budgets, and apply RBAC across workflows.
- **Security guardrails**: Enable PII detection, content filtering, and compliance controls.

Read more about the integration [here](https://portkey.ai/docs/integrations/libraries/n8n).

**OpenAI Codex CLI integration**

OpenAI Codex CLI gives developers a streamlined way to analyze, modify, and execute code directly from their terminal. Portkey's integration enhances this experience with:

- Access to 250+ additional models beyond OpenAI Codex CLI's standard offerings
- Content filtering and PII detection with guardrails
- Real-time analytics and logging
- Cost attribution, budget controls, RBAC, and more!

Read more about the integration [here](https://portkey.ai/docs/integrations/libraries/codex#openai-codex-cli).

**Other updates**

- Introduced a new retry setting, `use_retry_after_header`. When set to `true`, if the provider returns `retry-after` or `retry-after-ms` headers, the Gateway uses them to determine retry wait times instead of applying the default exponential backoff for 429 responses (see the sketch after this list).
- You can now store and retrieve vector embeddings for semantic cache using Milvus in Portkey. Read more about the semantic cache store [here](https://portkey.ai/docs/product/enterprise-offering/components#semantic-cache-store).
- Plugins have moved under Settings (org-level) in the Portkey app.
- Virtual Key exhaustion alert emails now include the workspace the exhausted key belongs to.
- Set up your workspace with Workspace Control in the Portkey app.
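
A minimal sketch of enabling the new retry behavior through a gateway config; the `retry` config shape follows Portkey's retry settings, and the virtual key slug is hypothetical:

```python
from portkey_ai import Portkey

# Honor the provider's retry-after / retry-after-ms headers on 429s
# instead of the default exponential backoff
client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="openai-virtual-key",  # hypothetical virtual key slug
    config={
        "retry": {
            "attempts": 3,
            "use_retry_after_header": True,  # the new setting
        }
    },
)

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```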

## Gateway & Providers

**OpenAI embeddings latency**

We’ve optimized the Gateway’s handling of OpenAI embeddings requests, leading to a roughly 200 ms improvement in response latency.

**Responses API**

You can now use the Responses API to access OpenAI and Azure OpenAI models on Portkey, giving you a more flexible, easier way to create agentic experiences.

- Complete observability and usage tracking
- Caching support for streaming requests
- Access to advanced tools (web search, file search, and code execution) with per-tool cost tracking
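
A minimal sketch, assuming the Portkey SDK mirrors OpenAI's `responses.create` surface (the virtual key slug is hypothetical):

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="openai-virtual-key",  # hypothetical virtual key slug
)

# The Responses API call mirrors OpenAI's shape through the gateway
response = client.responses.create(
    model="gpt-4.1",
    input="Give me two sentences on why gateways simplify agent stacks.",
)
print(response)
```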

**Bedrock prompt caching**

You can now implement Amazon Bedrock’s prompt caching through our OpenAI-compliant unified API and prompt templates.

- Cache specific portions of your requests for repeated use
- Reduce inference response latency and input token costs

Read more about the implementation [here](https://portkey.ai/docs/integrations/llms/bedrock/prompt-caching).
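
A minimal sketch of marking a reusable system prompt as cacheable. The `cache_control` content-block shape follows Portkey's unified API for Anthropic-style caching; the model ID and virtual key slug are illustrative, so verify the exact shape against the Bedrock docs linked above:

```python
from portkey_ai import Portkey

LONG_POLICY_DOCUMENT = "..."  # a large, stable prompt reused across requests

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="bedrock-virtual-key",  # hypothetical virtual key slug
)

completion = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": LONG_POLICY_DOCUMENT,
                    # cache this block so repeat requests skip re-processing it
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Does clause 4 allow refunds?"},
    ],
)
print(completion.choices[0].message.content)
```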

**Virtual keys for self-hosted models**

You can now create a virtual key for any self-hosted model, whether you're running Ollama, vLLM, or a custom/private model.

- No extra setup required
- Stay in control with logs, traces, and key metrics
- Manage all your LLM interactions through one interface
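
Once the virtual key is created in the Portkey app, requests look the same as for any hosted provider. A minimal sketch (the slug and model name are hypothetical):

```python
from portkey_ai import Portkey

# A virtual key pointing at a self-hosted Ollama endpoint behaves exactly
# like one for a hosted provider
client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="ollama-self-hosted",  # hypothetical virtual key slug
)

reply = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Ping?"}],
)
print(reply.choices[0].message.content)
```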

**Advanced capabilities**

- **OpenRouter**: Added mapping for new parameters: `modalities`, `reasoning`, `transforms`, `provider`, `models`, and `response_format`.
- **Vertex AI**: Added support for explicitly setting `mime_type` for URLs sent in the request. Gemini 2.5 thinking parameters are now available.
- **Perplexity**: Added support for the `response_format` and `search_recency_filter` request parameters.
- **Bedrock**: You can now pass the `anthropic_beta` parameter in Bedrock’s Anthropic API via Portkey to enable Claude's Computer Use beta (see the sketch below).
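
A minimal sketch of passing `anthropic_beta` through the gateway over plain HTTP; the header names follow Portkey's conventions, the beta tag is Anthropic's published Computer Use tag, and the model ID and virtual key slug are illustrative:

```python
import requests

resp = requests.post(
    "https://api.portkey.ai/v1/chat/completions",
    headers={
        "x-portkey-api-key": "YOUR_PORTKEY_API_KEY",
        "x-portkey-virtual-key": "bedrock-virtual-key",  # hypothetical slug
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
        # forwarded to Bedrock's Anthropic API to enable the Computer Use beta
        "anthropic_beta": ["computer-use-2024-10-22"],
        "messages": [{"role": "user", "content": "Open the settings page."}],
    },
)
print(resp.json())
```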

**Tool calling**

Portkey now supports tool calling for Groq, OpenRouter, and Ollama.
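
Tool definitions use the standard OpenAI schema regardless of provider. A minimal sketch routed to Groq (the virtual key slug and tool are hypothetical):

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="groq-virtual-key",  # hypothetical virtual key slug
)

# Standard OpenAI-style tool schema, now supported for Groq, OpenRouter, and Ollama
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

reply = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(reply.choices[0].message.tool_calls)
```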

**New Providers**

<CardGroup cols={2}>
<Card title="Dashscope">
Integrate with Dashscope
</Card>
<Card title="Recraft AI">
Generate production-ready visuals with Recraft
</Card>
</CardGroup>
<CardGroup cols={2}>
<Card title="Replicate">
Run open-source models via simple APIs with Replicate
</Card>
<Card title="Azure AI Foundry">
Access over 1,800 models with Azure AI Foundry
</Card>
</CardGroup>

**Technical Improvements**

- **Caching and logging of unified thinking responses**: Unified thinking responses (`content_blocks`) are now logged and cached for streaming responses.
- **Strict metadata enforcement**: The metadata logging preference order is now `Workspace Default > API Key Default > Incoming Request`. This gives org admins better control and ensures the values they set are not overridden.
- **Prompt render endpoint**: Previously only available via the control plane, the prompt render endpoint is now supported directly on the Gateway URL (see the sketch after this list).
- The default config attached to an API key can no longer be overridden.
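
A minimal sketch of calling the render endpoint over plain HTTP; self-hosted gateways can substitute their own host, and the prompt ID and variable names are hypothetical:

```python
import requests

# Render a saved prompt template directly on the Gateway URL
resp = requests.post(
    "https://api.portkey.ai/v1/prompts/YOUR_PROMPT_ID/render",
    headers={"x-portkey-api-key": "YOUR_PORTKEY_API_KEY"},
    json={"variables": {"ticket_text": "My invoice total looks wrong."}},
)
print(resp.json())
```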

## New Models & Integrations

<CardGroup cols={2}>
<Card title="GPT-4.1">
OpenAI’s new model for faster and improved responses
</Card>
<Card title="Gemini 2.5 Pro">
Google's most advanced model
</Card>
<Card title="Gemini 2.5 Flash">
Google's fast, cost-efficient thinking model
</Card>
<Card title="Llama 4">
Meta's latest model via Fireworks, Together, and Groq
</Card>
<Card title="o1-pro">
OpenAI's model for better reasoning and consistent answers
</Card>
<Card title="gpt-image-1">
OpenAI's latest image generation capabilities
</Card>
<Card title="Qwen 3">
Alibaba's latest model with hybrid reasoning
</Card>
<Card title="Audio models">
Access audio models via Groq
</Card>
</CardGroup>

## Guardrails

- **Azure AI Content Safety**: Use Microsoft’s content filtering solution to moderate inputs and outputs across supported models.

- **Exa Online Search**: You can now configure Exa Online Search as a guardrail in Portkey to enable real-time, grounded web search before answering. This makes any LLM capable of handling current events or live queries without model retraining (see the sketch below).
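
A minimal sketch of attaching the guardrail to requests via a gateway config; the `input_guardrails` key follows Portkey's guardrail config convention, and the guardrail ID (created in the Portkey app) plus the virtual key slug are hypothetical:

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="openai-virtual-key",  # hypothetical virtual key slug
    config={
        # run the Exa Online Search guardrail before the request hits the model
        "input_guardrails": ["exa-guardrail-id"],  # hypothetical guardrail ID
    },
)

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Who won yesterday's match?"}],
)
print(reply.choices[0].message.content)
```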

## Documentation

<Card icon="book" title="Administration Docs" href="https://portkey.ai/docs/product/administration/enforcing-request-metadata" horizontal />

We've made significant improvements to our documentation:

- **Virtual keys access**: Define who can view and manage virtual keys within workspaces. [Learn more](https://portkey.ai/docs/product/administration/configure-virtual-key-access-permissions)
- **API keys access**: Control how workspace managers and members interact with API keys within their workspaces. [Learn more](https://portkey.ai/docs/product/administration/configure-api-key-access-permissions)

## Community

Here's a tutorial on how to build a customer support agent using LangGraph and Portkey. Shoutout to [Nerding I/O](https://www.youtube.com/@nerding_io)!

<iframe width="560" height="315" src="https://www.youtube.com/embed/6MgPd3O3FXs?si=3FOPtlbTd9ayGN-1" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Customer love!**

| <img src="/images/changelog/testimonial.jpeg" width="100%"/> | <img src="/images/changelog/testimonial2.png" width="100%" /> |
| :-- | :-- |

**Partner blog**

See how Portkey and Pillar together can help you build secure GenAI apps for production [in this blog](https://portkey.ai/blog/securing-your-ai-via-ai-gateways/).

### Community Contributors

A special thanks to our community contributors this month:

- [unsync](https://github.com/unsync)
- [francescov1](https://github.com/francescov1)
- [Ajay Satish](https://github.com/Ajay-Satish-01)

## Coming this month!

We're changing how agents go to production, from first principles. [Watch out for this](https://x.com/PortkeyAI/status/1912491547653701891) 👀

## Support

<CardGroup cols={2}>
<Card title="Need Help?" icon="bug" href="https://github.com/Portkey-AI/gateway/issues">
Open an issue on GitHub
</Card>
<Card title="Join Us" icon="discord" href="https://portkey.wiki/community">
Get support in our Discord
</Card>
</CardGroup>