diff --git a/product/administration/configure-virtual-key-access-permissions.mdx b/product/administration/configure-virtual-key-access-permissions.mdx index a8bb709e..702a8b44 100644 --- a/product/administration/configure-virtual-key-access-permissions.mdx +++ b/product/administration/configure-virtual-key-access-permissions.mdx @@ -1,5 +1,5 @@ --- -title: "Configure Virtual Key Access Permission for Workspaces" +title: "Configure Provider Access Permissions" --- @@ -8,39 +8,38 @@ title: "Configure Virtual Key Access Permission for Workspaces" ## Overview -Virtual Key Management in Portkey allows Organization administrators to define who can view and manage virtual keys within workspaces. This feature provides granular control over access to virtual keys, which are critical for managing connections to external AI providers and services. +Provider Management in Portkey allows Organization administrators to define who can view and manage providers within workspaces. This feature provides granular control over access to providers, which are critical for managing connections to external AI services. -## Accessing Virtual Key Management Permissions +## Accessing Provider Management Permissions 1. Navigate to **Admin Settings** in the Portkey dashboard 2. Select the **Security** tab from the left sidebar -3. Locate the **Virtual Key Management** section +3. Locate the **Provider Management** section ## Permission Settings -The Virtual Key Management section provides three distinct permission options: +The Provider Management section provides three distinct permission options: | Permission | Description | |------------|-------------| -| **View Virtual Keys (Workspace Managers)** | Enable workspace managers to view all virtual keys within their workspace. | -| **Manage Virtual Keys (Workspace Managers)** | Allow workspace managers to create, update, and delete virtual keys within their workspace. | -| **View Virtual Keys (Workspace Members)** | Enable workspace members to view virtual keys within their workspace. Note: Members cannot create, modify or delete virtual keys by default. | +| **View Providers (Workspace Managers)** | Enable workspace managers to view all providers within their workspace. | +| **Manage Providers (Workspace Managers)** | Allow workspace managers to create, update, and delete providers within their workspace. | +| **View Providers (Workspace Members)** | Enable workspace members to view providers within their workspace. Note: Members cannot create, modify or delete providers by default. | - + -## Understanding Virtual Keys +## Understanding Providers in Model Catalog -Virtual keys in Portkey securely store provider credentials and enable: +Providers in Portkey's [Model Catalog](/product/model-catalog) securely store your API credentials and enable: -- Centralized management of AI provider keys -- Abstraction of actual provider keys from end users +- Centralized management of AI provider credentials +- Abstraction of actual API keys from end users - Definition of routing rules, fallbacks, and other advanced features - Application of usage limits and tracking across providers -By controlling who can view and manage these virtual keys, organizations can maintain security while enabling appropriate access for different team roles. - +By controlling who can view and manage providers, organizations can maintain security while enabling appropriate access for different team roles. 
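+For example, a workspace member can reference a shared provider in requests without ever handling the underlying API key. A minimal sketch with the Portkey Python SDK (the `@openai-prod` slug is a placeholder; substitute a provider slug visible in your workspace):
+
+```python
+from portkey_ai import Portkey
+
+# The Portkey API key authenticates the user; the provider's actual
+# credentials stay stored in the Model Catalog and are never exposed.
+portkey = Portkey(api_key="PORTKEY_API_KEY")
+
+response = portkey.chat.completions.create(
+    model="@openai-prod/gpt-4o",  # placeholder provider slug + model name
+    messages=[{"role": "user", "content": "Hello!"}],
+)
+print(response.choices[0].message.content)
+```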
## Related Features @@ -48,10 +47,10 @@ By controlling who can view and manage these virtual keys, organizations can mai Learn about Portkey's access control features including user roles and organization hierarchy - - Understand how virtual keys work and how to configure them effectively + + Understand how to add and manage providers in the Model Catalog - - Learn how to set budget limits on virtual keys to control spending + + Learn how to set budget limits on providers to control spending diff --git a/product/ai-gateway.mdx b/product/ai-gateway.mdx index 6ae8db7b..36299f99 100644 --- a/product/ai-gateway.mdx +++ b/product/ai-gateway.mdx @@ -51,11 +51,11 @@ description: The world's fastest AI Gateway with advanced routing & integrated G Easily handle unresponsive LLM requests - + Set usage limits based on costs incurred or tokens used - + Set hourly, daily, or per minute rate limits on requests or tokens sent diff --git a/product/ai-gateway/automatic-retries.mdx b/product/ai-gateway/automatic-retries.mdx index e6e1a044..bf7b9bd1 100644 --- a/product/ai-gateway/automatic-retries.mdx +++ b/product/ai-gateway/automatic-retries.mdx @@ -1,160 +1,103 @@ --- title: "Automatic Retries" -description: "LLM APIs often have inexplicable failures. With Portkey, you can rescue a substantial number of your requests with our in-built automatic retries feature. " +description: Automatically retry failed LLM requests with exponential backoff. --- - This feature is available on all Portkey [plans](https://portkey.ai/pricing). +Available on all Portkey [plans](https://portkey.ai/pricing). -* Automatic retries are triggered **up to 5 times** -* Retries can also be triggered only on **specific error codes** -* Each subsequent retry attempt follows **exponential backoff strategy** to prevent network overload -* Optionally respects provider's `Retry-After` headers for rate-limited requests +- Up to **5 retry attempts** +- Trigger on **specific error codes** +- **Exponential backoff** to prevent overload +- Optionally respect provider's `Retry-After` headers -## Enabling Retries +## Examples -To enable retry, just add the `retry` param to your [config object](/api-reference/config-object). + -### Retry with 5 attempts - -```JSON +```json Basic (5 attempts) { - "retry": { - "attempts": 5 - }, - "provider":"@virtual-key-xxx" + "retry": { "attempts": 5 }, + "override_params": { "model": "@openai-prod/gpt-4o" } } ``` -### Retry only on specific error codes - -By default, Portkey triggers retries on the following error codes: **\[429, 500, 502, 503, 504\]** - -You can change this behaviour by setting the optional `on_status_codes` param in your retry config and manually inputting the error codes on which retry will be triggered. - - -```JSON +```json Specific Error Codes { - "retry": { - "attempts": 3, - "on_status_codes": [ 408, 429, 401 ] - }, - "provider":"@virtual-key-xxx" + "retry": { "attempts": 3, "on_status_codes": [429, 503] }, + "override_params": { "model": "@openai-prod/gpt-4o" } } ``` - - If the `on_status_codes` param is present, retries will be triggered **only** on the error codes specified in that Config and not on Portkey's default error codes for retries (i.e. \[429, 500, 502, 503, 504\]) - - -### Respecting provider's retry headers - -Portkey can respect the provider's `retry-after-ms`, `x-ms-retry-after-ms` and `retry-after`response headers when encountering rate limits. 
This enables more intelligent retry timing based on the provider's response headers rather than using the default exponential backoff strategy. - -To enable this feature, add the `use_retry_after_headers` parameter to your retry config. By default this behaviour is disabled, and `use_retry_after_headers` is set to `false`. - -```JSON +```json Respect Retry-After Headers { - "retry": { - "attempts": 3, - "on_status_codes": [ 429 ], - "use_retry_after_headers": true - }, - "provider":"@virtual-key-xxx" + "retry": { "attempts": 3, "on_status_codes": [429], "use_retry_after_headers": true }, + "override_params": { "model": "@openai-prod/gpt-4o" } } ``` - -When `use_retry_after_headers` is set to `true` and the provider includes `Retry-After` or `Retry-After-ms` headers in their response, Portkey will use these values to determine the wait time before the next retry attempt, overriding the exponential backoff strategy. - -If the provider doesn't include these headers in the response, Portkey will fall back to the standard exponential backoff strategy. - -The cumulative retry wait time for a single request is capped at 60 seconds. For example, if the first retry has a wait time of 20 seconds, and the second retry response includes a Retry-After value of 50 seconds, the request will fail since the total wait time (20+50=70) exceeds the 60-second cap. Similarly, if any single Retry-After value exceeds 60 seconds, the request will fail immediately. - - - -### Exponential backoff strategy - -When not using provider retry headers (or when they're not available), Portkey triggers retries following this exponential backoff strategy: - -| Attempt | Time out between requests | -| ----------------- | ------------------------- | -| Initial Call | Immediately | -| Retry 1st attempt | 1 second | -| Retry 2nd attempt | 2 seconds | -| Retry 3rd attempt | 4 seconds | -| Retry 4th attempt | 8 seconds | -| Retry 5th attempt | 16 seconds | - - - + - This feature is available on all Portkey [plans](https://portkey.ai/pricing). +The `@provider-slug/model-name` format automatically routes to the correct provider. Set up providers in [Model Catalog](https://app.portkey.ai/model-catalog). -* Automatic retries are triggered **up to 5 times** -* Retries can also be triggered only on **specific error codes** -* And each subsequent retry attempt follows **exponential backoff strategy** to prevent network overload - -## Enabling Retries +## Retry on Specific Error Codes -To enable retry, just add the `retry` param to your [config object](/api-reference/config-object). +Default retry codes: **[429, 500, 502, 503, 504]** -### Retry with 5 attempts +Override with `on_status_codes`: -```JSON +```json { - "retry": { - "attempts": 5 - }, - "provider":"@virtual-key-xxx" + "retry": { "attempts": 3, "on_status_codes": [408, 429, 500] }, + "override_params": { "model": "@openai-prod/gpt-4o" } } ``` -### Retry only on specific error codes - -By default, Portkey triggers retries on the following error codes: **\[429, 500, 502, 503, 504\]** + +When `on_status_codes` is set, retries trigger **only** on those codes—not the defaults. + -You can change this behaviour by setting the optional `on_status_codes` param in your retry config and manually inputting the error codes on which rety will be triggered. +## Respect Provider Retry Headers +Enable `use_retry_after_headers` to use the provider's `retry-after-ms`, `x-ms-retry-after-ms`, or `retry-after` headers instead of exponential backoff. 
-```JSON +```json { - "retry": { - "attempts": 3, - "on_status_codes": [ 408, 429, 401 ] - }, - "provider":"@virtual-key-xxx" + "retry": { "attempts": 3, "on_status_codes": [429], "use_retry_after_headers": true }, + "override_params": { "model": "@openai-prod/gpt-4o" } } ``` - If the `on_status_codes` param is present, retries will be triggered **only** on the error codes specified in that Config and not on Portkey's default error codes for retries (i.e. \[429, 500, 502, 503, 504\]) +- Falls back to exponential backoff if headers aren't present +- Cumulative retry wait capped at **60 seconds** +- Single `Retry-After` value > 60s fails immediately -### Exponential backoff strategy - -Here's how Portkey triggers retries following exponential backoff: +## Exponential Backoff -| Attempt | Time out between requests | -| ----------------- | ------------------------- | -| Initial Call | Immediately | -| Retry 1st attempt | 1 second | -| Retry 2nd attempt | 2 seconds | -| Retry 3rd attempt | 4 seconds | -| Retry 4th attempt | 8 seconds | -| Retry 5th attempt | 16 seconds | +| Attempt | Wait Time | +|---------|-----------| +| Initial | Immediate | +| 1st retry | 1 second | +| 2nd retry | 2 seconds | +| 3rd retry | 4 seconds | +| 4th retry | 8 seconds | +| 5th retry | 16 seconds | -### Understanding the Retry Attempt Header +## Retry Attempt Header -In the response from Portkey, you can find the `x-portkey-retry-attempt-count` header which provides information about retry attempts: +Check `x-portkey-retry-attempt-count` in responses: -- If the value is `-1`: This means that Portkey exhausted all the retry attempts and the request was unsuccessful -- If the value is `0`: This means that there were no retries configured -- If the value is `>0`: This means that Portkey attempted retries and this is the exact number at which the request was successful +| Value | Meaning | +|-------|---------| +| `-1` | All retries exhausted, request failed | +| `0` | No retries configured | +| `>0` | Successful on this retry attempt | -Currently, Portkey does not log all the retry attempts individually in the logs dashboard. Instead, the response times from all retry attempts are summed up in the single log entry. +Retry attempts aren't logged individually. Response times are summed in a single log entry. diff --git a/product/ai-gateway/batches.mdx b/product/ai-gateway/batches.mdx index 245a9210..1868769c 100644 --- a/product/ai-gateway/batches.mdx +++ b/product/ai-gateway/batches.mdx @@ -219,7 +219,7 @@ Portkey Files are files uploaded to Portkey that are then automatically uploaded | ---------------------------------- | --------------------------------------------------------------------------------------------------------- | | **Batch Job** | A collection of completion requests executed asynchronously. | | **Portkey File** (`input_file_id`) | Files uploaded to Portkey that are automatically uploaded to the provider for batch processing. Useful for reusing the same file across multiple batch completions. | -| **Virtual Key** | A logical provider credential stored in Portkey; referenced by ID, not secret. | +| **Provider Slug** | A unique identifier for your AI provider (e.g., `@openai-prod`). Set up in [Model Catalog](https://app.portkey.ai/model-catalog). | | **Completion Window** | Time frame in which the job must finish. `immediate` → handled by Portkey; `24h` → delegated to provider. 
| --- diff --git a/product/ai-gateway/cache-simple-and-semantic.mdx b/product/ai-gateway/cache-simple-and-semantic.mdx index 2fbd0498..456f4399 100644 --- a/product/ai-gateway/cache-simple-and-semantic.mdx +++ b/product/ai-gateway/cache-simple-and-semantic.mdx @@ -1,359 +1,220 @@ --- title: "Cache (Simple & Semantic)" +description: Speed up requests and reduce costs by caching LLM responses. --- -**Simple** caching is available for all plans.
-**Semantic** caching requires a vector database and is only available on select Enterprise plans. [Contact us](https://portkey.ai/docs/support/contact-us) to learn more about enabling this feature.
+**Simple** caching available on all plans. **Semantic** caching on [Production](https://portkey.ai/pricing) and [Enterprise](https://portkey.ai/docs/product/enterprise-offering).
-Speed up and save money on your LLM requests by storing past responses in the Portkey cache. There are 2 cache modes: +Cache LLM responses to serve requests up to **20x faster** and cheaper. -* **Simple:** Matches requests verbatim. Perfect for repeated, identical prompts. Works on **all models** including image generation models. -* **Semantic:** Matches responses for requests that are semantically similar. Ideal for denoising requests with extra prepositions, pronouns, etc. Works on any model available on `/chat/completions`or `/completions` routes. +| Mode | How it Works | Best For | Supported Routes | +|------|--------------|----------|------------------| +| **Simple** | Exact match on input | Repeated identical prompts | All models including image generation | +| **Semantic** | Matches semantically similar requests | Denoising variations in phrasing | `/chat/completions`, `/completions` | -Portkey cache serves requests upto **20x times faster** and **cheaper**. +## Enable Cache -## Enable Cache in the Config +Add `cache` to your [config object](/api-reference/config-object#cache-object-details): -To enable Portkey cache, just add the `cache` params to your [config object](/api-reference/config-object#cache-object-details). + - - Caching will not work if the `x-portkey-debug: "false"` header is included in the request - - -## Simple Cache - - -```sh -"cache": { "mode": "simple" } +```json Simple Cache +{ "cache": { "mode": "simple" } } ``` -### How it Works - -Simple cache performs an exact match on the input prompts. If the exact same request is received again, Portkey retrieves the response directly from the cache, bypassing the model execution. - ---- - -## Semantic Cache - +```json Semantic Cache +{ "cache": { "mode": "semantic" } } +``` -```sh -"cache": { "mode": "semantic" } +```json With TTL (60 seconds) +{ "cache": { "mode": "semantic", "max_age": 60 } } ``` + + -Semantic caching requires a vector database and is only available on select Enterprise plans. [Contact us](https://portkey.ai/docs/support/contact-us) to learn more about enabling this feature. +Caching won't work if `x-portkey-debug: "false"` header is included. -### How it Works +## Simple Cache + +Exact match on input prompts. If the same request comes again, Portkey returns the cached response. -Semantic cache considers the contextual similarity between input requests. It uses cosine similarity to ascertain if the similarity between the input and a cached request exceeds a specific threshold. If the similarity threshold is met, Portkey retrieves the response from the cache, saving model execution time. Check out this [blog](https://portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache/) for more details on how we do this. +## Semantic Cache + +Matches requests with similar meaning using cosine similarity. [Learn more →](https://portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache/) - Semantic cache is a "superset" of both caches. Setting cache mode to "semantic" will work for when there are simple cache hits as well. +Semantic cache is a superset—it handles simple cache hits too. - To optimise for accurate cache hit rates, Semantic cache only works with requests with less than 8,191 input tokens, and with number of messages (human, assistant, system combined) less than or equal to 4. +Semantic cache works with requests under 8,191 tokens and ≤4 messages. 
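+Here's a minimal sketch of semantic caching in practice, assuming a saved config that contains `{"cache": {"mode": "semantic"}}` (the config ID `pc-cache-xxx` and the `@openai-prod` slug are placeholders):
+
+```python
+from portkey_ai import Portkey
+
+# Attach a config that has semantic cache enabled (placeholder config ID)
+portkey = Portkey(api_key="PORTKEY_API_KEY", config="pc-cache-xxx")
+
+# First request populates the cache
+portkey.chat.completions.create(
+    model="@openai-prod/gpt-4o",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant"},
+        {"role": "user", "content": "Who is the president of the US?"},
+    ],
+)
+
+# A semantically similar phrasing should be served from the cache
+portkey.chat.completions.create(
+    model="@openai-prod/gpt-4o",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant"},
+        {"role": "user", "content": "Tell me who the current US president is"},
+    ],
+)
+```
+
+The cache status of each request (`Cache Miss`, `Cache Semantic Hit`, etc.) appears in the Logs view.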
-### Ignoring the First Message in Semantic Cache - -When using the `/chat/completions` endpoint, Portkey requires at least **two** message objects in the `messages` array. The first message object, typically used for the `system` message, is not considered when determining semantic similarity for caching purposes. +### System Message Ignored -For example: +Semantic cache requires **at least two messages**. The first message (typically `system`) is ignored for matching: - -```JSON -messages = [ - { "role": "system", "content": "You are a helpful assistant" }, - { "role": "user", "content": "Who is the president of the US?" } +```json +[ + { "role": "system", "content": "You are a helpful assistant" }, + { "role": "user", "content": "Who is the president of the US?" } ] ``` -In this case, only the content of the `user` message ("Who is the president of the US?") is used for finding semantic matches in the cache. The `system` message ("You are a helpful assistant") is ignored. - -This means that even if you change the `system` message while keeping the `user` message semantically similar, Portkey will still return a semantic cache hit. - -This allows you to modify the behavior or context of the assistant without affecting the cache hits for similar user queries. - -### [Read more how to set cache in Configs](/product/ai-gateway/cache-simple-and-semantic#how-cache-works-with-configs). - ---- +Only the `user` message is used for matching. Change the system message without affecting cache hits. -## Setting Cache Age +## Cache TTL -You can set the age (or "ttl") of your cached response with this setting. Cache age is also set in your Config object: +Set expiration with `max_age` (in seconds): ```json -"cache": { - "mode": "semantic", - "max_age": 60 -} +{ "cache": { "mode": "semantic", "max_age": 60 } } ``` -In this example, your cache will automatically expire after 60 seconds. Cache age is set in **seconds**. +| Setting | Value | +|---------|-------| +| Minimum | 60 seconds | +| Maximum | 90 days (7,776,000 seconds) | +| Default | 7 days (604,800 seconds) | - -* **Minimum** cache age is **60 seconds** -* **Maximum** cache age is **90 days** (i.e. **7776000** seconds) -* **Default** cache age is **7 days** (i.e. **604800** seconds) +### Organization-Level TTL - ---- +Admins can set default TTL for all workspaces to align with data retention policies: +1. Go to **Admin Settings** → **Organization Properties** → **Cache Settings** +2. Enter default TTL (seconds) +3. Save -## Organization-Level Cache TTL Settings +**Precedence:** +- No `max_age` in request → org default used +- Request `max_age` > org default → org default wins +- Request `max_age` < org default → request value honored -Organization administrators can now define the default cache TTL (Time to Live) for all API keys and workspaces within the organization. This provides centralized control over cache expiration to align with data retention policies. +Max org-level TTL: 25,923,000 seconds. -**How to Configure Organization Cache TTL** -1. Navigate to Admin Settings -2. Select Organization Properties -3. Scroll to Cache Settings -4. Enter your desired default cache TTL value (in seconds) -5. Click Save +## Force Refresh -### How Organization-Level Cache TTL Works +Fetch a fresh response even when a cached response exists. This is set **per-request** (not in Config): -1. **Default Value**: The value set at the organization level serves as both the maximum and default TTL value. -2. 
**Precedence Rules**: - - If no `max_age` is specified in the request, the organization-level default value is used. - - If the `max_age` value in a request is greater than the organization-level default, the organization-level value takes precedence. - - If the `max_age` in a request is less than the organization-level default, the lower request value is honored. -3. The max value of Organisation Level Cache TTL is 25923000 seconds. - -This feature helps organizations implement consistent cache retention policies while still allowing individual requests to use shorter TTL values when needed. - + +```python Python +response = portkey.with_options( + cache_force_refresh=True +).chat.completions.create( + messages=[{"role": "user", "content": "Hello!"}], + model="@openai-prod/gpt-4o" +) +``` - -## Force Refresh Cache - -Ensure that a new response is fetched and stored in the cache even when there is an existing cached response for your request. Cache force refresh can only be done **at the time of making a request**, and it is **not a part of your Config**. - -You can enable cache force refresh with this header: - -```sh -"x-portkey-cache-force-refresh": "True" +```javascript Node +const response = await portkey.chat.completions.create({ + messages: [{ role: 'user', content: 'Hello' }], + model: '@openai-prod/gpt-4o', +}, { + cacheForceRefresh: true +}); ``` - - -```sh +```bash cURL curl https://api.portkey.ai/v1/chat/completions \ - -H "Content-Type: application/json" \ -H "x-portkey-api-key: $PORTKEY_API_KEY" \ - -H "x-portkey-provider: open-ai-xxx" \ - -H "x-portkey-config: cache-config-xxx" \ + -H "x-portkey-config: pc-cache-xxx" \ -H "x-portkey-cache-force-refresh: true" \ - -d '{ - "messages": [{"role": "user","content": "Hello!"}] - }' -``` - - - -```py -from portkey_ai import Portkey - -portkey = Portkey( - api_key="PORTKEY_API_KEY", - provider="@open-ai-xxx", - config="pp-cache-xxx" -) - -response = portkey.with_options( - cache_force_refresh = True -).chat.completions.create( - messages = [{ "role": 'user', "content": 'Hello!' }], - model = 'gpt-4' -) -``` - - - -```JS -import Portkey from 'portkey-ai'; - -const portkey = new Portkey({ - apiKey: "PORTKEY_API_KEY", - config: "pc-cache-xxx", - provider:"@open-ai-xxx" -}) - -async function main(){ - const response = await portkey.chat.completions.create({ - messages: [{ role: 'user', content: 'Hello' }], - model: 'gpt-4', - }, { - cacheForceRefresh: true - }); -} - -main() + -d '{"model": "@openai-prod/gpt-4o", "messages": [{"role": "user","content": "Hello!"}]}' ``` - - + - -* Cache force refresh is only activated if a cache config is **also passed** along with your request. (setting `cacheForceRefresh` as `true` without passing the relevant cache config will not have any effect) -* For requests that have previous semantic hits, force refresh is performed on ALL the semantic matches of your request. +- Requires cache config to be passed +- For semantic hits, refreshes ALL matching entries ---- - -## Cache Namespace: Simplified Cache Partitioning - -Portkey generally partitions the cache along all the values passed in your request header. With a custom cache namespace, you can now ignore metadata and other headers, and only partition the cache based on the custom strings that you send. - -This allows you to have finer control over your cached data and optimize your cache hit ratio. 
- -### How It Works - -To use Cache Namespaces, simply include the `x-portkey-cache-namespace` header in your API requests, followed by any custom string value. Portkey will then use this namespace string as the sole basis for partitioning the cache, disregarding all other headers, including metadata. - -For example, if you send the following header: - -```sh -"x-portkey-cache-namespace: user-123" -``` - -Portkey will cache the response under the namespace `user-123`, ignoring any other headers or metadata associated with the request. - - - +## Cache Namespace -```JS -import Portkey from 'portkey-ai'; +By default, Portkey partitions cache by all request headers. Use a custom namespace to partition only by your custom string—useful for per-user caching or optimizing hit ratio: -const portkey = new Portkey({ - apiKey: "PORTKEY_API_KEY", - config: "pc-cache-xxx", - provider:"@open-ai-xxx" -}) - -async function main(){ - const response = await portkey.chat.completions.create({ - messages: [{ role: 'user', content: 'Hello' }], - model: 'gpt-4', - }, { - cacheNamespace: 'user-123' - }); -} - -main() -``` - - - -```Python -from portkey_ai import Portkey - -portkey = Portkey( - api_key="PORTKEY_API_KEY", - provider="@open-ai-xxx", - config="pp-cache-xxx" -) + +```python Python response = portkey.with_options( - cache_namespace = "user-123" + cache_namespace="user-123" ).chat.completions.create( - messages = [{ "role": 'user', "content": 'Hello!' }], - model = 'gpt-4' + messages=[{"role": "user", "content": "Hello!"}], + model="@openai-prod/gpt-4o" ) ``` - - -```sh +```javascript Node +const response = await portkey.chat.completions.create({ + messages: [{ role: 'user', content: 'Hello' }], + model: '@openai-prod/gpt-4o', +}, { + cacheNamespace: 'user-123' +}); +``` + +```bash cURL curl https://api.portkey.ai/v1/chat/completions \ - -H "Content-Type: application/json" \ -H "x-portkey-api-key: $PORTKEY_API_KEY" \ - -H "x-portkey-provider: open-ai-xxx" \ - -H "x-portkey-config: cache-config-xxx" \ + -H "x-portkey-config: pc-cache-xxx" \ -H "x-portkey-cache-namespace: user-123" \ - -d '{ - "messages": [{"role": "user","content": "Hello!"}] - }' + -d '{"model": "@openai-prod/gpt-4o", "messages": [{"role": "user","content": "Hello!"}]}' ``` - - -In this example, the response will be cached under the namespace `user-123`, ignoring any other headers or metadata. - ---- - -## Cache in Analytics -Portkey shows you powerful stats on cache usage on the Analytics page. Just head over to the Cache tab, and you will see: + -* Your raw number of cache hits as well as daily cache hit rate -* Your average latency for delivering results from cache and how much time it saves you -* How much money the cache saves you +## Cache with Configs -## Cache in Logs +Set cache at top-level or per-target: -On the Logs page, the cache status is updated on the Status column. You will see `Cache Disabled` when you are not using the cache, and any of `Cache Miss`, `Cache Refreshed`, `Cache Hit`, `Cache Semantic Hit` based on the cache hit status. Read more [here](/product/observability/logs). + - - - -For each request we also calculate and show the cache response time and how much money you saved with each hit. - ---- - -## How Cache works with Configs - -You can set cache at two levels: - -* **Top-level** that works across all the targets. -* **Target-level** that works when that specific target is triggered. 
- - - - - - -```json +```json Top-Level (all targets) { - "cache": {"mode": "semantic", "max_age": 60}, - "strategy": {"mode": "fallback"}, + "cache": { "mode": "semantic", "max_age": 60 }, + "strategy": { "mode": "fallback" }, "targets": [ - {"provider":"@openai-key-1"}, - {"provider":"@openai-key-2"} + { "override_params": { "model": "@openai-prod/gpt-4o" } }, + { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" } } ] } ``` - - -```json +```json Per-Target { - "strategy": {"mode": "fallback"}, + "strategy": { "mode": "fallback" }, "targets": [ - { - "provider":"@openai-key-1", - "cache": {"mode": "simple", "max_age": 200} - }, - { - "provider":"@openai-key-2", - "cache": {"mode": "semantic", "max_age": 100} - } + { "override_params": { "model": "@openai-prod/gpt-4o" }, "cache": { "mode": "simple", "max_age": 200 } }, + { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" }, "cache": { "mode": "semantic", "max_age": 100 } } ] } ``` - - - - - You can also set cache at **both levels (top & target).** -In this case, the **target-level cache** setting will be **given preference** over the **top-level cache** setting. You should start getting cache hits from the second request onwards for that specific target. + + +Target-level cache takes precedence over top-level. + - If any of your targets have `override_params` then cache on that target will not work until that particular combination of params is also stored with the cache. If there are **no** `override_params`for that target, then **cache will be active** on that target even if it hasn't been triggered even once. +Targets with `override_params` need that exact param combination cached before hits occur. + +## Analytics & Logs + +**Analytics** → Cache tab shows: +- Cache hit rate +- Latency savings +- Cost savings + +**Logs** → Status column shows: `Cache Hit`, `Cache Semantic Hit`, `Cache Miss`, `Cache Refreshed`, or `Cache Disabled`. [Learn more →](/product/observability/logs) + + + + diff --git a/product/ai-gateway/circuit-breaker.mdx b/product/ai-gateway/circuit-breaker.mdx index 740ed5f2..3f9a80ca 100644 --- a/product/ai-gateway/circuit-breaker.mdx +++ b/product/ai-gateway/circuit-breaker.mdx @@ -1,32 +1,34 @@ --- -title: 'Circuit Breaker' -description: 'Configure per-strategy circuit protection and failure handling' +title: "Circuit Breaker" +description: Automatically stop routing to unhealthy targets until they recover. --- -This feature is available on all Portkey [plans](https://portkey.ai/pricing). +Available on all Portkey [plans](https://portkey.ai/pricing). -## Circuit Breaker Config Schema +Circuit breakers prevent cascading failures by temporarily blocking requests to targets that are failing. -Each `strategy` in a config may define a `cb_config` with the following fields: +## Config Schema -- **failure_threshold**: Number of failures after which the circuit opens. -- **failure_threshold_percentage** *(optional)*: Percentage failure rate to trip the circuit. -- **cooldown_interval**: Time (in milliseconds) to wait before allowing retries. A minimum of 30 seconds is enforced. -- **failure_status_codes** *(optional)*: Specific HTTP status codes considered as failures. If not provided, all status codes >500 are considered as failures. -- **minimum_requests** *(optional)*: Minimum number of requests required before failure rate is evaluated. 
+| Field | Description | +|-------|-------------| +| `failure_threshold` | Number of failures to open circuit | +| `failure_threshold_percentage` | Percentage failure rate to trip circuit *(optional)* | +| `cooldown_interval` | Milliseconds to wait before retrying (min: 30s) | +| `failure_status_codes` | HTTP codes considered failures *(optional, default: >500)* | +| `minimum_requests` | Requests required before evaluating failure rate *(optional)* | -- Strategies can inherit `cb_config` from a parent if not set at their level. -- Targets inherit `cb_config` from their parent strategy. +- Strategies inherit `cb_config` from parent if not set +- Targets inherit from their parent strategy -Strategies using the `conditional` mode are **not considered** in circuit breaker logic. +`conditional` mode strategies are **not** evaluated by circuit breaker. -## Example Config +## Example ```json { @@ -36,55 +38,45 @@ Strategies using the `conditional` mode are **not considered** in circuit breake "failure_threshold_percentage": 20, "minimum_requests": 10, "cooldown_interval": 60000, - "failure_status_codes": [ - 401, - 429, - 500 - ] + "failure_status_codes": [401, 429, 500] } }, "targets": [ - { - "virtual_key": "virtual-key-1" - }, - { - "virtual_key": "virtual-key-2" - } + { "override_params": { "model": "@openai-prod/gpt-4o" } }, + { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" } } ] } ``` -## Circuit State Evaluation - -Circuit breaker logic is evaluated per strategy path. It tracks: + +The `@provider-slug/model-name` format automatically routes to the correct provider. Set up providers in [Model Catalog](https://app.portkey.ai/model-catalog). + -- Number of failures and successes. -- The time of the first failure. -- Whether the total requests meet the `minimum_requests` threshold and calculates failure rate as percentage +## How It Works -The circuit is **OPEN**(unhealthy) if: -- The failure count exceeds the `failure_threshold`, or -- The failure rate exceeds `failure_threshold_percentage`. +Circuit breaker tracks per strategy path: +- Failure and success counts +- Time of first failure +- Failure rate (when `minimum_requests` threshold met) -Once **OPEN**, requests to the affected targets are blocked until the `cooldown_interval` has passed. +**Circuit opens (OPEN)** when: +- Failure count exceeds `failure_threshold`, or +- Failure rate exceeds `failure_threshold_percentage` -When the cooldown period ends, the circuit is **CLOSED** automatically, and failure counters are reset. +**Circuit closes (CLOSED)** automatically after `cooldown_interval` passes. ## Runtime Behavior -- At each strategy level, the circuit breaker evaluates the status of all sibling targets. -- Unhealthy targets (circuit OPEN) are removed from the target list before strategy execution. -- If no healthy targets remain, circuit breaker is bypassed and all targets are used for routing. 
- ```mermaid flowchart TD - A[User sends request] --> B[Gateway selects a strategy] - B --> C[Strategy has multiple targets] - C --> D{All targets OPEN?} - D -->|Yes| E[Ignore circuit breakers] - E --> F[Use all targets for routing] - D -->|No| G["Use only CLOSED (healthy) targets"] - F --> H[Route to best target] - G --> H - H --> I[Process target] -``` \ No newline at end of file + A[Request arrives] --> B[Evaluate strategy targets] + B --> C{All targets OPEN?} + C -->|Yes| D[Bypass circuit breaker] + D --> E[Use all targets] + C -->|No| F[Use healthy targets only] + E --> G[Route request] + F --> G +``` + +- Unhealthy targets removed from routing +- If all targets are OPEN, circuit breaker is bypassed diff --git a/product/ai-gateway/fallbacks.mdx b/product/ai-gateway/fallbacks.mdx index d07fc557..5b922ec1 100644 --- a/product/ai-gateway/fallbacks.mdx +++ b/product/ai-gateway/fallbacks.mdx @@ -1,90 +1,100 @@ --- title: "Fallbacks" +description: Automatically switch to backup LLMs when the primary fails. --- + -This feature is available on all Portkey [plans](https://portkey.ai/pricing). +Available on all Portkey [plans](https://portkey.ai/pricing). -With an array of Language Model APIs available on the market, each with its own strengths and specialties, wouldn't it be great if you could seamlessly switch between them based on their performance or availability? Portkey's Fallback capability is designed to do exactly that. The Fallback feature allows you to specify a list of providers/models in a prioritized order. If the primary LLM fails to respond or encounters an error, Portkey will automatically fallback to the next LLM in the list, ensuring your application's robustness and reliability. + +Specify a prioritized list of providers/models. If the primary LLM fails, Portkey automatically falls back to the next in line. -## Enabling Fallback on LLMs +## Examples + + + +```json Between Models +{ + "strategy": { "mode": "fallback" }, + "targets": [ + { "override_params": { "model": "@openai-prod/gpt-4o" } }, + { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" } } + ] +} +``` -To enable fallbacks, you can modify the [config object](/api-reference/config-object) to include the `fallback` mode. +```json Between Providers (model from request) +{ + "strategy": { "mode": "fallback" }, + "targets": [ + { "provider": "@openai-prod" }, + { "provider": "@azure-prod" } + ] +} +``` -Here's a quick example of a config to **fallback** to Anthropic's `claude-3.5-sonnet` if OpenAI's `gpt-4o` fails. +```json On Rate Limit Only (429) +{ + "strategy": { "mode": "fallback", "on_status_codes": [429] }, + "targets": [ + { "provider": "@openai-prod" }, + { "provider": "@azure-prod" } + ] +} +``` -```JSON +```json Multi-tier Fallback { - "strategy": { - "mode": "fallback" - }, + "strategy": { "mode": "fallback" }, "targets": [ - { - "provider":"@openai-virtual-key", - "override_params": { - "model": "gpt-4o" - } - }, - { - "provider":"@anthropic-virtual-key", - "override_params": { - "model": "claude-3.5-sonnet-20240620" - } - } + { "override_params": { "model": "@openai-prod/gpt-4o" } }, + { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" } }, + { "override_params": { "model": "@google-prod/gemini-1.5-pro" } } ] } ``` -In this scenario, if the OpenAI model encounters an error or fails to respond, Portkey will automatically retry the request with Anthropic. 
+ -[Using Configs in your Requests](/product/ai-gateway/configs#using-configs) + +The `@provider-slug/model-name` format automatically routes to the correct provider. Set up providers in [Model Catalog](https://app.portkey.ai/model-catalog). + -## Triggering fallback on specific error codes +[Create](/product/ai-gateway/configs#creating-configs) and [use](/product/ai-gateway/configs#using-configs) configs in your requests. -By default, fallback is triggered on any request that returns a **non-2xx** status code. +## Trigger on Specific Status Codes -You can change this behaviour by setting the optional `on_status_codes` param in your fallback config and manually inputting the status codes on which fallback will be triggered. +By default, fallback triggers on any **non-2xx** status code. +Customize with `on_status_codes`: -```sh +```json { - "strategy": { - "mode": "fallback", - "on_status_codes": [ 429 ] - }, + "strategy": { "mode": "fallback", "on_status_codes": [429, 503] }, "targets": [ - { - "provider":"@openai-virtual-key" - }, - { - "provider":"@azure-openai-virtual-key" - } + { "provider": "@openai-prod" }, + { "provider": "@azure-prod" } ] } ``` -Here, fallback from OpenAI to Azure OpenAI will only be triggered when there is a `429` error code from the OpenAI request (i.e. rate limiting error) +## Tracing Fallback Requests -## Tracing Fallback Requests on Portkey +Portkey logs all requests in a fallback chain. To trace: -Portkey logs all the requests that are sent as a part of your fallback config. This allows you to easily trace and see which targets failed and see which ones were eventually successful. - -To see your fallback trace, - -1. On the Logs page, first filter the logs with the specific `Config ID` where you've setup the fallback - this will show all the requests that have been sent with that config. -2. Now, trace an individual request and all the failed + successful logs for it by filtering further on `Trace ID` \- this will show all the logs originating from a single request. +1. Filter logs by `Config ID` to see all requests using that config +2. Filter by `Trace ID` to see all attempts for a single request -## Caveats and Considerations - -While the Fallback on LLMs feature greatly enhances the reliability and resilience of your application, there are a few things to consider: +## Considerations -1. Ensure the LLMs in your fallback list are compatible with your use case. Not all LLMs offer the same capabilities. -2. Keep an eye on your usage with each LLM. Depending on your fallback list, a single request could result in multiple LLM invocations. -3. Understand that each LLM has its own latency and pricing. Falling back to a different LLM could have implications on the cost and response time. +- Ensure fallback LLMs are compatible with your use case +- A single request may invoke multiple LLMs +- Each LLM has different latency and pricing diff --git a/product/ai-gateway/load-balancing.mdx b/product/ai-gateway/load-balancing.mdx index 41988eb0..c1f954da 100644 --- a/product/ai-gateway/load-balancing.mdx +++ b/product/ai-gateway/load-balancing.mdx @@ -1,114 +1,98 @@ --- title: "Load Balancing" -description: Load Balancing feature efficiently distributes network traffic across multiple LLMs. +description: Distribute traffic across multiple LLMs for high availability and optimal performance. --- - This feature is available on all Portkey [plans](https://portkey.ai/pricing). +Available on all Portkey [plans](https://portkey.ai/pricing). 
- This ensures high availability and optimal performance of your generative AI apps, preventing any single LLM from becoming a performance bottleneck. +Distribute traffic across multiple LLMs to prevent any single provider from becoming a bottleneck. -## Enable Load Balancing +## Examples -To enable Load Balancing, you can modify the `config` object to include a `strategy` with `loadbalance` mode. + -Here's a quick example to **load balance 75-25** between an OpenAI and an Azure OpenAI account - -```JSON +```json Between Providers (model from request) { - "strategy": { - "mode": "loadbalance" - }, + "strategy": { "mode": "loadbalance" }, "targets": [ - { - "provider":"@openai-virtual-key", - "weight": 0.75 - }, - { - "provider":"@azure-virtual-key", - "weight": 0.25 - } + { "provider": "@openai-prod", "weight": 0.7 }, + { "provider": "@azure-prod", "weight": 0.3 } ] } ``` -### You can [create](/product/ai-gateway/configs#creating-configs) and then [use](/product/ai-gateway/configs#using-configs) the config in your requests. - -## How Load Balancing Works - -1. **Defining the Loadbalance Targets & their Weights**: You provide a list of `providers`, and assign a `weight` value to each target. The weights represent the relative share of requests that should be routed to each target. -2. **Weight Normalization**: Portkey first sums up all the weights you provided for the targets. It then divides each target's weight by the total sum to calculate the normalized weight for that target. This ensures the weights add up to 1 (or 100%), allowing Portkey to distribute the load proportionally. -For example, let's say you have three targets with weights 5, 3, and 1\. The total sum of weights is 9 (5 + 3 + 1). Portkey will then normalize the weights as follows: - * Target 1: 5 / 9 = 0.55 (55% of the traffic) - * Target 2: 3 / 9 = 0.33 (33% of the traffic) - * Target 3: 1 / 9 = 0.11 (11% of the traffic) -3. **Request Distribution**: When a request comes in, Portkey routes it to a target LLM based on the normalized weight probabilities. This ensures the traffic is distributed across the LLMs according to the specified weights. - - -* Default`weight`value is`1` -* Minimum`weight`value is`0` -* If `weight` is not set for a target, the default `weight` value (i.e. `1`) is applied. -* You can set `"weight":0` for a specific target to stop routing traffic to it without removing it from your Config - - -## Sticky Load Balancing - -Sticky load balancing ensures that requests with the same identifier are consistently routed to the same target. 
This is useful for: - -- Maintaining conversation context across multiple requests -- Ensuring consistent model behavior for A/B testing -- Session-based routing for user-specific experiences +```json Between Models +{ + "strategy": { "mode": "loadbalance" }, + "targets": [ + { "override_params": { "model": "@openai-prod/gpt-4o" }, "weight": 0.75 }, + { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" }, "weight": 0.25 } + ] +} +``` -### Configuration +```json Multiple API Keys (same provider) +{ + "strategy": { "mode": "loadbalance" }, + "targets": [ + { "provider": "@openai-key-1", "weight": 1 }, + { "provider": "@openai-key-2", "weight": 1 }, + { "provider": "@openai-key-3", "weight": 1 } + ] +} +``` -Add `sticky_session` to your load balancing strategy: +```json Cost Optimization (cheap vs premium) +{ + "strategy": { "mode": "loadbalance" }, + "targets": [ + { "override_params": { "model": "@openai-prod/gpt-4o-mini" }, "weight": 0.8 }, + { "override_params": { "model": "@openai-prod/gpt-4o" }, "weight": 0.2 } + ] +} +``` -```json +```json Gradual Migration (old to new model) { - "strategy": { - "mode": "loadbalance", - "sticky_session": { - "hash_fields": ["metadata.user_id"], - "ttl": 3600 - } - }, + "strategy": { "mode": "loadbalance" }, "targets": [ - { - "provider": "@openai-virtual-key", - "weight": 0.5 - }, - { - "provider": "@anthropic-virtual-key", - "weight": 0.5 - } + { "override_params": { "model": "@anthropic-prod/claude-3-5-sonnet-20241022" }, "weight": 0.9 }, + { "override_params": { "model": "@anthropic-prod/claude-sonnet-4-20250514" }, "weight": 0.1 } ] } ``` -### Parameters + -| Parameter | Type | Description | -|-----------|------|-------------| -| `hash_fields` | array | Fields to use for generating the sticky session identifier. Supports dot notation for nested fields (e.g., `metadata.user_id`, `metadata.session_id`) | -| `ttl` | number | Time-to-live in seconds for the sticky session. After this period, a new target may be selected. Default: 3600 (1 hour) | +| Pattern | Use Case | +|---------|----------| +| **Between Providers** | Route to different providers; model comes from request | +| **Multiple API Keys** | Distribute load across rate limits from different accounts | +| **Cost Optimization** | Send most traffic to cheaper models, reserve premium for a portion | +| **Gradual Migration** | Test new models with small percentage before full rollout | -### How It Works + +The `@provider-slug/model-name` format automatically routes to the correct provider. Set up providers in [Model Catalog](https://app.portkey.ai/model-catalog). + -1. **Identifier Generation**: When a request arrives, Portkey generates a hash from the specified `hash_fields` values -2. **Target Lookup**: The hash is used to look up the previously assigned target from cache -3. **Consistent Routing**: If a cached assignment exists and hasn't expired, the request goes to the same target -4. **New Assignment**: If no cached assignment exists, a new target is selected based on weights and cached for future requests +[Create](/product/ai-gateway/configs#creating-configs) and [use](/product/ai-gateway/configs#using-configs) configs in your requests. - -Sticky sessions use a two-tier cache system (in-memory + Redis) for fast lookups and persistence across gateway instances in distributed deployments. - +## How It Works -## Caveats and Considerations +1. **Define targets & weights** — Assign a `weight` to each target. Weights represent relative share of traffic. +2. 
**Weight normalization** — Portkey normalizes weights to sum to 100%. Example: weights 5, 3, 1 become 55%, 33%, 11%. +3. **Request distribution** — Each request routes to a target based on normalized probabilities. + + +- Default `weight`: `1` +- Minimum `weight`: `0` (stops traffic without removing from config) +- Unset weights default to `1` + -While the Load Balancing feature offers numerous benefits, there are a few things to consider: +## Considerations -1. Ensure the LLMs in your list are compatible with your use case. Not all LLMs offer the same capabilities or respond in the same format. -2. Be aware of your usage with each LLM. Depending on your weight distribution, your usage with each LLM could vary significantly. -3. Keep in mind that each LLM has its own latency and pricing. Diversifying your traffic could have implications on the cost and response time. -4. **Sticky sessions** require Redis for persistence across gateway instances. Without Redis, sticky sessions will only work within a single gateway instance's memory. +- Ensure LLMs in your list are compatible with your use case +- Monitor usage per LLM—weight distribution affects spend +- Each LLM has different latency and pricing diff --git a/product/ai-gateway/virtual-keys.mdx b/product/ai-gateway/virtual-keys.mdx index f92dee0a..5ee73383 100644 --- a/product/ai-gateway/virtual-keys.mdx +++ b/product/ai-gateway/virtual-keys.mdx @@ -1,68 +1,224 @@ --- title: "Virtual Keys" -description: "Portkey's virtual key system allows you to securely store your LLM API keys in our vault, utilizing a unique virtual identifier to streamline API key management." -tag: "Deprecated" +description: "How Portkey's virtual key system works: use one Portkey API key to access multiple AI providers and models" --- - - MIGRATION NOTICE +## What Are Virtual Keys? -We are upgrading the Virtual Key experience with the [Model Catalog](/support/upgrade-to-model-catalog) feature. +Portkey's virtual key system lets you use **one Portkey API key** to connect to Portkey's gateway, which internally uses your provider credentials to connect to multiple AI providers and models. -With Model Catalog, you can now: -- Set model level budget & rate limits -- Inherit budget & rate limits from parent AI provider integrations -- Set granular, workspace-level access controls -- Pass the provider slug (previosuly known as `virtual key`) with the model param in your LLM requests - +This is the core concept that makes AI gateways powerful: +- **One API key** (Portkey's) → connects to **multiple providers** (OpenAI, Anthropic, etc.) → accesses **hundreds of models** +- No need to manage separate API keys for each provider in your code +- Centralized credential management and governance - -Learn how to replace your virtual keys with Model Catalog - + +The Virtual Keys feature has evolved into the **Model Catalog** system, which provides better governance, centralized management, and model-level controls. The core concept remains the same - one key, many providers. ---- +[Learn more about Model Catalog →](/product/model-catalog) + + +## How It Works: Portkey API Key → Gateway → Provider Credentials + +Portkey uses a two-layer authentication system: + +### 1. Portkey API Key (Your Virtual Key) +Your **Portkey API Key** is your virtual key - it's the single key you use to authenticate with Portkey's gateway. This key gives you access to all providers and models configured in your account. 
+ +**Where to get it:** +- Go to [Settings → API Keys](https://app.portkey.ai/api-keys) +- Create a new API key with appropriate permissions +- Use it in the `x-portkey-api-key` header or `PORTKEY_API_KEY` environment variable + +**What it does:** +- Authenticates requests to Portkey's gateway +- Controls access to Portkey features (completions, prompts, configs, guardrails) +- Manages permissions (read/write/delete for different features) +- Tracks usage and analytics + +**Example:** +```python +from portkey_ai import Portkey + +# One Portkey API key gives you access to all providers +portkey = Portkey(api_key="PORTKEY_API_KEY") + +# Use any provider/model configured in your account +response = portkey.chat.completions.create( + model="@openai-prod/gpt-4o", # Gateway uses stored OpenAI credentials + messages=[{"role": "user", "content": "Hello!"}] +) +``` + +### 2. Provider Credentials (Stored Securely in Model Catalog) +Your **provider credentials** (OpenAI API key, Anthropic API key, etc.) are stored securely in Portkey's Model Catalog. The gateway uses these credentials internally - you never expose them in your code. + +**How it works:** +1. Store provider credentials once in Model Catalog (creates an Integration) +2. Share with workspaces (becomes an AI Provider) +3. Use provider slug in code: `@provider-slug/model-name` +4. Gateway automatically uses the stored credentials when you make requests + +**Security:** +- Credentials encrypted in secure vaults +- Decrypted only in isolated workers during requests +- Never exposed in logs, responses, or UI +- Cannot be reverse-engineered from provider slugs + +**The Flow:** +``` +Your Code → Portkey API Key → Portkey Gateway → Provider Credentials → AI Provider → Model +``` -This feature also provides the following benefits: + + Learn how to set up providers and manage credentials in Model Catalog + -* Easier key rotation -* The ability to generate multiple virtual keys for a single API key -* Imposition of restrictions [based on cost](/product/ai-gateway/virtual-keys/budget-limits), request volume, and user access +## How Model Catalog Works -These can be managed within your account under the "Virtual Keys" tab. +Model Catalog organizes AI access in a three-level hierarchy: -## Creating Virtual Keys: +### Credentials → Providers → Models -1. Navigate to the "Virtual Keys" page and click the "Add Key" button in the top right corner. -2. Select your AI provider, name your key uniquely, and note any usage specifics if needed. +**1. Integrations (Organization Level)** +- Where credentials are stored +- Created by org admins +- Can be shared with multiple workspaces +- Set default budgets, rate limits, and model allow-lists + +**2. AI Providers (Workspace Level)** +- What workspaces see and use +- Inherit from org-level Integrations or workspace-only +- Workspace-specific budgets and rate limits +- Represented by slugs like `@openai-prod` + +**3. Models** +- Individual AI models you can call +- Format: `@provider-slug/model-name` (e.g., `@openai-prod/gpt-4o`) +- Access controlled by model provisioning - **Tip:** You can register multiple keys for one provider or use different names for the same key for easy identification. +**Quick Start:** Add a provider in Model Catalog → AI Providers → Add Provider. Choose existing credentials or create new ones for just your workspace. -### Azure Virtual Keys -Azure Virtual Keys allow you to manage multiple Azure deployments under a single virtual key. 
This feature simplifies API key management and enables flexible usage of different Azure OpenAI models. -You can create multiple deployments under the same resource group and manage them using a single virtual key. + + For org admins: Learn how to centrally manage credentials and share them across workspaces + - - - +## How Credential Storage Works + +Portkey stores your provider credentials with enterprise-grade security: + +### Encryption & Storage +- **Encrypted at rest** in secure vaults +- **Decrypted in-memory** only during request processing +- **Isolated workers** handle decryption (never in your application) +- **No exposure** - credentials never appear in logs, responses, or UI + +### Key Rotation +- Update credentials without changing code +- Rotate keys in Model Catalog → Integrations +- All workspaces using that Integration automatically get the new key +- No downtime or code changes required + +### Multiple Credentials +- Store multiple credentials for the same provider +- Create different providers with different limits (dev, staging, prod) +- Use same underlying credentials with different governance rules + +## Using Providers in Your Code + +There are three ways to specify providers. We recommend the model prefix format for clarity and simplicity. + +### Method 1: Model Prefix (Recommended) + +Specify provider and model together in the `model` parameter. This keeps everything in one place and makes switching between providers/models simple. + +```python +from portkey_ai import Portkey + +portkey = Portkey(api_key="PORTKEY_API_KEY") + +# Recommended: Provider + model together +response = portkey.chat.completions.create( + model="@openai-prod/gpt-4o", # Provider slug + model name + messages=[{"role": "user", "content": "Hello!"}] +) +``` + +```javascript +import { Portkey } from 'portkey-ai'; + +const portkey = new Portkey({ apiKey: "PORTKEY_API_KEY" }); -To use the required deployment, simply pass the `alias` of the deployment as the `model` in LLM request body. In case the models is left empty or the specified alias does not exist, the default deployment is used. +// Recommended: Provider + model together +const response = await portkey.chat.completions.create({ + model: "@openai-prod/gpt-4o", // Provider slug + model name + messages: [{ role: "user", content: "Hello!" }] +}); +``` +### Method 2: Provider Header -## How are the provider API keys stored? +Specify provider separately using the `provider` parameter. Remember to include the `@` symbol. -Your API keys are encrypted and stored in secure vaults, accessible only at the moment of a request. Decryption is performed exclusively in isolated workers and only when necessary, ensuring the highest level of data security. +```python +from portkey_ai import Portkey -## How are the provider keys linked to the virtual key? +portkey = Portkey( + api_key="PORTKEY_API_KEY", + provider="@openai-prod" # Provider with @ symbol +) -We randomly generate virtual keys and link them separately to the securely stored keys. This means, your raw API keys can not be reverse engineered from the virtual keys. 
+# Then just specify model name +response = portkey.chat.completions.create( + model="gpt-4o", # Just the model name + messages=[{"role": "user", "content": "Hello!"}] +) +``` -## Using Virtual Keys +```javascript +import { Portkey } from 'portkey-ai'; + +const portkey = new Portkey({ + apiKey: "PORTKEY_API_KEY", + provider: "@openai-prod" // Provider with @ symbol +}); + +// Then just specify model name +const response = await portkey.chat.completions.create({ + model: "gpt-4o", // Just the model name + messages: [{ role: "user", content: "Hello!" }] +}); +``` + +### Method 3: Legacy Virtual Key (Backwards Compatible) + +The `virtual_key` parameter still works for backwards compatibility, but it's not recommended for new code. + +```python +from portkey_ai import Portkey + +# Still works, but not recommended for new code +portkey = Portkey( + api_key="PORTKEY_API_KEY", + virtual_key="openai-prod" # Legacy format (no @) +) + +response = portkey.chat.completions.create( + model="gpt-4o", + messages=[{"role": "user", "content": "Hello!"}] +) +``` + + +**Recommendation:** Use Method 1 (model prefix) - it's explicit, keeps everything in one place, and makes switching between providers/models simple. + ### Using the Portkey SDK -Add the virtual key directly to the initialization configuration for Portkey. +Add the provider directly to the initialization configuration for Portkey. + @@ -78,7 +234,7 @@ const portkey = new Portkey({ ```py -# Construct a client with a virtual key +# Construct a client with a provider portkey = Portkey( api_key="PORTKEY_API_KEY", provider="@PROVIDER" @@ -87,7 +243,7 @@ portkey = Portkey( -Alternatively, you can override the virtual key during the completions call as follows: +Alternatively, you can override the provider during the completions call as follows: @@ -112,7 +268,8 @@ completion = portkey.with_options(provider="@...").chat.completions.create( ### Using the OpenAI SDK -Add the virtual key directly to the initialization configuration for the OpenAI client. +Add the provider directly to the initialization configuration for the OpenAI client. + @@ -134,7 +291,7 @@ const openai = new OpenAI({ ```py -# Construct a client with a virtual key +# Construct a client with a provider from openai import OpenAI from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders @@ -150,29 +307,6 @@ client = OpenAI( -Alternatively, you can override the virtual key during the completions call as follows: - - - - -```js -const chatCompletion = await portkey.chat.completions.create({ - messages: [{ role: 'user', content: 'Say this is a test' }], - model: 'gpt-3.5-turbo', -}, {provider:"@OVERRIDING_PROVIDER"}); -``` - - - -```py -completion = portkey.with_options(provider="@...").chat.completions.create( - messages = [{ "role": 'user', "content": 'Say this is a test' }], - model = 'gpt-3.5-turbo' -) -``` - - - ### Using alias with Azure virtual keys: ```js @@ -182,12 +316,11 @@ const chatCompletion = await portkey.chat.completions.create({ }, {provider:"@PROVIDER"}); ``` - ### Self-Hosted LLM Virtual Keys -Portkey supports creating virtual keys for your privately hosted LLMs, allowing you to manage them alongside commercial providers. +Portkey supports creating providers for your privately hosted LLMs, allowing you to manage them alongside commercial providers. -1. When creating a new virtual key, enable the "Local/Privately hosted provider" toggle +1. When adding a provider in Model Catalog, enable the "Local/Privately hosted provider" toggle 2. 
Select the provider API specification your LLM implements (typically OpenAI) 3. Enter your model's base URL in the "Custom Host" field 4. Add any required authentication headers and their values @@ -200,21 +333,341 @@ This allows you to use your self-hosted models with all Portkey features includi For more details, see [Bring Your Own LLM](/integrations/llms/byollm). -## Setting Budget Limits +## Using Providers in Configs + +Configs also support three methods for specifying providers: + +### Method 1: Model in override_params (Recommended) + +Specify provider and model together in `override_params`. This works great with multi-provider strategies. + +```json +{ + "strategy": { + "mode": "fallback" + }, + "targets": [ + { + "override_params": { + "model": "@openai-prod/gpt-4o" // Provider + model together + } + }, + { + "override_params": { + "model": "@anthropic/claude-3-sonnet" // Easy to switch providers + } + } + ] +} +``` + +### Method 2: Provider in Target + +Specify provider directly in the target. Remember the `@` symbol. + +```json +{ + "strategy": { + "mode": "single" + }, + "targets": [ + { + "provider": "@openai-prod", // Provider with @ symbol + "override_params": { + "model": "gpt-4o" // Just the model name + } + } + ] +} +``` + +### Method 3: Legacy virtual_key (Backwards Compatible) + +The `virtual_key` field still works in configs. + +```json +{ + "strategy": { + "mode": "single" + }, + "targets": [ + { + "virtual_key": "openai-prod" // Legacy format (no @) + } + ] +} +``` + + +**Recommendation:** Use Method 1 (model in override_params) - it's explicit and works great with multi-provider strategies like fallback and load balancing. + + +## Budget Limits + +Set spending controls to prevent unexpected costs. Budget limits can be applied at the Integration level and cascade to all Providers created from it. + +### Types of Budget Limits + +**Cost-Based Limits** +- Set maximum spend in USD (minimum $1) +- Automatically disables provider when limit reached +- Track spending in real-time + +**Token-Based Limits** +- Set maximum tokens consumed (minimum 100 tokens) +- Control usage independent of cost fluctuations +- Track both input and output tokens + +### Setting Budget Limits + +**At Integration Level:** +1. Go to Integrations → Select Integration +2. Navigate to Workspace Provisioning +3. Click Edit Budget & Rate Limits for each workspace +4. Set cost-based or token-based limits + +**Per Workspace:** +- Different budgets for different workspaces +- Finance team: $500/month +- Engineering team: $2000/month +- Marketing team: $300/month + +### Alert Thresholds + +Set notifications before reaching limits: +- Cost-based: Alert at 80% of budget (e.g., $400 of $500) +- Token-based: Alert at 90% of token limit +- Email notifications sent automatically +- Continue using until full limit reached + +### Periodic Resets + +Configure automatic budget resets: +- **No Reset**: Budget applies until exhausted +- **Weekly Reset**: Resets every Sunday at 12 AM UTC +- **Monthly Reset**: Resets on 1st of month at 12 AM UTC + + +Budget limits cannot be edited once set. To change a limit, duplicate the provider and create a new one with the desired limit. + + + + Detailed guide to setting and managing budget limits + + +## Rate Limits + +Control request velocity to manage load, prevent abuse, and ensure fair resource distribution. 
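+When a workspace exhausts its rate limit, the gateway rejects further requests until the time window resets (see "Exceeding Rate Limits" below). The sketch that follows shows one way client code might absorb such rejections — it is an illustration only, and it assumes the rejection surfaces as an HTTP `429` status; check the actual status code and error body returned for your configuration. The provider slug `@openai-prod` is a placeholder.
+
+```python
+import os
+import time
+import requests
+
+URL = "https://api.portkey.ai/v1/chat/completions"
+HEADERS = {
+    "x-portkey-api-key": os.environ["PORTKEY_API_KEY"],
+    "Content-Type": "application/json",
+}
+BODY = {
+    "model": "@openai-prod/gpt-4o",  # placeholder provider slug + model
+    "messages": [{"role": "user", "content": "Hello!"}],
+}
+
+for attempt in range(3):
+    resp = requests.post(URL, headers=HEADERS, json=BODY)
+    if resp.status_code == 429:  # assumed status for a rate-limited request
+        time.sleep(60)           # wait out the window (e.g. a per-minute limit)
+        continue
+    resp.raise_for_status()
+    print(resp.json()["choices"][0]["message"]["content"])
+    break
+```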
+ +### Types of Rate Limits + +**Request-Based Limits** +- Maximum requests per time period +- Example: 1000 requests/minute +- Prevents API abuse and DoS attacks + +**Token-Based Limits** +- Maximum tokens consumed per time period +- Example: 1M tokens/hour +- Controls usage independent of request count + +### Time Windows + +Choose from three intervals: + +- **Per Minute**: Fine-grained control, resets every minute +- **Per Hour**: Balanced control, resets hourly +- **Per Day**: Broad control, resets daily + +### Setting Rate Limits + +**At Integration Level:** +1. Go to Integrations → Select Integration +2. Navigate to Workspace Provisioning +3. Click Edit Budget & Rate Limits +4. Set request-based or token-based rate limits +5. Choose time window (minute/hour/day) + +**Example Configuration:** +- Engineering workspace: 5000 requests/hour +- Finance workspace: 1000 requests/day +- Marketing workspace: 200 requests/minute + +### Exceeding Rate Limits + +When a rate limit is reached: +- Subsequent requests rejected with error code +- Clear error message indicating limit exceeded +- Limit automatically resets after time period +- No manual intervention needed -Portkey provides a simple way to set budget limits for any of your virtual keys and helps you manage your spending on AI providers (and LLMs) - giving you confidence and control over your application's costs. + + Detailed guide to setting and managing rate limits + -[Budget Limits](/product/ai-gateway/virtual-keys/budget-limits) +## Model Access Control + +Control which models users and workspaces can access through multiple layers of governance. + +### Model Provisioning (Integration Level) + +When creating an Integration, specify which models are available: + +**Allow All Models** +- Provides access to all models from that provider +- Useful for development or when you trust the team +- Less control over costs + +**Allow Specific Models (Recommended)** +- Create an explicit allow-list of approved models +- Only selected models appear in workspace Model Catalog +- Better cost control and compliance + +**Example:** +- Production Integration: Only `gpt-4o`, `gpt-4o-mini` +- Development Integration: All GPT models +- Research Integration: Experimental models only + +### Workspace-Level Access + +Control access at the workspace level: + +1. **Provision Integrations to Workspaces** + - Choose which workspaces can use which Integrations + - Each workspace sees only provisioned providers + - Instant access revocation + +2. 
**Workspace-Specific Model Lists** + - Override Integration model list per workspace + - Finance workspace: Only cost-effective models + - Engineering workspace: All models for experimentation + +### User-Level Access (API Keys) + +Control access through API key permissions: + +**Model Catalog Permissions:** +- **Disabled**: Cannot access Model Catalog +- **Read**: Can view providers and models +- **Write**: Can create/edit providers +- **Delete**: Can remove providers + +**Example Use Cases:** +- Developer API key: Read access to Model Catalog +- Admin API key: Write/Delete access +- Service account: Read access to specific providers only + +### Model Whitelist Guardrail + +Use the **Model Whitelist** guardrail to enforce model restrictions at the request level: + +**How it works:** +- Check if the model in the request is in the allowed list +- Block requests to unapproved models +- Works as an input guardrail (before request is sent) + +**Configuration:** +```json +{ + "before_request_hooks": [{ + "type": "guardrail", + "id": "model-whitelist", + "checks": [{ + "id": "default.modelWhitelist", + "parameters": { + "models": ["@openai-prod/gpt-4o", "@openai-prod/gpt-4o-mini"], + "inverse": false + } + }], + "deny": true + }] +} +``` + +**Use Cases:** +- Enforce model restrictions per API key +- Prevent accidental use of expensive models +- Compliance requirements + + + Learn about all guardrail options including Model Whitelist + + +## Model Rules Guardrail + + +**Coming Soon:** Model Rules guardrail provides advanced model access control. More details will be added here once the feature documentation is available. + +If you have information about Model Rules, please share it and we'll update this section. + + +Model Rules guardrail enables fine-grained control over model access based on: +- User roles and permissions +- Request metadata +- Dynamic model allow-lists +- Context-aware access control + +## Azure Virtual Keys + +Azure Virtual Keys allow you to manage multiple Azure deployments under a single provider. This feature simplifies API key management and enables flexible usage of different Azure OpenAI models. + +You can create multiple deployments under the same resource group and manage them using a single provider. + + + + + +To use the required deployment, simply pass the `alias` of the deployment as the `model` in LLM request body. In case the models is left empty or the specified alias does not exist, the default deployment is used. ## Prompt Templates -Choose your Virtual Key within Portkey’s prompt templates, and it will be automatically retrieved and ready for use. +Choose your Provider within Portkey's prompt templates, and it will be automatically retrieved and ready for use. 
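+For example, a saved prompt template can be run from the SDK with just its ID — the provider configured on the template is used automatically. A minimal sketch, assuming a template with the hypothetical ID `pp-welcome-xxx` and a `user_name` variable:
+
+```python
+from portkey_ai import Portkey
+
+portkey = Portkey(api_key="PORTKEY_API_KEY")
+
+# The template's provider and model come from its saved configuration
+completion = portkey.prompts.completions.create(
+    prompt_id="pp-welcome-xxx",        # hypothetical prompt template ID
+    variables={"user_name": "Alice"}   # values for the template's variables
+)
+
+print(completion)
+```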
## Langchain / LlamaIndex -Set the virtual key when utilizing Portkey's custom LLM as shown below: +Set the provider when utilizing Portkey's custom LLM as shown below: ```py # Example in Langchain -llm = PortkeyLLM(api_key="PORTKEY_API_KEY",provider="@PROVIDER") +llm = PortkeyLLM(api_key="PORTKEY_API_KEY", provider="@PROVIDER") ``` + +## Quick Reference: Virtual Keys Concept + +**The Core Concept (Still True):** +- ✅ **One Portkey API key** → Access **multiple providers** → Use **hundreds of models** +- ✅ Provider credentials stored securely, never exposed in code +- ✅ Centralized management and governance + +**What Evolved:** +| Old Virtual Keys Feature | New Model Catalog System | +|-------------------------|--------------------------| +| Create Virtual Key in workspace | Add Provider in Model Catalog | +| Always enter API keys | Choose existing credentials or create new | +| `virtual_key` header | `model="@provider/model"` format (recommended) | +| Budget per virtual key | Budget per workspace (from Integration) | +| All models accessible | Model allow-list per Integration | +| Workspace-only | Org-level sharing + workspace-level | + +## Related Guides + + + + Complete guide to Model Catalog features + + + Managing credentials across workspaces + + + Detailed budget limits guide + + + Detailed rate limits guide + + + Model access control with guardrails + + + Step-by-step migration from Virtual Keys + + diff --git a/product/enterprise-offering/budget-limits.mdx b/product/enterprise-offering/budget-limits.mdx index be58e8e4..26ddeca9 100644 --- a/product/enterprise-offering/budget-limits.mdx +++ b/product/enterprise-offering/budget-limits.mdx @@ -1,4 +1,4 @@ --- title: Budget Limits -url: /product/ai-gateway/virtual-keys/budget-limits +url: /product/model-catalog/integrations#3-budget-%26-rate-limits --- diff --git a/product/enterprise-offering/org-management/api-keys-authn-and-authz.mdx b/product/enterprise-offering/org-management/api-keys-authn-and-authz.mdx index 1e8befba..921f845a 100644 --- a/product/enterprise-offering/org-management/api-keys-authn-and-authz.mdx +++ b/product/enterprise-offering/org-management/api-keys-authn-and-authz.mdx @@ -80,13 +80,13 @@ Admin API Keys should be carefully managed and their use should be limited to ne | `configs.delete` | Delete configurations | | `configs.read` | View configuration details | | `configs.list` | List available configurations | - | `virtual_keys.create` | Create new virtual keys | - | `virtual_keys.update` | Update existing virtual keys | - | `virtual_keys.delete` | Delete virtual keys | - | `virtual_keys.duplicate` | Duplicate existing virtual keys | - | `virtual_keys.read` | View virtual key details | - | `virtual_keys.list` | List available virtual keys | - | `virtual_keys.copy` | Copy virtual keys between workspaces | + | `virtual_keys.create` | Create new providers | + | `virtual_keys.update` | Update existing providers | + | `virtual_keys.delete` | Delete providers | + | `virtual_keys.duplicate` | Duplicate existing providers | + | `virtual_keys.read` | View provider details | + | `virtual_keys.list` | List available providers | + | `virtual_keys.copy` | Copy providers between workspaces | @@ -163,13 +163,13 @@ Workspace API Keys provide a more granular level of access control, allowing you | `configs.delete` | Delete configurations | | `configs.read` | View configuration details | | `configs.list` | List available configurations | - | `virtual_keys.create` | Create new virtual keys | - | `virtual_keys.update` | Update 
existing virtual keys | - | `virtual_keys.delete` | Delete virtual keys | - | `virtual_keys.duplicate` | Duplicate existing virtual keys | - | `virtual_keys.read` | View virtual key details | - | `virtual_keys.list` | List available virtual keys | - | `virtual_keys.copy` | Copy virtual keys between workspaces | + | `virtual_keys.create` | Create new providers | + | `virtual_keys.update` | Update existing providers | + | `virtual_keys.delete` | Delete providers | + | `virtual_keys.duplicate` | Duplicate existing providers | + | `virtual_keys.read` | View provider details | + | `virtual_keys.list` | List available providers | + | `virtual_keys.copy` | Copy providers between workspaces | diff --git a/product/model-catalog.mdx b/product/model-catalog.mdx index c102fd65..155f521d 100644 --- a/product/model-catalog.mdx +++ b/product/model-catalog.mdx @@ -25,49 +25,65 @@ The Model Catalog is a centralized hub for viewing and managing all AI providers - AI Providers represent connections to AI services. Each AI Provider has: + AI Providers are what you use in your code. Each provider has: - ✅ A unique slug (e.g., `@openai-prod`) - ✅ Securely stored credentials - ✅ Budget and rate limits - ✅ Access to specific models + + **To use:** Add a provider, then use `@provider-slug/model-name` in your code. - The Models section is a gallery of all AI models available. Each Model entry includes: + The Models section is a gallery of all AI models available in your workspace. Each Model entry includes: - ✅ Model slug (`@openai-prod/gpt-4o`) - ✅ Ready-to-use code snippets - ✅ Input/output token limits - ✅ Pricing information (where available) + + [View all available models →](https://app.portkey.ai/model-catalog/models) ## Adding an AI Provider -You can add providers via **UI** (follow the steps below) or [**API**](/api-reference/admin-api/introduction). +Add providers via **UI** (follow the steps below) or [**API**](/api-reference/admin-api/introduction). + Navigate to the [Model Catalog](https://app.portkey.ai/model-catalog) in your Portkey dashboard. + Portkey Model Catalog - Add Provider - + Choose from list (OpenAI, Anthropic, etc.) or _Self-hosted / Custom_. Portkey Model Catalog - Add Provider - Choose Service - - Choose existing credentials or create new ones. + + **If credentials already exist:** + - Select from the dropdown (if your org admin set them up) + - Skip to step 4 - no API keys needed! + + **If creating new credentials:** + - Choose "Create new credentials" + - Enter your API keys here + + + Creating new credentials here automatically creates a workspace-linked integration. To share credentials across multiple workspaces, create them in the [Integrations](/product/model-catalog/integrations) page (org admin only). + Model Catalog - Add credentials - - Choose the name and slug for this provider. The slug cannot be changed later and will be used to reference the AI models. + + Choose a name and slug for this provider. The slug (e.g., `openai-prod`) will be used in your code like `@openai-prod/gpt-4o`. Model Catalog - Add Provider Details @@ -77,9 +93,7 @@ You can add providers via **UI** (follow the steps below) or [**API**](/api-refe ## Using Provider Models -Once you have AI Providers set up, you can use their models in your applications through various methods. - -### 1. Model String Composition (Recommended) +Once you have AI Providers set up, use their models in your applications. There are three methods - we recommend the model prefix format for clarity. 
In Portkey, model strings follow this format: @@ -153,7 +167,7 @@ curl https://api.portkey.ai/v1/chat/completions \ ### 2. Using the `provider` header -You can also specify the provider in the header instead of the model string. Remember to add the `@` before your provider slug. +Specify the provider separately using the `provider` parameter. Remember to add the `@` before your provider slug. @@ -235,36 +249,50 @@ curl https://api.portkey.ai/v1/chat/completions \ ### 3. Specify `provider` in the config -Portkey's configs are simple JSON structures that help you define routing logic for LLM requests. You can learn more about them [here](/product/ai-gateway/configs). +Portkey's configs are simple JSON structures that help you define routing logic for LLM requests. Learn more [here](/product/ai-gateway/configs). -Portkey's config allows you to declare either the `provider` OR `provider+model` configuration in your routing config. Here's how: +There are three ways to specify providers in configs: + +**Method 1: Model in override_params (Recommended)** + +Specify provider and model together in `override_params`. Works great with multi-provider strategies: -**1. Defining the Provider** ```json -// Specify provider in the config { - "provider": "@openai-prod" + "strategy": { "mode": "fallback" }, + "targets": [{ + "override_params": { "model": "@openai-prod/gpt-4o" } + }, { + "override_params": { "model": "@anthropic/claude-3-sonnet" } + }] } ``` -**2. Defining the Provider + Model** +**Method 2: Provider in target** + +Specify provider directly in the target (remember the `@` symbol): + ```json -// Specify the model string in "override_params" { - "override_params": { - "model": "@openai-prod/gpt-4o" - } + "strategy": { "mode": "single" }, + "targets": [{ + "provider": "@openai-prod", + "override_params": { + "model": "gpt-4o" + } + }] } ``` -Using `overide_params` in strategy +**Method 3: Legacy virtual_key (Backwards Compatible)** + +The `virtual_key` field still works: + ```json { - "strategy": { "mode": "fallback" }, + "strategy": { "mode": "single" }, "targets": [{ - "override_params": { "model": "@openai-prod/gpt-4o" } - }, { - "override_params": { "model": "@anthropic/claude-3-sonnet" } + "virtual_key": "openai-prod" }] } ``` @@ -274,23 +302,33 @@ Using `overide_params` in strategy -## Integrations +## How It Works: Credentials → Providers → Models -At the heart of Model Catalog is a simple concept: your AI provider credentials need to be stored securely, governed carefully and managed centrally. In Portkey, these stored credentials are called **Integrations**. Think of an Integration as a secure vault for your API keys - whether it's your OpenAI API key, AWS Bedrock credentials, or Azure OpenAI configuration. + + Learn how Portkey's virtual key system works: use one Portkey API key to access multiple providers and models + - - Integrations Overview Page - +Think of it like a password manager: + +1. **Store your credentials once** (at the org level) - This is called an "Integration" + - Like saving your OpenAI API key in a password vault + - You can share it with multiple workspaces without re-entering it + +2. **Use it in your workspace** - This becomes a "Provider" + - Like having a saved login that appears in your workspace + - Each workspace can have different settings (budgets, rate limits) for the same credentials -Once you create an Integration (by storing your credentials), you can use it to create multiple AI Providers. 
For example, you might have one OpenAI Integration, but create three different AI Providers from it: -- `@openai-dev` for development with strict rate limits -- `@openai-staging` for testing with moderate budgets -- `@openai-prod` for production with higher limits +3. **Call specific models** - Use the model slug in your code + - Format: `@provider-slug/model-name` (e.g., `@openai-prod/gpt-4o`) -This separation gives you granular control over how different teams and environments use the same underlying credentials. + +**Quick Start:** When adding a provider in Model Catalog, choose either: +- **Use existing credentials** from your organization (if your admin set them up) +- **Create new credentials** for just this workspace (creates a workspace-linked integration automatically) + - - Learn how to create and manage AI service credentials across your organization + + For org admins: Learn how to centrally manage credentials and share them across workspaces ## Managing Access and Controls diff --git a/product/model-catalog/custom-models.mdx b/product/model-catalog/custom-models.mdx index cb867707..38a8b7f2 100644 --- a/product/model-catalog/custom-models.mdx +++ b/product/model-catalog/custom-models.mdx @@ -57,7 +57,7 @@ Here’s a breakdown of each field in the form: Once you've added your custom model, you can use it just like any other model in the catalog. Simply reference its **Model Slug** in your API calls. -For example, to use a custom model with the slug `my-custom-model-v1` in a chat completion request, you would set it as the `model` in your virtual key configuration or pass it directly in the request header: +For example, to use a custom model with the slug `my-custom-model-v1` in a chat completion request, you would set it as the `model` in your provider configuration or pass it directly in the request header: diff --git a/product/model-catalog/integrations.mdx b/product/model-catalog/integrations.mdx index 336c4629..9281e527 100644 --- a/product/model-catalog/integrations.mdx +++ b/product/model-catalog/integrations.mdx @@ -4,26 +4,49 @@ description: "Securely store and manage AI provider credentials across your orga --- -Integrations are designed for **organization admins and managers** who need to manage AI provider access across teams. If you're looking to use AI models, see the [Model Catalog](/product/model-catalog) documentation. +**For Organization Admins:** This page is for managing credentials across multiple workspaces. To use AI models in your workspace, see the [Model Catalog](/product/model-catalog) documentation. -Integrations are the secure foundation for AI provider management in Portkey. Think of them as your organization's credential vault - a centralized place where you store API keys, configure access controls, and set usage policies that cascade throughout your entire AI infrastructure. +**What are Integrations?** +A place to store credentials once and share them with multiple workspaces. 
-When you create an Integration, you're not just storing credentials - you're establishing a governance layer that controls: -- **Who** can access these AI services (through workspace provisioning) -- **What** models they can use (through model provisioning) -- **How much** they can spend (through budget limits) -- **How fast** they can consume resources (through rate limits) +**Why use them?** +- **Save time:** Store API keys once, use in many workspaces +- **Control access:** Decide which workspaces can use which credentials +- **Control models:** Enable or disable specific models at the Integration level +- **Set limits:** Different budgets and rate limits per workspace -## Why Integrations Matter +**How it works:** +1. Store credentials here (creates an Integration) +2. Choose which models are available (enable/disable models) +3. Share with workspaces (they see it as a Provider in their Model Catalog) +4. Each workspace can use the same credentials with different limits, but only access models you've enabled -In enterprise AI deployments, raw API keys scattered across teams create security risks and make cost control impossible. Integrations solve this by: +**Simple analogy:** Like a shared password vault. Store your OpenAI API key once, then share it with multiple workspaces. Each workspace can use the same credentials but with different budgets and rate limits. -1. **Centralizing Credentials**: Store API keys once, use everywhere through secure references -2. **Enabling Governance**: Apply organization-wide policies that automatically enforce compliance -3. **Simplifying Management**: Update credentials, limits, or access in one place -4. **Maintaining Security**: Never expose raw API keys to end users or applications -5. **Granular Observabilty**: Get complete end-to-end observability and track 40+ crucial metric for every single LLM call +When you create an Integration, you control: +- **Who** can use these credentials (which workspaces) +- **What** models they can access +- **How much** they can spend (budget limits) +- **How fast** they can make requests (rate limits) + +## Why Use Integrations? + +Instead of each workspace entering the same API keys separately, store them once and share them: + +1. **Save time** - No need to re-enter credentials in each workspace +2. **Control access** - Decide which workspaces can use which credentials +3. **Set limits** - Different budgets and rate limits per workspace +4. **Stay secure** - API keys are encrypted and never exposed to end users +5. **Track everything** - See usage and costs across all workspaces + +## Understanding the Integrations Dashboard + +Navigate to the [Integrations page](https://app.portkey.ai/integrations) in your Portkey organization settings. The page is organized into three tabs, each serving a distinct purpose: + +* **`All`**: This is a comprehensive list of all 50+ providers Portkey supports. This is your starting point for connecting a new provider to your organization. +* **`Connected`**: This tab lists all the integrations that you have personally connected at the organization level. It's your primary view for managing your centrally-governed providers. +* **`Workspace-Created`**: This tab gives you complete visibility and governance over any integrations created *by Workspace Admins* for their specific workspaces. It ensures that even with delegated control, you maintain a full audit trail and can manage these instances if needed. 
## Creating an Integration @@ -31,7 +54,7 @@ Let's walk through creating an Integration for AWS Bedrock as an example: -From your admin panel, go to **Integrations** and click **Create New Integration**. +From your admin panel, go to [**Integrations**](https://app.portkey.ai/model-catalog/providers) and click **Create New Integration** (or click **Connect** from the **`All`** tab). @@ -91,11 +114,11 @@ Workspace provisioning determines which teams and projects can access this Integ #### How It Works -When you provision an Integration to a workspace: -1. That workspace can create AI Providers using this Integration's credentials -2. All usage is tracked at the workspace level for accountability -3. Budget and rate limits can be applied per workspace -4. Access can be revoked instantly if needed +When you share credentials with a workspace: +1. That workspace sees it as an **AI Provider** in their Model Catalog +2. They can use it immediately - no need to enter credentials again +3. Set different budgets/rate limits for each workspace +4. Revoke access anytime #### Setting Up Workspace Provisioning @@ -122,10 +145,12 @@ When you provision an Integration to a workspace: ## 2. Model Provisioning -Model provisioning gives you fine-grained control over which AI models are accessible through an Integration. This is essential for: -- Controlling costs by restricting access to expensive models -- Ensuring compliance by limiting models to approved ones -- Maintaining consistency by standardizing model usage across teams +**Model lists are tied to Integrations.** When you create an Integration, you control which models are available. All Providers created from that Integration will only have access to the models you've enabled. + +This is essential for: +- **Controlling costs:** Restrict access to expensive models +- **Ensuring compliance:** Limit models to approved ones +- **Maintaining consistency:** Standardize model usage across teams #### Setting Up Model Provisioning @@ -138,6 +163,8 @@ Model provisioning gives you fine-grained control over which AI models are acces - **Allow All Models**: Provides access to all models offered by the provider - **Allow Specific Models**: Create an allowlist of approved models +**Important:** The model list you set here applies to all Providers created from this Integration. Workspaces will only see and be able to use the models you've enabled. 
+ #### Advanced Model Management @@ -209,7 +236,7 @@ Set a maximum number of tokens that can be consumed, allowing you to control usa #### Alert Thresholds -You can now set alert thresholds to receive notifications before your budget limit is reached: +Set alert thresholds to receive notifications before your budget limit is reached: * For cost-based budgets, set thresholds in USD * For token-based budgets, set thresholds in tokens @@ -218,7 +245,7 @@ You can now set alert thresholds to receive notifications before your budget lim #### Periodic Reset Options -You can configure budget limits to automatically reset at regular intervals: +Configure budget limits to automatically reset at regular intervals: @@ -255,7 +282,7 @@ Rate limits control the velocity of API usage, protecting against runaway proces - **Token-based**: Limit token consumption rate (e.g., 1M tokens/hour) **Time Windows:** -You can choose from three different time intervals for your rate limits: +Choose from three different time intervals for your rate limits: * **Per Minute**: Limits reset every minute, ideal for fine-grained control * **Per Hour**: Limits reset hourly, providing balanced usage control @@ -265,10 +292,10 @@ You can choose from three different time intervals for your rate limits: > > * Rate limits can be set as either request-based or token-based > * Time intervals can be configured as per minute, per hour, or per day -> * Setting the limit to 0 disables the virtual key +> * Setting the limit to 0 disables the provider > * Rate limits apply immediately after being set > * Once set, rate limits **cannot be edited** by any organization member -> * Rate limits work for **all providers** available on Portkey and apply to **all organization members** who use the virtual key +> * Rate limits work for **all providers** available on Portkey and apply to **all organization members** who use the provider > * After a rate limit is reached, requests will be rejected until the time period resets #### Use Cases for Rate Limits @@ -292,14 +319,21 @@ When a rate limit is reached: ### Tracking Spending and Usage -You can track your spending, usage, and 40+ crucial metrics for any specific AI integration by navigating to the Analytics tab and filtering by the **desired key** and **timeframe**. +Track spending, usage, and 40+ crucial metrics for any specific AI integration by navigating to the Analytics tab and filtering by the **desired key** and **timeframe**. +--- +## FAQs + + + Your API keys are always encrypted and stored in secure, isolated vaults. They are only decrypted in-memory, within sandboxed workers, at the exact moment a request is made to the provider. This ensures the highest level of security for your credentials. + + diff --git a/product/observability.mdx b/product/observability.mdx index d7884765..bfe7f2c8 100644 --- a/product/observability.mdx +++ b/product/observability.mdx @@ -38,7 +38,7 @@ Portkey's OpenTelemetry-compliant observability suite gives you complete control

Add feedback values and weights to complete the loop.

- +

Set up budget limits for your provider API keys and gain confidence over your application's costs.

diff --git a/product/observability/budget-limits.mdx b/product/observability/budget-limits.mdx index be58e8e4..26ddeca9 100644 --- a/product/observability/budget-limits.mdx +++ b/product/observability/budget-limits.mdx @@ -1,4 +1,4 @@ --- title: Budget Limits -url: /product/ai-gateway/virtual-keys/budget-limits +url: /product/model-catalog/integrations#3-budget-%26-rate-limits --- diff --git a/product/observability/cost-management.mdx b/product/observability/cost-management.mdx index caadf054..04de88e0 100644 --- a/product/observability/cost-management.mdx +++ b/product/observability/cost-management.mdx @@ -101,7 +101,7 @@ Budget limits currently apply to all providers and models for which Portkey has ### Unsupported Models -If a specific request log shows 0 cents in the COST column, it means that Portkey does not currently track pricing for that model, and it will not count towards the virtual key's budget limit. +If a specific request log shows 0 cents in the COST column, it means that Portkey does not currently track pricing for that model, and it will not count towards the provider's budget limit. For models without pricing support: diff --git a/product/observability/opentelemetry.mdx b/product/observability/opentelemetry.mdx index 596f1e76..c678a92e 100644 --- a/product/observability/opentelemetry.mdx +++ b/product/observability/opentelemetry.mdx @@ -11,7 +11,7 @@ Many popular AI development tools and SDKs, like the Vercel AI SDK, LlamaIndex, Portkey's strength lies in its unique combination of an intelligent **LLM Gateway** and a powerful **Observability** backend. -- **Enriched Data from the Gateway:** Your LLM calls routed through the Portkey Gateway are automatically enriched with deep contextual information—virtual keys, caching status, retry attempts, prompt versions, and more. This data flows seamlessly into Portkey Observability. +- **Enriched Data from the Gateway:** Your LLM calls routed through the Portkey Gateway are automatically enriched with deep contextual information—provider configuration, caching status, retry attempts, prompt versions, and more. This data flows seamlessly into Portkey Observability. - **Holistic View with OpenTelemetry:** By adding an OTel endpoint, Portkey now ingests traces and logs from your *entire* application stack, not just the LLM calls. Instrument your frontend, backend services, databases, and any other component with OTel, and send that data to Portkey. @@ -100,15 +100,6 @@ This means traces sent from OpenTelemetry-instrumented applications automaticall This feature is particularly powerful for applications using frameworks like LangChain, LlamaIndex, or other tools with built-in OpenTelemetry instrumentation that follow GenAI semantic conventions. -## W3C Trace Context Headers - -When making requests to the Portkey Gateway, you can use standard W3C trace context headers (`traceparent` and `baggage`) instead of Portkey-specific headers. This enables seamless integration with existing OpenTelemetry-instrumented applications. - -- **`traceparent`**: The trace ID and span ID are automatically extracted and used for Portkey's tracing -- **`baggage`**: Key-value pairs are parsed and merged into the request metadata - -For detailed usage examples and header specifications, see the [W3C Trace Context Support section in the Tracing documentation](/product/observability/traces#w3c-trace-context-support). - ## Why Use OpenTelemetry with Portkey? Portkey's OTel backend is compatible with any OTel-compliant library. 
Here are a few popular ones for GenAI and general application observability: diff --git a/product/prompt-engineering-studio.mdx b/product/prompt-engineering-studio.mdx index 2721692a..d335a64e 100644 --- a/product/prompt-engineering-studio.mdx +++ b/product/prompt-engineering-studio.mdx @@ -20,7 +20,7 @@ You can easily access Prompt Engineering Studio using [https://prompt.new](https ## Setting Up AI Providers -Before you can create and manage prompts, you'll need to set up your [LLM integrations](/product/integrations). After configuring your keys, the respective AI providers become available for running and managing prompts. +Before you can create and manage prompts, you'll need to set up your [LLM integrations](/product/model-catalog/integrations). After configuring your keys, the respective AI providers become available for running and managing prompts. Portkey supports over 1600+ models across all the major providers including OpenAI, Anthropic, Google, and many others. This allows you to build and test prompts across multiple models and providers from a single interface.