Commit a2d5bde

changed docs
1 parent c20a5b0 commit a2d5bde

File tree: 4 files changed, +335 −40 lines

docs/my-website/docs/observability/helicone_integration.md

Lines changed: 220 additions & 39 deletions
@@ -1,3 +1,6 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Helicone - OSS LLM Observability Platform

:::tip
@@ -9,9 +12,68 @@ https://github.com/BerriAI/litellm

[Helicone](https://helicone.ai/) is an open source observability platform that proxies your LLM requests and provides key insights into your usage, spend, latency and more.

## Quick Start

<Tabs>
<TabItem value="sdk" label="Python SDK">

Use just 1 line of code to instantly log your responses **across all providers** with Helicone:

```python
import os
import litellm
from litellm import completion

## Set env variables
os.environ["HELICONE_API_KEY"] = "your-helicone-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Set callbacks
litellm.success_callback = ["helicone"]

# OpenAI call
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi 👋 - I'm OpenAI"}],
)

print(response)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

Add Helicone to your LiteLLM proxy configuration:

```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY

# Add Helicone callback
litellm_settings:
  success_callback: ["helicone"]

# Set Helicone API key
environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
```

Start the proxy:
```bash
litellm --config config.yaml
```

</TabItem>
</Tabs>

## Integration Methods

There are two main approaches to integrate Helicone with LiteLLM:

1. **Callbacks**: Log to Helicone while using any provider
2. **Proxy Mode**: Use Helicone as a proxy for advanced features

### Supported LLM Providers

@@ -26,27 +88,16 @@ Helicone can log requests across [various LLM providers](https://docs.helicone.a
- Replicate
- And more

## Method 1: Using Callbacks

Log requests to Helicone while using any LLM provider directly.

<Tabs>
<TabItem value="sdk" label="Python SDK">

```python
import os
import litellm
from litellm import completion

## Set env variables
@@ -66,28 +117,78 @@ response = completion(
print(response)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

# Add Helicone logging
litellm_settings:
  success_callback: ["helicone"]

# Environment variables
environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
  OPENAI_API_KEY: "your-openai-key"
  ANTHROPIC_API_KEY: "your-anthropic-key"
```

Start the proxy:
```bash
litellm --config config.yaml
```

Make requests to your proxy:
```python
import openai

client = openai.OpenAI(
    api_key="anything",  # proxy doesn't require real API key
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4",  # This gets logged to Helicone
    messages=[{"role": "user", "content": "Hello!"}]
)
```

</TabItem>
</Tabs>

## Method 2: Using Helicone as a Proxy

Helicone's proxy provides [advanced functionality](https://docs.helicone.ai/getting-started/proxy-vs-async) like caching, rate limiting, LLM security through [PromptArmor](https://promptarmor.com/) and more.

<Tabs>
<TabItem value="sdk" label="Python SDK">

Set Helicone as your base URL and pass authentication headers:

```python
import os
import litellm
from litellm import completion

# Configure LiteLLM to use Helicone proxy
litellm.api_base = "https://oai.hconeai.com/v1"
litellm.headers = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
}

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-openai-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How does a court case get to the Supreme Court?"}]
)
@@ -140,32 +241,112 @@ litellm.metadata = {

Track multi-step and agentic LLM interactions using session IDs and paths:

<Tabs>
<TabItem value="sdk" label="Python SDK">

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Session-Id": "session-abc-123",
    "Helicone-Session-Path": "parent-trace/child-trace",
}

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Start a conversation"}]
)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://localhost:4000"
)

# First request in session
response1 = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Path": "conversation/greeting"
    }
)

# Follow-up request in same session
response2 = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me more"}],
    extra_headers={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Path": "conversation/follow-up"
    }
)
```

</TabItem>
</Tabs>

- `Helicone-Session-Id`: Unique identifier for the session to group related requests
- `Helicone-Session-Path`: Hierarchical path to represent parent/child traces (e.g., "parent/child")

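For multi-step agents, paths can nest more than one level deep. A minimal sketch (the session ID and path values are illustrative placeholders), logging a planning step and a follow-up call beneath it:

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Session-Id": "trip-planner-042",     # placeholder session ID
    "Helicone-Session-Path": "trip-planner/plan",  # parent trace: planning step
}

plan = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Plan a weekend trip to Lisbon"}],
)

# Child trace of the planning step: a follow-up call one level deeper
litellm.metadata["Helicone-Session-Path"] = "trip-planner/plan/find-flights"
flights = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Find flights for that plan"}],
)
```
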
## Retry and Fallback Mechanisms

<Tabs>
<TabItem value="sdk" label="Python SDK">

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Retry-Enabled": "true",
    "helicone-retry-num": "3",
    "helicone-retry-factor": "2",  # Exponential backoff
    "Helicone-Fallbacks": '["gpt-3.5-turbo", "gpt-4"]',
}

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
      api_base: "https://oai.hconeai.com/v1"

default_litellm_params:
  headers:
    Helicone-Auth: "Bearer ${HELICONE_API_KEY}"
    Helicone-Retry-Enabled: "true"
    helicone-retry-num: "3"
    helicone-retry-factor: "2"
    Helicone-Fallbacks: '["gpt-3.5-turbo", "gpt-4"]'

environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
  OPENAI_API_KEY: "your-openai-key"
```

</TabItem>
</Tabs>

> **Supported Headers** - For a full list of supported Helicone headers and their descriptions, please refer to the [Helicone documentation](https://docs.helicone.ai/getting-started/quick-start).
> By utilizing these headers and metadata options, you can gain deeper insights into your LLM usage, optimize performance, and better manage your AI workflows with Helicone and LiteLLM.
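
Other headers from that list can be passed the same way as the ones above. A minimal sketch, assuming the `Helicone-Cache-Enabled` and `Helicone-User-Id` headers described in Helicone's documentation (values here are placeholders):

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Cache-Enabled": "true",  # assumed header: cache identical requests (see Helicone docs)
    "Helicone-User-Id": "user-1234",   # assumed header: attribute the request to an end user
}

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is LiteLLM?"}]
)
```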

docs/my-website/docs/proxy/config_settings.md

Lines changed: 31 additions & 0 deletions
@@ -93,6 +93,8 @@ callback_settings:

general_settings:
  completion_model: string
  store_prompts_in_spend_logs: boolean
  forward_client_headers_to_llm_api: boolean
  disable_spend_logs: boolean # turn off writing each transaction to the db
  disable_master_key_return: boolean # turn off returning master key on UI (checked on '/user/info' endpoint)
  disable_retry_on_max_parallel_request_limit_error: boolean # turn off retries when max parallel request limit is reached
@@ -121,6 +123,35 @@ general_settings:
  alerting: ["slack", "email"]
  alerting_threshold: 0
  use_client_credentials_pass_through_routes: boolean # use client credentials for all pass through routes like "/vertex-ai", /bedrock/. When this is True Virtual Key auth will not be applied on these endpoints

router_settings:
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing"], default="simple-shuffle" - RECOMMENDED for best performance
  redis_host: <your-redis-host> # string
  redis_password: <your-redis-password> # string
  redis_port: <your-redis-port> # string
  enable_pre_call_checks: true # bool - before a call is made, check that the request fits within the model's context window
  allowed_fails: 3 # cooldown a model if it fails more than this many calls in a minute
  cooldown_time: 30 # (in seconds) how long to cooldown a model if fails/min > allowed_fails
  disable_cooldowns: True # bool - disable cooldowns for all models
  enable_tag_filtering: True # bool - use tag based routing for requests
  retry_policy: { # Dict[str, int]: retry policy for different types of exceptions
    "AuthenticationErrorRetries": 3,
    "TimeoutErrorRetries": 3,
    "RateLimitErrorRetries": 3,
    "ContentPolicyViolationErrorRetries": 4,
    "InternalServerErrorRetries": 4
  }
  allowed_fails_policy: {
    "BadRequestErrorAllowedFails": 1000, # Allow 1000 BadRequestErrors before cooling down a deployment
    "AuthenticationErrorAllowedFails": 10, # int
    "TimeoutErrorAllowedFails": 12, # int
    "RateLimitErrorAllowedFails": 10000, # int
    "ContentPolicyViolationErrorAllowedFails": 15, # int
    "InternalServerErrorAllowedFails": 20, # int
  }
  content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: fallback model for content policy violations
  fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: fallback model for all errors

```

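As a rough illustration of how the `fallbacks` and `retry_policy` settings above behave at request time, here is a minimal sketch. It assumes a proxy started with this config and listening on `http://localhost:4000`; the model names are the placeholders from the config:

```python
import openai

client = openai.OpenAI(
    api_key="anything",               # or a virtual key, depending on your auth settings
    base_url="http://localhost:4000"  # LiteLLM proxy started with the config above
)

# If "claude-2" keeps failing after the retries configured in retry_policy,
# the router falls back to "my-fallback-model" as listed in fallbacks.
response = client.chat.completions.create(
    model="claude-2",
    messages=[{"role": "user", "content": "Summarize this document."}]
)
print(response.choices[0].message.content)
```
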
### litellm_settings - Reference
