Commit a2d5bde

changed docs
1 parent c20a5b0 commit a2d5bde

File tree: 4 files changed, +335 −40 lines

docs/my-website/docs/observability/helicone_integration.md

Lines changed: 220 additions & 39 deletions
@@ -1,3 +1,6 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Helicone - OSS LLM Observability Platform

:::tip
@@ -9,9 +12,68 @@ https://github.com/BerriAI/litellm

[Helicone](https://helicone.ai/) is an open source observability platform that proxies your LLM requests and provides key insights into your usage, spend, latency and more.

## Quick Start

<Tabs>
<TabItem value="sdk" label="Python SDK">

Use just 1 line of code to instantly log your responses **across all providers** with Helicone:

```python
import os
import litellm
from litellm import completion

## Set env variables
os.environ["HELICONE_API_KEY"] = "your-helicone-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Set callbacks
litellm.success_callback = ["helicone"]

# OpenAI call
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi 👋 - I'm OpenAI"}],
)

print(response)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

Add Helicone to your LiteLLM proxy configuration:

```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY

# Add Helicone callback
litellm_settings:
  success_callback: ["helicone"]

# Set Helicone API key
environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
```

Start the proxy:
```bash
litellm --config config.yaml
```

</TabItem>
</Tabs>

## Integration Methods

There are two main approaches to integrate Helicone with LiteLLM:

1. **Callbacks**: Log to Helicone while using any provider
2. **Proxy Mode**: Use Helicone as a proxy for advanced features

### Supported LLM Providers

@@ -26,27 +88,16 @@ Helicone can log requests across [various LLM providers](https://docs.helicone.a
- Replicate
- And more

## Method 1: Using Callbacks

Log requests to Helicone while using any LLM provider directly.

<Tabs>
<TabItem value="sdk" label="Python SDK">

```python
import os
import litellm
from litellm import completion

## Set env variables
@@ -66,28 +117,78 @@ response = completion(
print(response)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

# Add Helicone logging
litellm_settings:
  success_callback: ["helicone"]

# Environment variables
environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
  OPENAI_API_KEY: "your-openai-key"
  ANTHROPIC_API_KEY: "your-anthropic-key"
```

Start the proxy:
```bash
litellm --config config.yaml
```

Make requests to your proxy:
```python
import openai

client = openai.OpenAI(
    api_key="anything",  # proxy doesn't require real API key
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4",  # This gets logged to Helicone
    messages=[{"role": "user", "content": "Hello!"}]
)
```

</TabItem>
</Tabs>

## Method 2: Using Helicone as a Proxy

Helicone's proxy provides [advanced functionality](https://docs.helicone.ai/getting-started/proxy-vs-async) like caching, rate limiting, LLM security through [PromptArmor](https://promptarmor.com/) and more.

<Tabs>
<TabItem value="sdk" label="Python SDK">

Set Helicone as your base URL and pass authentication headers:

```python
import os
import litellm
from litellm import completion

# Configure LiteLLM to use Helicone proxy
litellm.api_base = "https://oai.hconeai.com/v1"
litellm.headers = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
}

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-openai-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "How does a court case get to the Supreme Court?"}]
)
@@ -140,32 +241,112 @@ litellm.metadata = {

Track multi-step and agentic LLM interactions using session IDs and paths:

<Tabs>
<TabItem value="sdk" label="Python SDK">

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Session-Id": "session-abc-123",
    "Helicone-Session-Path": "parent-trace/child-trace",
}

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Start a conversation"}]
)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://localhost:4000"
)

# First request in session
response1 = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Path": "conversation/greeting"
    }
)

# Follow-up request in same session
response2 = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me more"}],
    extra_headers={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Path": "conversation/follow-up"
    }
)
```

</TabItem>
</Tabs>

- `Helicone-Session-Id`: Unique identifier for the session to group related requests
- `Helicone-Session-Path`: Hierarchical path to represent parent/child traces (e.g., "parent/child")

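For multi-step agents, paths can nest more than one level deep. A minimal sketch (the session ID and path values are illustrative placeholders), logging a planning step and a follow-up call beneath it:

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Session-Id": "trip-planner-042",     # placeholder session ID
    "Helicone-Session-Path": "trip-planner/plan",  # parent trace: planning step
}

plan = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Plan a weekend trip to Lisbon"}],
)

# Child trace of the planning step: a follow-up call one level deeper
litellm.metadata["Helicone-Session-Path"] = "trip-planner/plan/find-flights"
flights = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Find flights for that plan"}],
)
```
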
## Retry and Fallback Mechanisms

<Tabs>
<TabItem value="sdk" label="Python SDK">

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Retry-Enabled": "true",
    "helicone-retry-num": "3",
    "helicone-retry-factor": "2",  # Exponential backoff
    "Helicone-Fallbacks": '["gpt-3.5-turbo", "gpt-4"]',
}

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
      api_base: "https://oai.hconeai.com/v1"

default_litellm_params:
  headers:
    Helicone-Auth: "Bearer ${HELICONE_API_KEY}"
    Helicone-Retry-Enabled: "true"
    helicone-retry-num: "3"
    helicone-retry-factor: "2"
    Helicone-Fallbacks: '["gpt-3.5-turbo", "gpt-4"]'

environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
  OPENAI_API_KEY: "your-openai-key"
```

</TabItem>
</Tabs>

> **Supported Headers** - For a full list of supported Helicone headers and their descriptions, please refer to the [Helicone documentation](https://docs.helicone.ai/getting-started/quick-start).
> By utilizing these headers and metadata options, you can gain deeper insights into your LLM usage, optimize performance, and better manage your AI workflows with Helicone and LiteLLM.
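
Other headers from that list can be passed the same way as the ones above. A minimal sketch, assuming the `Helicone-Cache-Enabled` and `Helicone-User-Id` headers described in Helicone's documentation (values here are placeholders):

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Cache-Enabled": "true",  # assumed header: cache identical requests (see Helicone docs)
    "Helicone-User-Id": "user-1234",   # assumed header: attribute the request to an end user
}

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is LiteLLM?"}]
)
```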

docs/my-website/docs/proxy/config_settings.md

Lines changed: 31 additions & 0 deletions
@@ -93,6 +93,8 @@ callback_settings:

general_settings:
  completion_model: string
  store_prompts_in_spend_logs: boolean
  forward_client_headers_to_llm_api: boolean
  disable_spend_logs: boolean # turn off writing each transaction to the db
  disable_master_key_return: boolean # turn off returning master key on UI (checked on '/user/info' endpoint)
  disable_retry_on_max_parallel_request_limit_error: boolean # turn off retries when max parallel request limit is reached
@@ -121,6 +123,35 @@ general_settings:
  alerting: ["slack", "email"]
  alerting_threshold: 0
  use_client_credentials_pass_through_routes: boolean # use client credentials for all pass through routes like "/vertex-ai", /bedrock/. When this is True Virtual Key auth will not be applied on these endpoints

router_settings:
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing"], default="simple-shuffle" - RECOMMENDED for best performance
  redis_host: <your-redis-host> # string
  redis_password: <your-redis-password> # string
  redis_port: <your-redis-port> # string
  enable_pre_call_checks: true # bool - before a call is made, check that the request fits within the model's context window
  allowed_fails: 3 # cooldown a model if it fails more than this many calls in a minute
  cooldown_time: 30 # (in seconds) how long to cooldown a model if fails/min > allowed_fails
  disable_cooldowns: True # bool - disable cooldowns for all models
  enable_tag_filtering: True # bool - use tag based routing for requests
  retry_policy: { # Dict[str, int]: retry policy for different types of exceptions
    "AuthenticationErrorRetries": 3,
    "TimeoutErrorRetries": 3,
    "RateLimitErrorRetries": 3,
    "ContentPolicyViolationErrorRetries": 4,
    "InternalServerErrorRetries": 4
  }
  allowed_fails_policy: {
    "BadRequestErrorAllowedFails": 1000, # Allow 1000 BadRequestErrors before cooling down a deployment
    "AuthenticationErrorAllowedFails": 10, # int
    "TimeoutErrorAllowedFails": 12, # int
    "RateLimitErrorAllowedFails": 10000, # int
    "ContentPolicyViolationErrorAllowedFails": 15, # int
    "InternalServerErrorAllowedFails": 20, # int
  }
  content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: fallback model for content policy violations
  fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: fallback model for all errors

```

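As a rough illustration of how the `fallbacks` and `retry_policy` settings above behave at request time, here is a minimal sketch. It assumes a proxy started with this config and listening on `http://localhost:4000`; the model names are the placeholders from the config:

```python
import openai

client = openai.OpenAI(
    api_key="anything",               # or a virtual key, depending on your auth settings
    base_url="http://localhost:4000"  # LiteLLM proxy started with the config above
)

# If "claude-2" keeps failing after the retries configured in retry_policy,
# the router falls back to "my-fallback-model" as listed in fallbacks.
response = client.chat.completions.create(
    model="claude-2",
    messages=[{"role": "user", "content": "Summarize this document."}]
)
print(response.choices[0].message.content)
```
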
### litellm_settings - Reference
