[Helicone](https://helicone.ai/) is an open source observability platform that proxies your LLM requests and provides key insights into your usage, spend, latency and more.
## Using Helicone with LiteLLM
## Quick Start
<Tabs>
<TabItem value="sdk" label="Python SDK">
Use just 1 line of code to instantly log your responses **across all providers** with Helicone. LiteLLM provides `success_callback` and `failure_callback`, allowing you to easily log data to Helicone based on the status of your responses:
```python
import os
import litellm
from litellm import completion

## Set env variables
os.environ["HELICONE_API_KEY"] = "your-helicone-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

## Set callback
litellm.success_callback = ["helicone"]

## Completion call - logged to Helicone on success
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response)
```
</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">
Add Helicone to your LiteLLM proxy configuration:
```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY

# Add Helicone callback
litellm_settings:
  success_callback: ["helicone"]

# Set Helicone API key
environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
```
Start the proxy:
```bash
litellm --config config.yaml
```
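
Once the proxy is running, send a test request through it; the call should then show up in your Helicone dashboard. A minimal sketch, assuming the proxy is listening on its default port 4000:

```python
import openai

# Point any OpenAI-compatible client at the LiteLLM proxy
client = openai.OpenAI(
    api_key="anything",               # the proxy holds the real provider keys
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```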
</TabItem>
69
+
</Tabs>
## Integration Methods
There are two main approaches to integrate Helicone with LiteLLM:
1. **Callbacks**: Log to Helicone while using any provider
2. **Proxy Mode**: Use Helicone as a proxy for advanced features
### Supported LLM Providers
Helicone can log requests across various LLM providers, including:
- Replicate
- And more
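
Because the callback hooks into LiteLLM itself, the same one-line setup applies to any of these providers. A minimal sketch, reusing the Anthropic model name from the proxy configuration shown later in this guide (the message content is illustrative):

```python
import os
import litellm
from litellm import completion

os.environ["HELICONE_API_KEY"] = "your-helicone-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# The same callback logs calls made to any supported provider
litellm.success_callback = ["helicone"]

response = completion(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```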
## Method 1: Using Callbacks
Log requests to Helicone while using any LLM provider directly.
<Tabs>
96
+
<TabItem value="sdk" label="Python SDK">
```python
import os
import litellm
from litellm import completion
## Set env variables
os.environ["HELICONE_API_KEY"] = "your-helicone-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

## Set callback
litellm.success_callback = ["helicone"]

## Completion call - logged to Helicone on success
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```
</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">
```yaml title="config.yaml"
model_list:
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

# Add Helicone logging
litellm_settings:
  success_callback: ["helicone"]

# Environment variables
environment_variables:
  HELICONE_API_KEY: "your-helicone-key"
  OPENAI_API_KEY: "your-openai-key"
  ANTHROPIC_API_KEY: "your-anthropic-key"
```
Start the proxy:
146
+
```bash
147
+
litellm --config config.yaml
148
+
```
149
+
150
+
Make requests to your proxy:
151
+
```python
import openai

client = openai.OpenAI(
    api_key="anything",  # proxy doesn't require a real API key
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4",  # this request gets logged to Helicone
    messages=[{"role": "user", "content": "Hello!"}]
)
```
</TabItem>
</Tabs>
## Method 2: Using Helicone as a Proxy
Helicone's proxy provides [advanced functionality](https://docs.helicone.ai/getting-started/proxy-vs-async) like caching, rate limiting, LLM security through [PromptArmor](https://promptarmor.com/) and more.
<Tabs>
<TabItem value="sdk" label="Python SDK">
Set Helicone as your base URL and pass authentication headers:
```python
import os
import litellm
from litellm import completion
# Configure LiteLLM to use Helicone proxy
litellm.api_base = "https://oai.hconeai.com/v1"
litellm.headers = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"  # Authenticate to send requests to Helicone API
}

# The request is routed through Helicone and logged automatically
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```

</TabItem>
</Tabs>

## Session Tracking and Tracing

Use Helicone session headers to group related requests together and visualize multi-step LLM interactions.

<Tabs>
<TabItem value="sdk" label="Python SDK">

```python
import os
import litellm
from litellm import completion

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Session-Id": "session-abc-123",         # unique id that groups related requests
    "Helicone-Session-Path": "conversation/greeting"  # parent/child trace path
}

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Start a conversation"}]
)
```
</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">
```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://localhost:4000"
)

# First request in session
response1 = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Path": "conversation/greeting"
    }
)

# Follow-up request in same session
response2 = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me more"}],
    extra_headers={
        "Helicone-Session-Id": "session-abc-123",
        "Helicone-Session-Path": "conversation/follow-up"
    }
)
```
</TabItem>
</Tabs>
- `Helicone-Session-Id`: Unique identifier for the session to group related requests
- `Helicone-Session-Path`: Hierarchical path to represent parent/child traces (e.g., "parent/child")
## Retry and Fallback Mechanisms
Set up retry mechanisms and fallback options via Helicone headers:
<Tabs>
<TabItem value="sdk" label="Python SDK">
```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",  # Authenticate to send requests to Helicone API
    "Helicone-Retry-Enabled": "true",  # let Helicone retry failed requests
    "Helicone-Fallbacks": '["gpt-3.5-turbo", "gpt-4"]'  # fallback models; see the Helicone docs for the full header list
}

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```

</TabItem>
</Tabs>

> **Supported Headers** - For a full list of supported Helicone headers and their descriptions, please refer to the [Helicone documentation](https://docs.helicone.ai/getting-started/quick-start).
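
For example, other documented Helicone headers, such as response caching and custom properties, can be passed the same way through `litellm.metadata`. A minimal sketch (the `Helicone-Cache-Enabled` and `Helicone-Property-*` header names come from Helicone's documentation; the property name and values here are placeholders):

```python
import os
import litellm
from litellm import completion

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.metadata = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Cache-Enabled": "true",        # cache identical requests at the Helicone proxy
    "Helicone-Property-App": "litellm-demo"  # custom property for filtering in the Helicone dashboard
}

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```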
> By utilizing these headers and metadata options, you can gain deeper insights into your LLM usage, optimize performance, and better manage your AI workflows with Helicone and LiteLLM.
**`docs/my-website/docs/proxy/config_settings.md`**
```yaml
general_settings:
  completion_model: string
  store_prompts_in_spend_logs: boolean
  forward_client_headers_to_llm_api: boolean
  disable_spend_logs: boolean # turn off writing each transaction to the db
  disable_master_key_return: boolean # turn off returning master key on UI (checked on '/user/info' endpoint)
  disable_retry_on_max_parallel_request_limit_error: boolean # turn off retries when max parallel request limit is reached
  # ...
  alerting: ["slack", "email"]
  alerting_threshold: 0
  use_client_credentials_pass_through_routes: boolean # use client credentials for all pass through routes like "/vertex-ai", /bedrock/. When this is True Virtual Key auth will not be applied on these endpoints

router_settings:
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle" - RECOMMENDED for best performance
  redis_host: <your-redis-host> # string
  redis_password: <your-redis-password> # string
  redis_port: <your-redis-port> # string
  enable_pre_call_checks: true # bool - Before call is made check if a call is within model context window
  allowed_fails: 3 # cooldown model if it fails > 1 call in a minute.
  cooldown_time: 30 # (in seconds) how long to cooldown model if fails/min > allowed_fails
  disable_cooldowns: True # bool - Disable cooldowns for all models
  enable_tag_filtering: True # bool - Use tag based routing for requests
  retry_policy: { # Dict[str, int]: retry policy for different types of exceptions
    "AuthenticationErrorRetries": 3,
    "TimeoutErrorRetries": 3,
    "RateLimitErrorRetries": 3,
    "ContentPolicyViolationErrorRetries": 4,
    "InternalServerErrorRetries": 4
  }
  allowed_fails_policy: {
    "BadRequestErrorAllowedFails": 1000, # Allow 1000 BadRequestErrors before cooling down a deployment
    "AuthenticationErrorAllowedFails": 10, # int
    "TimeoutErrorAllowedFails": 12, # int
    "RateLimitErrorAllowedFails": 10000, # int
    "ContentPolicyViolationErrorAllowedFails": 15, # int
    "InternalServerErrorAllowedFails": 20 # int
  }
  content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: Fallback model for content policy violations
  fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: Fallback model for all errors
```