LiteLLM allows developers to call all LLM APIs using the OpenAI format. LiteLLM Proxy is a proxy server to call 100+ LLMs in the OpenAI format. Both are supported by this auto-instrumentation.
This package implements OpenInference tracing for the following LiteLLM functions:
- completion()
- acompletion()
- completion_with_retries()
- embedding()
- aembedding()
- image_generation()
- aimage_generation()
These traces are fully OpenTelemetry-compatible and can be sent to an OpenTelemetry collector, such as Arize Phoenix, for viewing.
pip install openinference-instrumentation-litellm

In a notebook environment (Jupyter, Colab, etc.), install openinference-instrumentation-litellm if you haven't already, as well as arize-phoenix and litellm.
pip install openinference-instrumentation-litellm arize-phoenix litellm

First, import the dependencies required to auto-instrument LiteLLM and set up Phoenix as a collector for OpenInference traces.
import litellm
import phoenix as px
from openinference.instrumentation.litellm import LiteLLMInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

Next, we'll start a Phoenix server and set it as the collector.
session = px.launch_app()
endpoint = "http://127.0.0.1:6006/v1/traces"
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))

Set up any API keys needed in your API calls. For example:
import os
os.environ["OPENAI_API_KEY"] = "PASTE_YOUR_API_KEY_HERE"

Instrumenting LiteLLM is simple:
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)

Now, all calls to LiteLLM functions are instrumented and can be viewed in the Phoenix UI.
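If you want a direct link to the UI, the session object returned by px.launch_app() in the setup above exposes its URL (a small sketch; it assumes the session variable defined earlier):

print(session.url)  # open this link in a browser to view traces in Phoenix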
completion_response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "What's the capital of China?", "role": "user"}],
)
print(completion_response)

acompletion_response = await litellm.acompletion(
    model="gpt-3.5-turbo",
    messages=[
        {"content": "Hello, I want to bake a cake", "role": "user"},
        {"content": "Hello, I can pull up some recipes for cakes.", "role": "assistant"},
        {"content": "No actually I want to make a pie", "role": "user"},
    ],
    temperature=0.7,
    max_tokens=20,
)
print(acompletion_response)

embedding_response = litellm.embedding(model='text-embedding-ada-002', input=["good morning!"])
print(embedding_response)

image_gen_response = litellm.image_generation(model='dall-e-2', prompt="cute baby otter")
print(image_gen_response)
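The remaining instrumented functions from the list above are traced in the same way. As a rough sketch (the model names and prompts here are only examples), completion_with_retries() and the async embedding and image generation variants can be called like this:

retry_response = litellm.completion_with_retries(
    model="gpt-3.5-turbo",
    messages=[{"content": "What's the capital of France?", "role": "user"}],
)
print(retry_response)

aembedding_response = await litellm.aembedding(model='text-embedding-ada-002', input=["good afternoon!"])
print(aembedding_response)

aimage_gen_response = await litellm.aimage_generation(model='dall-e-2', prompt="a cute baby sea otter")
print(aimage_gen_response)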
You can also uninstrument the functions as follows:

LiteLLMInstrumentor().uninstrument(tracer_provider=tracer_provider)

Now any LiteLLM function calls you make will not send traces to Phoenix until instrumented again.
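If you later want to resume tracing, simply call instrument() again with the same tracer provider (a sketch reusing the setup from above):

LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)  # re-enable OpenInference tracing for LiteLLM calls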