Commit d1dddb3

feat(platform): export OTLP traces (#870)
## Description

- Replace manual platform usage-event posting with OpenTelemetry SDK OTLP/HTTP trace export for platform completions (including MZAI), while keeping payloads prompt/response-free.
- Add `session_label` passthrough in the SDK/API and export it as trace metadata (`anyllm.user_session_label`) alongside platform-generated trace session labels.
- Add scoped forwarding and sanitization guards for active trace exports (token-scoped forwarding processor, content-attribute redaction, secure endpoint enforcement), and expand unit coverage for edge/error paths.
- Update platform docs to describe trace analytics/redaction behavior and expected session-label semantics.

## PR Type

- 🆕 New Feature
- 📚 Documentation

## Relevant issues

<!-- e.g. "Fixes #123" -->

## Checklist

- [x] I understand the code I am submitting.
- [x] I have added unit tests that prove my fix/feature works.
- [x] I have run this code locally and verified it fixes the issue.
- [x] New and existing tests pass locally.
- [x] Documentation was updated where necessary.
- [x] I have read and followed the [contribution guidelines](https://github.com/mozilla-ai/any-llm/blob/main/CONTRIBUTING.md).
- [x] **AI Usage:**
  - [ ] No AI was used.
  - [x] AI was used for drafting/refactoring.
  - [ ] This is fully AI-generated.

## AI Usage Information

- AI Model used: GPT-5
- AI Developer Tool used: Codex (OpenAI)
- Any other info you'd like to share:
- [x] I am an AI Agent filling out this form (check box if true)
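The "secure endpoint enforcement" guard mentioned above could look roughly like the following sketch. The helper name is hypothetical and this is not the PR's actual code; it only illustrates the idea of refusing to wire an OTLP exporter to a non-HTTPS collector.

```python
from urllib.parse import urlparse


def ensure_secure_otlp_endpoint(endpoint: str) -> str:
    """Allow only HTTPS OTLP endpoints before wiring an exporter.

    Hypothetical helper sketching the 'secure endpoint enforcement'
    guard described in this PR; not the actual implementation.
    """
    if urlparse(endpoint).scheme != "https":
        raise ValueError(f"insecure OTLP endpoint rejected: {endpoint!r}")
    return endpoint
```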
1 parent fcf925a commit d1dddb3

File tree: 10 files changed, +1866 additions, −463 deletions

docs/src/content/docs/platform/overview.md

Lines changed: 10 additions & 7 deletions
````diff
@@ -22,19 +22,19 @@ The managed platform solves these problems:
 - **Secure Key Vault**: Your provider API keys are encrypted client-side before storage—we never see your raw keys
 - **Single Virtual Key**: One `ANY_LLM_KEY` works across all providers
-- **Usage Analytics**: Track tokens, costs, and performance metrics without logging prompts or responses
+- **Trace Analytics**: Track tokens, costs, and performance metrics without logging prompts or responses
 - **Zero Infrastructure**: No servers to deploy, no databases to manage
 
 ## How it works
 
-The managed platform acts as a secure credential manager and usage tracker. Here's the flow:
+The managed platform acts as a secure credential manager and trace-based usage tracker. Here's the flow:
 
 1. **You add provider keys** to the platform dashboard (keys are encrypted in your browser before upload)
 2. **You get a virtual key** (`ANY_LLM_KEY`) that represents your project
 3. **Your application** uses the `PlatformProvider` with your virtual key
 4. **The SDK** authenticates with the platform, retrieves and decrypts your provider key client-side
 5. **Your request** goes directly to the LLM provider (OpenAI, Anthropic, etc.)
-6. **Usage metadata** (tokens, model, latency) is reported back—never your prompts or responses
+6. **OpenTelemetry spans produced during each platform-provider call** are reported back for analytics, with prompt/response content attributes redacted before export
 
 ```
 ┌─────────────────────────────────────────────────────────────────────────┐
@@ -52,15 +52,15 @@ The managed platform acts as a secure credential manager and usage tracker. Here
 │  2. Receive encrypted provider key                                      │
 │  3. Decrypt provider key locally (client-side)                          │
 │  4. Make request directly to provider                                   │
-│  5. Report usage metadata (tokens, latency) to platform                 │
+│  5. Report in-scope OTel spans (with content redaction) to platform     │
 └────────────────┬─────────────────────────────────────┬──────────────────┘
                  │                                     │
                  ▼                                     ▼
 ┌─────────────────────────────┐      ┌────────────────────────────────────┐
 │  any-llm Managed Platform   │      │  LLM Provider                      │
 │                             │      │  (OpenAI, Anthropic, etc.)         │
 │  • Encrypted key storage    │      │                                    │
-│  • Usage tracking           │      │  Your prompts/responses go         │
+│  • Trace tracking           │      │  Your prompts/responses go         │
 │  • Cost analytics           │      │  directly here—never through       │
 │  • Performance metrics      │      │  our platform                      │
 └─────────────────────────────┘      └────────────────────────────────────┘
@@ -77,23 +77,26 @@ Your provider API keys are encrypted in your browser using XChaCha20-Poly1305 be
 - You maintain full control over your credentials
 
-### Privacy-First Usage Tracking
+### Privacy-First Trace Tracking
 
-The platform tracks usage metadata to provide cost and performance insights:
+The platform tracks OpenTelemetry span data generated during each platform-provider request to provide cost and performance insights:
 
 **What we track for you:**
 
 - Token counts (input and output)
 - Model name and provider
 - Request timestamps
 - Performance metrics (latency, throughput)
+- Additional OpenTelemetry span attributes/events emitted in the same request scope
 
 **What we never track:**
 
 - Your prompts
 - Model responses
 - Any content from your conversations
 
+Prompt/response payload attributes are removed from traces before export.
+
 ### Project Organization
 
 Organize your usage by project, team, or environment:
````
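The content-attribute redaction this doc change describes could be sketched as below. The attribute prefixes are an assumption based on the OpenTelemetry GenAI semantic conventions; the platform's actual redaction key list may differ.

```python
# Assumed prefixes for prompt/response payload attributes (per the GenAI
# semantic conventions); the platform's real key list may differ.
CONTENT_ATTR_PREFIXES = ("gen_ai.prompt", "gen_ai.completion")


def redact_content_attributes(attributes: dict[str, object]) -> dict[str, object]:
    """Drop prompt/response content attributes while keeping usage metadata."""
    return {
        key: value
        for key, value in attributes.items()
        if not key.startswith(CONTENT_ATTR_PREFIXES)
    }
```

Token counts, model names, and session labels pass through untouched; only payload-bearing keys are stripped before export.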

pyproject.toml

Lines changed: 2 additions & 0 deletions
```diff
@@ -25,6 +25,8 @@ all = [
 
 platform = [
     "any-llm-platform-client>=0.3.0",
+    "opentelemetry-sdk>=1.40.0",
+    "opentelemetry-exporter-otlp-proto-http>=1.40.0",
 ]
 
 perplexity = []
```

src/any_llm/any_llm.py

Lines changed: 28 additions & 5 deletions
```diff
@@ -5,7 +5,7 @@
 import os
 import warnings
 from abc import ABC, abstractmethod
-from typing import TYPE_CHECKING, Any, ClassVar, Literal, TypeVar, overload
+from typing import TYPE_CHECKING, Any, ClassVar, Literal, TypeVar, cast, overload
 
 from openresponses_types import ResponseResource
 from pydantic import BaseModel
@@ -23,15 +23,15 @@
 from any_llm.types.messages import MessageResponse, MessagesParams, MessageStreamEvent, MessageUsage
 from any_llm.types.provider import PlatformKey, ProviderMetadata
 from any_llm.types.responses import Response, ResponseInputParam, ResponsesParams, ResponseStreamEvent
-from any_llm.utils.aio import async_iter_to_sync_iter, run_async_in_sync
+from any_llm.utils.aio import async_coro_to_sync_iter, async_iter_to_sync_iter, run_async_in_sync
 from any_llm.utils.decorators import BATCH_API_EXPERIMENTAL_MESSAGE, experimental
 from any_llm.utils.exception_handler import handle_exceptions
 from any_llm.utils.structured_output import is_structured_output_type, parse_json_content
 
 ResponseFormatT = TypeVar("ResponseFormatT", bound=BaseModel)
 
 if TYPE_CHECKING:
-    from collections.abc import AsyncIterator, Callable, Iterator, Sequence
+    from collections.abc import AsyncIterator, Callable, Coroutine, Iterator, Sequence
 
     from any_llm.types.batch import Batch
     from any_llm.types.completion import ChatCompletionChunk, CreateEmbeddingResponse
@@ -437,14 +437,26 @@ def completion(
         """
         if allow_running_loop is None:
             allow_running_loop = INSIDE_NOTEBOOK
+        if stream:
+            return async_coro_to_sync_iter(
+                self.acompletion(
+                    model=model,
+                    messages=messages,
+                    response_format=response_format,
+                    stream=stream,
+                    **kwargs,
+                ),
+                allow_running_loop=allow_running_loop,
+            )
+
         response = run_async_in_sync(
             self.acompletion(model=model, messages=messages, response_format=response_format, stream=stream, **kwargs),
             allow_running_loop=allow_running_loop,
         )
         if isinstance(response, ChatCompletion):
             return response
 
-        return async_iter_to_sync_iter(response)
+        return async_iter_to_sync_iter(response, allow_running_loop=allow_running_loop)
 
     # Overloads let type checkers narrow the return type based on response_format and stream.
     @overload
@@ -509,6 +521,7 @@ async def acompletion(
         frequency_penalty: float | None = None,
         seed: int | None = None,
         user: str | None = None,
+        session_label: str | None = None,
         parallel_tool_calls: bool | None = None,
         logprobs: bool | None = None,
         top_logprobs: int | None = None,
@@ -536,6 +549,7 @@
             frequency_penalty: Penalize new tokens based on frequency in text
             seed: Random seed for reproducible results
             user: Unique identifier for the end user
+            session_label: Optional user session label metadata for platform traces; exported as anyllm.user_session_label
             parallel_tool_calls: Whether to allow parallel tool calls
             logprobs: Include token-level log probabilities in the response
             top_logprobs: Number of alternatives to return when logprobs are requested
@@ -586,6 +600,9 @@
             reasoning_effort=reasoning_effort,
         )
 
+        if session_label is not None and self.PROVIDER_NAME == "platform":
+            kwargs["session_label"] = session_label
+
         result = await self._acompletion(params, **kwargs)
 
         if is_structured_output_type(response_format):
@@ -754,10 +771,16 @@ def responses(self, **kwargs: Any) -> ResponseResource | Response | Iterator[Res
         See [AnyLLM.aresponses][any_llm.any_llm.AnyLLM.aresponses]
         """
         allow_running_loop = kwargs.pop("allow_running_loop", INSIDE_NOTEBOOK)
+        if kwargs.get("stream"):
+            return async_coro_to_sync_iter(
+                cast("Coroutine[Any, Any, AsyncIterator[ResponseStreamEvent]]", self.aresponses(**kwargs)),
+                allow_running_loop=allow_running_loop,
+            )
+
         response = run_async_in_sync(self.aresponses(**kwargs), allow_running_loop=allow_running_loop)
         if isinstance(response, (ResponseResource, Response)):
             return response
-        return async_iter_to_sync_iter(response)
+        return async_iter_to_sync_iter(response, allow_running_loop=allow_running_loop)
 
     @handle_exceptions(wrap_streaming=True)
     async def aresponses(
```
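The new streaming path hands `async_coro_to_sync_iter` a coroutine that resolves to an async iterator and bridges it to a plain sync iterator. A minimal self-contained sketch of that pattern (ignoring the `allow_running_loop` handling and any-llm's actual utility, which may differ) might be:

```python
import asyncio
from collections.abc import AsyncIterator, Coroutine, Iterator
from typing import Any, TypeVar

T = TypeVar("T")


def async_coro_to_sync_iter(coro: Coroutine[Any, Any, AsyncIterator[T]]) -> Iterator[T]:
    """Await a coroutine that produces an async iterator, then drain it synchronously.

    Sketch only: runs everything on a fresh event loop owned by this generator.
    """
    loop = asyncio.new_event_loop()
    try:
        aiter = loop.run_until_complete(coro)
        while True:
            try:
                # Drive one async step per sync step so chunks arrive lazily
                yield loop.run_until_complete(aiter.__anext__())
            except StopAsyncIteration:
                break
    finally:
        loop.close()
```

The point of this shape is that the coroutine is not awaited eagerly inside the sync wrapper; each chunk is pulled on demand, which is what lets `completion(stream=True)` return an iterator immediately.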

src/any_llm/api.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -39,6 +39,7 @@ def completion(
     api_key: str | None = None,
     api_base: str | None = None,
     user: str | None = None,
+    session_label: str | None = None,
     parallel_tool_calls: bool | None = None,
     logprobs: bool | None = None,
     top_logprobs: int | None = None,
@@ -73,6 +74,7 @@ def completion(
         api_key: API key for the provider
         api_base: Base URL for the provider API
         user: Unique identifier for the end user
+        session_label: Optional user session label metadata for platform traces; exported as anyllm.user_session_label
         parallel_tool_calls: Whether to allow parallel tool calls
         logprobs: Include token-level log probabilities in the response
         top_logprobs: Number of alternatives to return when logprobs are requested
@@ -115,6 +117,7 @@ def completion(
         frequency_penalty=frequency_penalty,
         seed=seed,
         user=user,
+        session_label=session_label,
         parallel_tool_calls=parallel_tool_calls,
         logprobs=logprobs,
         top_logprobs=top_logprobs,
@@ -146,6 +149,7 @@ async def acompletion(
     api_key: str | None = None,
     api_base: str | None = None,
     user: str | None = None,
+    session_label: str | None = None,
     parallel_tool_calls: bool | None = None,
     logprobs: bool | None = None,
     top_logprobs: int | None = None,
@@ -180,6 +184,7 @@ async def acompletion(
         api_key: API key for the provider
         api_base: Base URL for the provider API
         user: Unique identifier for the end user
+        session_label: Optional user session label metadata for platform traces; exported as anyllm.user_session_label
         parallel_tool_calls: Whether to allow parallel tool calls
         logprobs: Include token-level log probabilities in the response
         top_logprobs: Number of alternatives to return when logprobs are requested
@@ -222,6 +227,7 @@ async def acompletion(
         frequency_penalty=frequency_penalty,
         seed=seed,
         user=user,
+        session_label=session_label,
         parallel_tool_calls=parallel_tool_calls,
         logprobs=logprobs,
         top_logprobs=top_logprobs,
```
Lines changed: 2 additions & 2 deletions
```diff
@@ -1,4 +1,4 @@
 from .platform import PlatformProvider
-from .utils import post_completion_usage_event
+from .utils import export_completion_trace, shutdown_telemetry
 
-__all__ = ["PlatformProvider", "post_completion_usage_event"]
+__all__ = ["PlatformProvider", "export_completion_trace", "shutdown_telemetry"]
```
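Exporting `shutdown_telemetry` matters because OTLP exporters typically batch spans; without an explicit shutdown or flush, spans still sitting in the buffer can be lost at process exit. A toy illustration of that buffering behavior (not the real span processor, just the failure mode it guards against):

```python
class ToyBatchingProcessor:
    """Toy stand-in for a batching span processor: ended spans accumulate
    in a buffer and are only exported on an explicit shutdown/flush."""

    def __init__(self) -> None:
        self.buffer: list[str] = []
        self.exported: list[str] = []

    def on_end(self, span_name: str) -> None:
        # Buffered, not yet sent anywhere
        self.buffer.append(span_name)

    def shutdown(self) -> None:
        # Export whatever is still buffered, mirroring why an explicit
        # shutdown_telemetry() call matters before process exit
        self.exported.extend(self.buffer)
        self.buffer.clear()
```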
