Commit 6ca835b
Extract LLM attributes from any OTEL convention, not just GenAI
## Summary
The span ingestion pipeline only recognized `gen_ai.*` attribute keys when extracting LLM-specific fields into promoted ClickHouse columns. Spans arriving via OpenInference, OpenLLMetry, Vercel AI SDK, or OpenAI Agents SDK had their LLM columns left empty despite carrying equivalent data under different keys and vocabularies.
This introduces a multi-convention extraction layer with two components: **scalar attribute resolvers** and **content payload parsers**.
### Scalar attribute resolvers (`resolvers.ts`)
Each promoted column is resolved from a priority-ordered list of convention-specific candidates. The first candidate that returns a value wins. Value translation is applied where conventions use different vocabularies.
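The first-match rule described above can be sketched as follows. This is a minimal illustration, not the contents of `resolvers.ts`: the names `Attrs`, `Candidate`, `resolveFirst`, and `stringAttr` are hypothetical, the translation maps list only a few of the mappings, and passing unknown values through unchanged is an assumption.

```typescript
// Hypothetical types: the real resolvers.ts may be shaped differently.
type Attrs = Record<string, unknown>;
type Candidate<T> = (attrs: Attrs) => T | undefined;

// Return the value from the first candidate that yields one.
function resolveFirst<T>(attrs: Attrs, candidates: Candidate<T>[]): T | undefined {
  for (const candidate of candidates) {
    const value = candidate(attrs);
    if (value !== undefined) return value;
  }
  return undefined;
}

// Read a non-empty string attribute, optionally translating it through a
// vocabulary map. Values missing from the map pass through unchanged
// (an assumption of this sketch).
function stringAttr(key: string, translate?: Record<string, string>): Candidate<string> {
  return (attrs) => {
    const raw = attrs[key];
    if (typeof raw !== "string" || raw === "") return undefined;
    if (translate !== undefined && Object.prototype.hasOwnProperty.call(translate, raw)) {
      return translate[raw];
    }
    return raw;
  };
}

// Candidates for the `operation` column, in priority order.
const operationCandidates: Candidate<string>[] = [
  stringAttr("gen_ai.operation.name"),
  stringAttr("llm.request.type", { completion: "text_completion", embedding: "embeddings" }),
  stringAttr("openinference.span.kind", { LLM: "chat", EMBEDDING: "embeddings", TOOL: "execute_tool" }),
  stringAttr("ai.operationId", { "ai.generateText": "chat", "ai.toolCall": "execute_tool" }),
];

const fromOpenInference = resolveFirst({ "openinference.span.kind": "LLM" }, operationCandidates);
const fromBoth = resolveFirst(
  { "gen_ai.operation.name": "embeddings", "openinference.span.kind": "LLM" },
  operationCandidates,
);
```

Keeping each candidate as a small function lets a column mix plain key lookups with translated ones while the ordering stays visible in a single array.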
| Column | GenAI current | GenAI deprecated / OpenLLMetry | OpenInference | Vercel AI SDK |
|---|---|---|---|---|
| `operation` | `gen_ai.operation.name` | `llm.request.type` (maps `completion`→`text_completion`, `embedding`→`embeddings`, etc.) | `openinference.span.kind` (maps `LLM`→`chat`, `EMBEDDING`→`embeddings`, `TOOL`→`execute_tool`, etc.) | `ai.operationId` (maps `ai.generateText`→`chat`, `ai.toolCall`→`execute_tool`, etc.) |
| `provider` | `gen_ai.provider.name` | `gen_ai.system` (aliases `bedrock`→`aws.bedrock`, `gemini`→`gcp.gemini`, `mistral`→`mistral_ai`, etc.) | `llm.system` (aliases `mistralai`→`mistral_ai`, `xai`→`x_ai`, `vertexai`→`gcp.vertex_ai`) | `ai.model.provider` (strips `.chat`/`.messages`/`.responses` suffixes, aliases `google.generative-ai`→`gcp.gemini`, `amazon-bedrock`→`aws.bedrock`) |
| `model` | `gen_ai.request.model` | same | `llm.model_name`, `embedding.model_name`, `reranker.model_name` | `ai.model.id` |
| `response_model` | `gen_ai.response.model` | same | `llm.model_name` (no request/response distinction) | `ai.response.model` |
| `tokens_input` | `gen_ai.usage.input_tokens` | `gen_ai.usage.prompt_tokens` | `llm.token_count.prompt` | `ai.usage.promptTokens` |
| `tokens_output` | `gen_ai.usage.output_tokens` | `gen_ai.usage.completion_tokens` | `llm.token_count.completion` | `ai.usage.completionTokens` |
| `tokens_cache_read` | `gen_ai.usage.cache_read.input_tokens` | same | `llm.token_count.prompt_details.cache_read` | — |
| `tokens_cache_create` | `gen_ai.usage.cache_creation.input_tokens` | same | `llm.token_count.prompt_details.cache_write` | — |
| `tokens_reasoning` | `gen_ai.usage.reasoning_tokens` | same | `llm.token_count.completion_details.reasoning` | — |
| `response_id` | `gen_ai.response.id` | same | — | `ai.response.id` |
| `finish_reasons` | `gen_ai.response.finish_reasons` (string[]) | same | — | `ai.response.finishReason` (singular string, wrapped to array; `tool-calls`→`tool_calls`, `content-filter`→`content_filter`) |
| `session_id` | `gen_ai.conversation.id` | same | `session.id` | — |
| `cost_*_microcents` | — | `gen_ai.usage.cost` (total only, USD float→microcents) | `llm.cost.prompt`, `llm.cost.completion`, `llm.cost.total` (USD float→microcents) | — |
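Several cells in the table involve a value translation, not just a key lookup. The sketch below illustrates three of them under stated assumptions: the function names are hypothetical, the suffix is assumed to be stripped before aliases are applied, and a microcent is assumed to be one millionth of a cent (so 1 USD = 10^8 microcents). None of this is confirmed by the commit itself.

```typescript
// Aliases taken from the Vercel AI SDK column of the table above.
const VERCEL_PROVIDER_ALIASES = new Map<string, string>([
  ["google.generative-ai", "gcp.gemini"],
  ["amazon-bedrock", "aws.bedrock"],
]);

// Strip a trailing `.chat` / `.messages` / `.responses`, then apply aliases.
// The strip-then-alias order is an assumption of this sketch.
function normalizeVercelProvider(raw: string): string {
  const base = raw.replace(/\.(chat|messages|responses)$/, "");
  return VERCEL_PROVIDER_ALIASES.get(base) ?? base;
}

const VERCEL_FINISH_REASON_ALIASES = new Map<string, string>([
  ["tool-calls", "tool_calls"],
  ["content-filter", "content_filter"],
]);

// `ai.response.finishReason` is a single string; the promoted column is an array.
function normalizeVercelFinishReason(raw: string): string[] {
  return [VERCEL_FINISH_REASON_ALIASES.get(raw) ?? raw];
}

// Convert a USD float to integer microcents, assuming 1 USD = 1e8 microcents.
// Non-finite or negative inputs are rejected (also an assumption).
function usdToMicrocents(usd: number): number | undefined {
  if (!Number.isFinite(usd) || usd < 0) return undefined;
  return Math.round(usd * 1e8);
}
```

Rounding to an integer at ingestion keeps later cost aggregation free of floating-point drift, which is presumably why the columns store microcents instead of a USD float.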
OpenAI Agents SDK spans are handled implicitly: when bridged to OTEL via the official instrumentor, they emit GenAI convention attributes, so the GenAI resolvers cover them.
### Content payload parsers (`content/`)
LLM message payloads use fundamentally different storage structures across conventions, so each gets a dedicated parser with sentinel-based detection:
- **GenAI current** (sentinel: `gen_ai.input.messages` or `gen_ai.output.messages`): Parses structured/JSON messages already in GenAI parts-based format. Extracts `gen_ai.system_instructions` and `gen_ai.tool.definitions` as dedicated attributes.
- **GenAI deprecated / OpenLLMetry** (sentinel: `gen_ai.prompt` or `gen_ai.completion`): Parses flat JSON strings containing `{role, content}` message arrays. Translates to GenAI format via `rosetta-ai` auto-detection. Extracts `llm.request.functions` for tool definitions.
- **OpenInference** (sentinel: `llm.input_messages.*` prefix or `openinference.span.kind`): Reassembles flattened indexed span attributes (`llm.input_messages.{i}.message.role`, `.content`, `.tool_calls.{j}.tool_call.function.name`, etc.) by scanning, grouping by index, and sorting. Reconstructs `llm.tools.{i}.tool.json_schema` for tool definitions. Translates reassembled messages via `rosetta-ai`.
- **Vercel AI SDK** (sentinel: `ai.prompt` or `ai.prompt.messages`): Handles both top-level spans (`ai.prompt` JSON with `system` + `messages` fields) and call-level spans (`ai.prompt.messages` JSON array). Reconstructs output from split `ai.response.text` + `ai.response.toolCalls`. Parses `ai.prompt.tools` string array for tool definitions. Translates via `rosetta-ai` with explicit `Provider.VercelAI`.
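As a rough illustration of the parsers listed above, the sketch below shows sentinel-based detection and the OpenInference reassembly step (scan, group by index, sort). It is not the code in `content/`: the names `detectConvention` and `reassembleMessages` are hypothetical, the order in which sentinels are checked is an assumption, only `role` and `content` are reassembled (tool calls are left out), and the `rosetta-ai` translation step is omitted.

```typescript
type Attrs = Record<string, unknown>;
type Convention = "genai_current" | "genai_deprecated" | "openinference" | "vercel" | "unknown";

// Pick a parser by looking for each convention's sentinel keys.
function detectConvention(attrs: Attrs): Convention {
  const keys = Object.keys(attrs);
  const has = (key: string) => keys.includes(key);
  if (has("gen_ai.input.messages") || has("gen_ai.output.messages")) return "genai_current";
  if (has("gen_ai.prompt") || has("gen_ai.completion")) return "genai_deprecated";
  if (keys.some((k) => k.startsWith("llm.input_messages.")) || has("openinference.span.kind")) {
    return "openinference";
  }
  if (has("ai.prompt") || has("ai.prompt.messages")) return "vercel";
  return "unknown";
}

interface FlatMessage {
  role?: string;
  content?: string;
}

// Rebuild an ordered message list from flattened keys such as
// `llm.input_messages.0.message.role`.
function reassembleMessages(attrs: Attrs, prefix = "llm.input_messages."): FlatMessage[] {
  const byIndex = new Map<number, FlatMessage>();
  const pattern = /^(\d+)\.message\.(role|content)$/;
  for (const [key, value] of Object.entries(attrs)) {
    if (!key.startsWith(prefix) || typeof value !== "string") continue;
    const match = pattern.exec(key.slice(prefix.length));
    if (!match) continue;
    const index = Number(match[1]);
    const field = match[2] as "role" | "content";
    const message = byIndex.get(index) ?? {};
    message[field] = value;
    byIndex.set(index, message);
  }
  // Sort numerically so that index 10 comes after index 2.
  return Array.from(byIndex.entries())
    .sort(([a], [b]) => a - b)
    .map(([, message]) => message);
}

const messages = reassembleMessages({
  "llm.input_messages.1.message.role": "user",
  "llm.input_messages.1.message.content": "Hi",
  "llm.input_messages.0.message.role": "system",
  "llm.input_messages.0.message.content": "Be brief.",
});
```

Grouping into a `Map` keyed by the numeric index avoids relying on attribute order, which is not guaranteed for span attributes.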
All raw span attributes remain in the dynamic `attr_*` maps regardless of whether they were also extracted to promoted columns.
1 parent a3b33ce commit 6ca835b
File tree: 10 files changed, +896 -79 lines changed
- apps/ingest/src/routes
- packages/domain/spans/src
- otlp
- content