More detailed token usage span attributes and metrics

### Area(s)

area:gen-ai

### What's missing?

Model providers tend to report more than just 'input' and 'output' token types. The exact type matters because each type is priced differently.

[Anthropic](https://docs.anthropic.com/en/api/messages#response-usage-cache-creation-input-tokens) reports `cache_creation_input_tokens` and `cache_read_input_tokens`.

[Google](https://ai.google.dev/api/generate-content#UsageMetadata) has this:

```
{
  "promptTokenCount": integer,
  "cachedContentTokenCount": integer,
  "candidatesTokenCount": integer,
  "toolUsePromptTokenCount": integer,
  "totalTokenCount": integer,
  "promptTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ],
  "cacheTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ],
  "candidatesTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ],
  "toolUsePromptTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ]
}
```

where `ModalityTokenCount` is

```
{
  "modality": enum (Modality),
  "tokenCount": integer
}
```

and `Modality` is `TEXT`, `IMAGE`, etc. Some of the token counts in the main object are just the standard input/output, and some are totals of others.
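To make the shape above concrete, here is a minimal Python sketch that flattens the `*TokensDetails` lists into per-modality counts. The function name `flatten_usage`, the short labels (`prompt`, `cache`, ...) and the sample numbers are made up for illustration; only the field names come from the Google schema quoted above.

```python
from collections import defaultdict

# Maps each "...TokensDetails" field to a short label (labels are hypothetical).
DETAIL_FIELDS = {
    "promptTokensDetails": "prompt",
    "cacheTokensDetails": "cache",
    "candidatesTokensDetails": "candidates",
    "toolUsePromptTokensDetails": "tool_use_prompt",
}


def flatten_usage(usage: dict) -> dict:
    """Return {(label, modality): token_count} for a usageMetadata-style dict."""
    flat = defaultdict(int)
    for field, label in DETAIL_FIELDS.items():
        # A details list may be absent when the request had no such tokens.
        for entry in usage.get(field, []):
            flat[(label, entry["modality"])] += entry.get("tokenCount", 0)
    return dict(flat)


# Made-up sample response, not real API output.
sample = {
    "promptTokenCount": 120,
    "cachedContentTokenCount": 80,
    "candidatesTokenCount": 30,
    "totalTokenCount": 150,
    "promptTokensDetails": [
        {"modality": "TEXT", "tokenCount": 100},
        {"modality": "IMAGE", "tokenCount": 20},
    ],
    "cacheTokensDetails": [{"modality": "TEXT", "tokenCount": 80}],
    "candidatesTokensDetails": [{"modality": "TEXT", "tokenCount": 30}],
}

flat = flatten_usage(sample)
```

In this sample the per-modality prompt counts add up to `promptTokenCount`, which is the kind of "total of others" relationship a backend would need to know about before summing anything.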

### Describe the solution you'd like

There should be optional attributes in both spans and metrics with these details. Consistency between AI providers is probably not very important, but conventions for each provider are important for backends to do things like calculate costs.

Possible span attribute names for e.g. Anthropic's `cache_creation_input_tokens`:

```
gen_ai.usage.cache_creation_input_tokens
gen_ai.usage.detailed.cache_creation_input_tokens
gen_ai.usage.anthropic.cache_creation_input_tokens
```

Google's additional dimensions make it more complicated:

```
gen_ai.usage.cachedContentTokenCount
gen_ai.usage.cached_content_token_count
gen_ai.usage.cached_content_token_count.text
gen_ai.usage.text.cached_content_token_count
gen_ai.usage.text
gen_ai.usage.modality.text
```

(I'm assuming that the `chat` operation can still involve things like image tokens and thus modality matters)
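As a rough illustration of two of the naming options above, this sketch builds plain attribute dicts from provider usage payloads. It is not an existing instrumentation API: the helper names, the choice of the `gen_ai.usage.anthropic.*` prefix and the `gen_ai.usage.cached_content_token_count.<modality>` form are just assumptions picked from the candidates listed, and the sample values are invented.

```python
def anthropic_usage_attributes(usage: dict, prefix: str = "gen_ai.usage.anthropic") -> dict:
    """Provider-namespaced attributes for Anthropic's cache token counts."""
    keys = ("cache_creation_input_tokens", "cache_read_input_tokens")
    # Skip fields the provider did not report (missing or null).
    return {f"{prefix}.{key}": usage[key] for key in keys if usage.get(key) is not None}


def google_cache_attributes(usage: dict, prefix: str = "gen_ai.usage") -> dict:
    """One attribute per modality for Google's cached content tokens."""
    attrs = {}
    for entry in usage.get("cacheTokensDetails", []):
        name = f"{prefix}.cached_content_token_count.{entry['modality'].lower()}"
        attrs[name] = attrs.get(name, 0) + entry["tokenCount"]
    return attrs


# Invented payloads for illustration only.
anthropic_attrs = anthropic_usage_attributes(
    {"input_tokens": 12, "output_tokens": 40, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 2048}
)
google_attrs = google_cache_attributes(
    {"cacheTokensDetails": [{"modality": "TEXT", "tokenCount": 80}, {"modality": "IMAGE", "tokenCount": 16}]}
)
```

The resulting dicts could then be set on a span with whatever API the instrumentation uses. The modality suffix shows how quickly the attribute count grows once Google's extra dimension is encoded in the name.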

For metrics, the `gen_ai.token.type` attribute might be enough to distinguish 'normal' tokens from other types like 'cached'. But a separate metric name for the other types (one shared name, not one metric name per type) would help prevent people from summing token types that must not be summed, since e.g. cached tokens are a subset of input tokens. For modalities, I suggest an attribute like `gen_ai.token.modality`.
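The double-counting concern can be sketched with plain tuples standing in for metric data points. `gen_ai.client.token.usage` is the existing token usage metric; the second metric name `gen_ai.client.token.usage.details` and the `cached` token type value are hypothetical, as are the numbers.

```python
MAIN_METRIC = "gen_ai.client.token.usage"
DETAIL_METRIC = "gen_ai.client.token.usage.details"  # hypothetical name


def record(prompt_tokens: int, cached_tokens: int, output_tokens: int) -> list:
    """Return (metric, attributes, value) points for one model call."""
    return [
        (MAIN_METRIC, {"gen_ai.token.type": "input"}, prompt_tokens),
        (MAIN_METRIC, {"gen_ai.token.type": "output"}, output_tokens),
        # Cached tokens are already included in prompt_tokens, so they go
        # to a separate metric instead of a third type on the main one.
        (DETAIL_METRIC, {"gen_ai.token.type": "cached", "gen_ai.token.modality": "text"}, cached_tokens),
    ]


def total(points: list, metric: str) -> int:
    """Sum every point of one metric, ignoring attributes."""
    return sum(value for name, _, value in points if name == metric)


points = record(prompt_tokens=1000, cached_tokens=800, output_tokens=50)
main_total = total(points, MAIN_METRIC)  # input + output only
naive_total = sum(value for _, _, value in points)  # what one shared metric would yield
```

With a single shared metric, a dashboard that sums over `gen_ai.token.type` would get `naive_total` and count the 800 cached tokens twice; with the split, summing the main metric stays correct.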
