More detailed token usage span attributes and metrics #1959

@alexmojaki

Description

Area(s)

area:gen-ai

What's missing?

Models tend to report more than just 'input' and 'output' token types. The exact type matters because each type is priced differently.

Anthropic has cache_creation_input_tokens and cache_read_input_tokens.

Google has this:

{
  "promptTokenCount": integer,
  "cachedContentTokenCount": integer,
  "candidatesTokenCount": integer,
  "toolUsePromptTokenCount": integer,
  "totalTokenCount": integer,
  "promptTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ],
  "cacheTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ],
  "candidatesTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ],
  "toolUsePromptTokensDetails": [
    {
      object (ModalityTokenCount)
    }
  ]
}

where ModalityTokenCount is

{
  "modality": enum (Modality),
  "tokenCount": integer
}

and Modality is TEXT, IMAGE, etc. Some of the token counts in the main object are just the standard input/output, and some are totals of others.
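To make the shape concrete, here is a minimal Python sketch that flattens the per-modality detail lists into (category, modality) counts. The category labels, the function name, and all the numbers in the example payload are made up for illustration; only the field names come from the structure above.

```python
from typing import Any

# Maps each "...Details" list to a short category label.
# The labels are illustrative assumptions, not part of Google's API.
DETAIL_FIELDS = {
    "promptTokensDetails": "prompt",
    "cacheTokensDetails": "cached",
    "candidatesTokensDetails": "candidates",
    "toolUsePromptTokensDetails": "tool_use_prompt",
}


def flatten_modality_counts(usage: dict[str, Any]) -> dict[tuple[str, str], int]:
    """Return {(category, modality): token_count} from a usage payload."""
    counts: dict[tuple[str, str], int] = {}
    for field, category in DETAIL_FIELDS.items():
        # Detail lists may be absent, so default to an empty list.
        for entry in usage.get(field, []):
            key = (category, entry["modality"].lower())
            counts[key] = counts.get(key, 0) + entry["tokenCount"]
    return counts


# Hypothetical payload: the values are invented for this example.
example_usage = {
    "promptTokenCount": 120,
    "cachedContentTokenCount": 40,
    "candidatesTokenCount": 30,
    "totalTokenCount": 150,
    "promptTokensDetails": [
        {"modality": "TEXT", "tokenCount": 100},
        {"modality": "IMAGE", "tokenCount": 20},
    ],
    "cacheTokensDetails": [{"modality": "TEXT", "tokenCount": 40}],
    "candidatesTokensDetails": [{"modality": "TEXT", "tokenCount": 30}],
}

modality_counts = flatten_modality_counts(example_usage)
```

In this made-up payload the 40 cached text tokens are part of the 120 prompt tokens, which is the kind of overlap that makes naive summing wrong.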

Describe the solution you'd like

There should be optional attributes in both spans and metrics with these details. Consistency between AI providers is probably not very important, but conventions for each provider are important for backends to do things like calculate costs.

Possible span attribute names for e.g. Anthropic's cache_creation_input_tokens:

gen_ai.usage.cache_creation_input_tokens
gen_ai.usage.detailed.cache_creation_input_tokens
gen_ai.usage.anthropic.cache_creation_input_tokens
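As a rough sketch of what an instrumentation might do with these, the Python below builds span attributes for Anthropic's two cache fields under each of the three candidate prefixes. The helper name, the scheme labels, and the example numbers are assumptions for illustration; none of the attribute names are settled conventions.

```python
from typing import Any

# The three candidate naming schemes listed above, keyed by made-up labels.
CANDIDATE_PREFIXES = {
    "flat": "gen_ai.usage.",
    "detailed": "gen_ai.usage.detailed.",
    "provider": "gen_ai.usage.anthropic.",
}

# Anthropic's cache-related usage fields.
CACHE_FIELDS = ("cache_creation_input_tokens", "cache_read_input_tokens")


def cache_usage_attributes(usage: dict[str, Any], scheme: str = "provider") -> dict[str, int]:
    """Build optional span attributes for whichever cache fields are present."""
    prefix = CANDIDATE_PREFIXES[scheme]
    return {
        prefix + field: usage[field]
        for field in CACHE_FIELDS
        if usage.get(field) is not None  # keep explicit zeros, skip missing fields
    }


# Hypothetical usage block: the values are invented for this example.
example_usage = {
    "input_tokens": 12,
    "output_tokens": 50,
    "cache_creation_input_tokens": 2000,
    "cache_read_input_tokens": 0,
}

provider_attrs = cache_usage_attributes(example_usage, scheme="provider")
flat_attrs = cache_usage_attributes(example_usage, scheme="flat")
```

Whichever prefix is chosen, the attributes stay optional: a response without cache fields simply produces no extra attributes.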

Google's additional dimensions make it more complicated:

gen_ai.usage.cachedContentTokenCount
gen_ai.usage.cached_content_token_count
gen_ai.usage.cached_content_token_count.text
gen_ai.usage.text.cached_content_token_count
gen_ai.usage.text
gen_ai.usage.modality.text

(I'm assuming that the chat operation can still involve things like image tokens and thus modality matters)

For metrics, the gen_ai.token.type attribute might be enough to describe the difference between 'normal' tokens and other types like 'cached'. But a separate metric name for other types (one additional metric, not one metric name per type) would help prevent people from summing token types that must not be summed, since e.g. cached tokens are a subset of input tokens. For modalities, I suggest an attribute like gen_ai.token.modality.
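A toy Python sketch of why a separate metric name helps, using an in-memory stand-in rather than a real metrics SDK. gen_ai.client.token.usage is the existing token usage metric; the detail metric name, the 'cached' type value, and all the numbers are hypothetical.

```python
from collections import defaultdict

MAIN_METRIC = "gen_ai.client.token.usage"            # existing metric name
DETAIL_METRIC = "gen_ai.client.token.usage.details"  # hypothetical name

# In-memory stand-in for a metrics backend: (metric, attributes) -> sum.
recorded: dict[tuple, int] = defaultdict(int)


def record(metric: str, value: int, attributes: dict[str, str]) -> None:
    """Add a value to the series identified by metric name and attributes."""
    key = (metric, tuple(sorted(attributes.items())))
    recorded[key] += value


def total(metric: str) -> int:
    """Sum every series of one metric, as a naive dashboard query might."""
    return sum(v for (name, _), v in recorded.items() if name == metric)


# Standard input/output counts go to the main metric.
record(MAIN_METRIC, 120, {"gen_ai.token.type": "input"})
record(MAIN_METRIC, 30, {"gen_ai.token.type": "output"})

# Cached tokens are a subset of input tokens, so they go to the detail metric.
record(
    DETAIL_METRIC,
    40,
    {"gen_ai.token.type": "cached", "gen_ai.token.modality": "text"},
)

main_total = total(MAIN_METRIC)
detail_total = total(DETAIL_METRIC)
```

Summing the main metric across all attribute values still gives the true total of 150 here. Had the 40 cached tokens been recorded under the same metric name, the same query would report 190.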
