[AI Proxy] Incorrect llm_total_tokens_count metric for models with reasoning/hidden tokens #14816

@Fvegini

Description

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

2.9.1

Current Behavior

Currently, the AI Proxy plugin ignores the explicit total_tokens value returned by the LLM provider when generating Prometheus metrics. Instead, it manually calculates the total by summing prompt and completion tokens. This results in inaccurate monitoring and underreporting of token usage.

Steps to Reproduce

  1. Configure a route with the AI Proxy plugin.
  2. Make a request to a model that uses reasoning tokens (where total != prompt + completion).
  3. Observe the upstream JSON response body:
    "usage": {
        "prompt_tokens": 3,
        "total_tokens": 298,
        "completion_tokens": 7
    }
    (Note the difference: 3 + 7 = 10, but the actual billed total is 298).
  4. Check the Prometheus metric ai_llm_total_tokens_count. It records 10 instead of 298.
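The mismatch in step 3 can be shown in a few lines of Lua (a standalone illustration, not Kong code):

```lua
-- Illustration of how summing visible tokens diverges from the provider's
-- explicit total when reasoning/hidden tokens are billed.
local usage = {
  prompt_tokens = 3,
  completion_tokens = 7,
  total_tokens = 298,  -- includes hidden reasoning tokens
}

-- What the plugin currently reports (sum of visible tokens):
local summed = usage.prompt_tokens + usage.completion_tokens

-- What the provider actually billed:
local explicit = usage.total_tokens

print(summed, explicit)  -- 10   298
```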

Root Cause Analysis

I have identified two places in the source code causing this behavior:

1. The data is not passed to the metrics plugin (kong/llm/drivers/shared.lua)
In the shared.lua driver, specifically around line 837, the code extracts prompt and completion tokens for metrics but omits total_tokens, even though it captures it for the analytics log container.

  -- kong\llm\drivers\shared.lua - Line 863
  if response_object.usage then
    if response_object.usage.prompt_tokens then
      request_analytics_plugin[log_entry_keys.USAGE_CONTAINER][log_entry_keys.PROMPT_TOKENS] = response_object.usage.prompt_tokens
    end
    if response_object.usage.completion_tokens then
      request_analytics_plugin[log_entry_keys.USAGE_CONTAINER][log_entry_keys.COMPLETION_TOKENS] = response_object.usage.completion_tokens
    end
    if response_object.usage.total_tokens then
      request_analytics_plugin[log_entry_keys.USAGE_CONTAINER][log_entry_keys.TOTAL_TOKENS] = response_object.usage.total_tokens
    end

    ai_plugin_o11y.metrics_set("llm_prompt_tokens_count", response_object.usage.prompt_tokens)
    ai_plugin_o11y.metrics_set("llm_completion_tokens_count", response_object.usage.completion_tokens)

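As a standalone sketch of the missing piece (with `ai_plugin_o11y` stubbed out for illustration; this is not Kong's actual module), the same guarded pattern the snippet already uses for prompt and completion tokens can be extended to `total_tokens`:

```lua
-- Sketch: the guarded metric writes from shared.lua, extended to also
-- record the provider's explicit total_tokens. ai_plugin_o11y is stubbed.
local metrics = {}
local ai_plugin_o11y = {
  metrics_set = function(key, value) metrics[key] = value end,
}

local response_object = {
  usage = { prompt_tokens = 3, completion_tokens = 7, total_tokens = 298 },
}

if response_object.usage then
  if response_object.usage.prompt_tokens then
    ai_plugin_o11y.metrics_set("llm_prompt_tokens_count", response_object.usage.prompt_tokens)
  end
  if response_object.usage.completion_tokens then
    ai_plugin_o11y.metrics_set("llm_completion_tokens_count", response_object.usage.completion_tokens)
  end
  -- The missing piece: pass the explicit total through as well.
  if response_object.usage.total_tokens then
    ai_plugin_o11y.metrics_set("llm_total_tokens_count", response_object.usage.total_tokens)
  end
end

print(metrics["llm_total_tokens_count"])  -- 298
```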
2. The total is always recalculated (kong/llm/plugin/observability.lua)
Even if the driver passed the value, the observability plugin forces a manual summation, ignoring any explicit total provided by the driver.

-- kong/llm/plugin/observability.lua - Line 78
    elseif key == "llm_total_tokens_count" then
      return _M.metrics_get("llm_prompt_tokens_count") + _M.metrics_get("llm_completion_tokens_count")
    end
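One way to address this side (a standalone sketch with a hypothetical `_M` table, not Kong's actual module) is to fall back to the summation only when no explicit total was recorded:

```lua
-- Sketch: a metrics_get that prefers an explicit total over summation.
-- _M and its metrics table are hypothetical stand-ins for illustration.
local _M = { metrics = {} }

function _M.metrics_set(key, value)
  _M.metrics[key] = value
end

function _M.metrics_get(key)
  local stored = _M.metrics[key]
  if key == "llm_total_tokens_count" and stored == nil then
    -- Fall back to summation only when the driver did not pass a total.
    return (_M.metrics_get("llm_prompt_tokens_count") or 0)
         + (_M.metrics_get("llm_completion_tokens_count") or 0)
  end
  return stored
end

_M.metrics_set("llm_prompt_tokens_count", 3)
_M.metrics_set("llm_completion_tokens_count", 7)
print(_M.metrics_get("llm_total_tokens_count"))  -- 10 (fallback)

_M.metrics_set("llm_total_tokens_count", 298)
print(_M.metrics_get("llm_total_tokens_count"))  -- 298 (explicit)
```

This keeps the current behavior for providers that omit `total_tokens` while honoring the explicit value when it is present.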

The fix on the driver side should be simple: also set the metric when it exists in the response body:

ai_plugin_o11y.metrics_set("llm_total_tokens_count", response_object.usage.total_tokens)

