Description
Is there an existing issue for this?
- I have searched the existing issues
Kong version ($ kong version)
2.9.1
Current Behavior
Currently, the AI Proxy plugin ignores the explicit total_tokens value returned by the LLM provider when generating Prometheus metrics. Instead, it manually calculates the total by summing the prompt and completion tokens. For models whose billed total includes reasoning tokens, this produces inaccurate monitoring and underreports token usage.
Steps to Reproduce
- Configure a route with the AI Proxy plugin.
- Make a request to a model that uses reasoning tokens (where total != prompt + completion).
- Observe the usage block in the upstream JSON response body (note the difference: 3 + 7 = 10, but the actual billed total is 298):

```json
"usage": { "prompt_tokens": 3, "total_tokens": 298, "completion_tokens": 7 }
```

- Check the Prometheus metric ai_llm_total_tokens_count. It records 10 instead of 298.
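To make the discrepancy concrete, here is a minimal Python sketch (illustrative only, not Kong source) comparing the sum the plugin currently exports with the provider's billed total:

```python
# The usage block returned by the provider (from the response above).
usage = {"prompt_tokens": 3, "completion_tokens": 7, "total_tokens": 298}

# What the plugin currently reports: a manual prompt + completion sum.
reported = usage["prompt_tokens"] + usage["completion_tokens"]

# What the provider actually billed, including reasoning tokens.
billed = usage["total_tokens"]

print(reported, billed)  # 10 298
```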
Root Cause Analysis
I have identified two places in the source code causing this behavior:
1. The data is not passed to the metrics plugin (kong/llm/drivers/shared.lua)
In the shared.lua driver, specifically around line 837, the code extracts prompt and completion tokens for metrics but omits total_tokens, even though it captures it for the analytics log container.
```lua
-- kong\llm\drivers\shared.lua - Line 863
if response_object.usage then
  if response_object.usage.prompt_tokens then
    request_analytics_plugin[log_entry_keys.USAGE_CONTAINER][log_entry_keys.PROMPT_TOKENS] = response_object.usage.prompt_tokens
  end
  if response_object.usage.completion_tokens then
    request_analytics_plugin[log_entry_keys.USAGE_CONTAINER][log_entry_keys.COMPLETION_TOKENS] = response_object.usage.completion_tokens
  end
  if response_object.usage.total_tokens then
    request_analytics_plugin[log_entry_keys.USAGE_CONTAINER][log_entry_keys.TOTAL_TOKENS] = response_object.usage.total_tokens
  end
  ai_plugin_o11y.metrics_set("llm_prompt_tokens_count", response_object.usage.prompt_tokens)
  ai_plugin_o11y.metrics_set("llm_completion_tokens_count", response_object.usage.completion_tokens)
```

2. The total is hard-calculated (kong/llm/plugin/observability.lua)
Even if the driver passed the value, the observability plugin currently forces a manual summation, completely ignoring any explicit total provided by the driver.
```lua
-- kong/llm/plugin/observability.lua - Line 78
elseif key == "llm_total_tokens_count" then
  return _M.metrics_get("llm_prompt_tokens_count") + _M.metrics_get("llm_completion_tokens_count")
end
```

The fix should be as simple as additionally setting the metric in the driver whenever it exists on the response body.
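Putting both changes together, the intended resolution logic can be sketched in Python (purely illustrative; the metric keys mirror the Lua names, but resolve_total_tokens is a hypothetical helper, not Kong's API): prefer the provider's explicit total and fall back to summation only when it is absent.

```python
def resolve_total_tokens(metrics):
    """Prefer an explicitly reported total; fall back to prompt + completion."""
    explicit = metrics.get("llm_total_tokens_count")
    if explicit is not None:
        return explicit
    return (metrics.get("llm_prompt_tokens_count", 0)
            + metrics.get("llm_completion_tokens_count", 0))

# Reasoning-token model: the billed total wins over the naive sum.
print(resolve_total_tokens({"llm_prompt_tokens_count": 3,
                            "llm_completion_tokens_count": 7,
                            "llm_total_tokens_count": 298}))  # 298

# Provider omits total_tokens: the current summation behavior is preserved.
print(resolve_total_tokens({"llm_prompt_tokens_count": 3,
                            "llm_completion_tokens_count": 7}))  # 10
```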
```lua
ai_plugin_o11y.metrics_set("llm_total_tokens_count", response_object.usage.total_tokens)
```

Expected Behavior
No response
Steps To Reproduce
No response
Anything else?
No response