Hi,
Keeping track of LLM token usage is important.
The `InferenceApi`'s `InferenceProvider.completion` and `InferenceProvider.chat_completion` both support `MetricResponseMixin`. `llama_stack/distribution/routers/inference.py` also seems to support creating token metrics, returning them in response chunks, and exporting them to the configured telemetry. All good so far.
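
For illustration, this is roughly the direct-inference path I mean. It's a minimal sketch, not verified against a specific release: the base URL and model id are placeholders, and the exact client calls (`client.inference.chat_completion`, the `metrics` attribute added by `MetricResponseMixin`) may differ slightly in your installed version of `llama-stack-client`.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # placeholder URL

# Direct call to the InferenceApi via the inference router.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello."}],
)

# MetricResponseMixin adds a `metrics` field to the response; with the inference
# router in place, this is where the token-usage metrics show up.
print(response.metrics)
```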
However, if we use an `Agent` to support the use of Tools (RAG, MCP, etc.), the token metrics from the `Agent`'s use of the `InferenceApi` are dropped; they are not propagated out in the response chunks.

Is it reasonable to enhance `llama-stack` to return token metrics when using an `Agent`? Is there anything on a roadmap to support this at a future date?
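
For concreteness, this is roughly the agent path where the metrics seem to go missing. Again only a sketch: the `Agent` constructor arguments, tool name, and turn API follow the llama-stack-client Python examples, but may need adjusting for your version; the model id, tool config, and session name are placeholders.

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://localhost:8321")  # placeholder URL

# Agent wrapping the same model, with a tool enabled so the agentic path is used.
agent = Agent(
    client,
    model="meta-llama/Llama-3.1-8B-Instruct",   # placeholder model id
    instructions="You are a helpful assistant.",
    tools=["builtin::rag"],                      # placeholder tool config
)
session_id = agent.create_session("token-metrics-test")

turn = agent.create_turn(
    messages=[{"role": "user", "content": "Summarise the indexed documents."}],
    session_id=session_id,
    stream=False,
)

# The agent's internal InferenceApi calls produce token metrics, but nothing
# equivalent to `response.metrics` appears to be surfaced on the turn (or on the
# streamed chunks when stream=True), which is the gap described above.
print(getattr(turn, "metrics", None))
```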
Perhaps I'm mistaken, so feel free to correct my understanding.