Token counting in AgentWorkflows #18493
Replies: 3 comments 3 replies
---
Adding to the workarounds, here's an architecture that solves this:

### State-Based Token Tracking

Instead of trying to attach handlers to the workflow, track tokens in shared state:

```python
import time  # needed for the per-step timestamps

state = {
    "token_usage": {
        "total_input": 0,
        "total_output": 0,
        "by_agent": {},
        "by_step": []
    }
}

def track_tokens(agent_name, input_tokens, output_tokens, state):
    state["token_usage"]["total_input"] += input_tokens
    state["token_usage"]["total_output"] += output_tokens
    if agent_name not in state["token_usage"]["by_agent"]:
        state["token_usage"]["by_agent"][agent_name] = {"input": 0, "output": 0}
    state["token_usage"]["by_agent"][agent_name]["input"] += input_tokens
    state["token_usage"]["by_agent"][agent_name]["output"] += output_tokens
    state["token_usage"]["by_step"].append({
        "agent": agent_name,
        "input": input_tokens,
        "output": output_tokens,
        "timestamp": time.time()
    })
```

### Why This Works

Every agent writes into the same dictionary, so you get running totals, a per-agent breakdown, and a per-step log without touching the workflow's callback plumbing.

### Integration

Wrap your LLM calls to update state:

```python
def tracked_llm_call(messages, agent_name, state):
    response = llm.complete(messages)
    track_tokens(
        agent_name,
        response.raw.usage.prompt_tokens,
        response.raw.usage.completion_tokens,
        state
    )
    return response
```

This pattern gives you full visibility without fighting the callback system.
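A quick self-contained sketch of how the shared-state tracker accumulates counts. The agent names and token numbers below are made up for illustration, and `track_tokens` is restated compactly so the snippet runs on its own:

```python
import time

state = {"token_usage": {"total_input": 0, "total_output": 0,
                         "by_agent": {}, "by_step": []}}

def track_tokens(agent_name, input_tokens, output_tokens, state):
    usage = state["token_usage"]
    usage["total_input"] += input_tokens
    usage["total_output"] += output_tokens
    per_agent = usage["by_agent"].setdefault(agent_name, {"input": 0, "output": 0})
    per_agent["input"] += input_tokens
    per_agent["output"] += output_tokens
    usage["by_step"].append({"agent": agent_name, "input": input_tokens,
                             "output": output_tokens, "timestamp": time.time()})

# Simulate two workflow steps by different agents:
track_tokens("researcher", 1200, 300, state)
track_tokens("writer", 800, 450, state)

print(state["token_usage"]["total_input"])   # 2000
print(state["token_usage"]["total_output"])  # 750
print(len(state["token_usage"]["by_step"]))  # 2
```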
---
The core issue here is where the token counter gets attached. Registering a `TokenCountingHandler` on the global `Settings` lets every LLM call in the workflow pick it up:

```python
from llama_index.core import Settings
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
import tiktoken

# Initialize the handler at the Settings level: this is key
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# Create your workflow normally
workflow = AgentWorkflow.from_tools_or_functions(
    [...],
    llm=Settings.llm,  # LLM picks up the callback manager from Settings
)

# Run it
result = await workflow.run(user_msg="...")

# Read counts after
print(f"Total LLM tokens: {token_counter.total_llm_token_count}")
print(f"Embedding tokens: {token_counter.total_embedding_token_count}")
```

If you need a per-agent breakdown, the shared-state approach from @KeepALifeUS above is solid. One addition: you can capture the raw usage directly from the LLM response instead of estimating:

```python
from functools import wraps

def token_tracking_wrapper(llm, state, agent_name):
    original_chat = llm.chat

    @wraps(original_chat)
    def tracked_chat(messages, **kwargs):
        response = original_chat(messages, **kwargs)
        # Access raw usage from the response
        if hasattr(response, "raw") and hasattr(response.raw, "usage"):
            usage = response.raw.usage
            state["tokens"][agent_name] = state["tokens"].get(agent_name, {"in": 0, "out": 0})
            state["tokens"][agent_name]["in"] += usage.prompt_tokens
            state["tokens"][agent_name]["out"] += usage.completion_tokens
        return response

    llm.chat = tracked_chat
    return llm
```

On cost estimation: if you are tracking tokens, rough cost math is straightforward. For reference, LlamaIndex agents typically run 500-2000 tokens per step depending on context window usage. For workflows that call external AI services frequently, GPU-Bridge is worth a look: it exposes LLM, embeddings, and other AI services at fixed low rates ($0.001/LLM call, $0.02-0.04/image) via a single endpoint, which makes per-call cost calculation trivial.

What LLM provider are you using? Some emit usage differently (Anthropic vs OpenAI vs local), which affects where to hook the counter.
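To make the cost arithmetic concrete, here's a minimal sketch. The per-million-token prices are placeholders, not any provider's actual rates; substitute your own:

```python
# Hypothetical prices in USD per 1M tokens; replace with your provider's rates.
PRICES = {"input": 2.50, "output": 10.00}

def estimate_cost(input_tokens, output_tokens, prices=PRICES):
    """Rough dollar cost for a run, given summed token counts."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# e.g. a 10-step run averaging 1500 input / 400 output tokens per step:
cost = estimate_cost(10 * 1500, 10 * 400)
print(f"${cost:.4f}")  # $0.0775
```

Feed it the totals from whichever tracker you use (the `TokenCountingHandler` counts or the shared-state totals) and you have per-run cost.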
---
Hi,

What is the best way of implementing a token counter for an `AgentWorkflow`? My idea was to add a token counter to each agent and sum the counts at the end, but that doesn't seem possible when I'm not the one calling the agent; the Workflow is. It is also not possible to add a `TokenCountingHandler` to the `AgentWorkflow`.

If anyone has implemented this already, feel free to share your solution :)