Token counting in AgentWorkflows #18493
Replies: 3 comments 3 replies
---
Adding to the workarounds, here's an architecture that solves this:

### State-Based Token Tracking

Instead of trying to attach handlers to the workflow, track tokens in shared state:

```python
import time  # needed for the per-step timestamps

state = {
    "token_usage": {
        "total_input": 0,
        "total_output": 0,
        "by_agent": {},
        "by_step": []
    }
}

def track_tokens(agent_name, input_tokens, output_tokens, state):
    state["token_usage"]["total_input"] += input_tokens
    state["token_usage"]["total_output"] += output_tokens
    if agent_name not in state["token_usage"]["by_agent"]:
        state["token_usage"]["by_agent"][agent_name] = {"input": 0, "output": 0}
    state["token_usage"]["by_agent"][agent_name]["input"] += input_tokens
    state["token_usage"]["by_agent"][agent_name]["output"] += output_tokens
    state["token_usage"]["by_step"].append({
        "agent": agent_name,
        "input": input_tokens,
        "output": output_tokens,
        "timestamp": time.time()
    })
```

### Why This Works

Every agent writes into the same dictionary, so you get running totals, a per-agent breakdown, and a per-step log without touching the workflow's callback plumbing.

### Integration

Wrap your LLM calls to update state:

```python
def tracked_llm_call(messages, agent_name, state):
    response = llm.complete(messages)
    track_tokens(
        agent_name,
        response.raw.usage.prompt_tokens,
        response.raw.usage.completion_tokens,
        state
    )
    return response
```

This pattern gives you full visibility without fighting the callback system.
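A quick self-contained sketch of how the shared-state tracker accumulates counts. The agent names and token numbers below are made up for illustration, and `track_tokens` is restated compactly so the snippet runs on its own:

```python
import time

state = {"token_usage": {"total_input": 0, "total_output": 0,
                         "by_agent": {}, "by_step": []}}

def track_tokens(agent_name, input_tokens, output_tokens, state):
    usage = state["token_usage"]
    usage["total_input"] += input_tokens
    usage["total_output"] += output_tokens
    per_agent = usage["by_agent"].setdefault(agent_name, {"input": 0, "output": 0})
    per_agent["input"] += input_tokens
    per_agent["output"] += output_tokens
    usage["by_step"].append({"agent": agent_name, "input": input_tokens,
                             "output": output_tokens, "timestamp": time.time()})

# Simulate two workflow steps by different agents:
track_tokens("researcher", 1200, 300, state)
track_tokens("writer", 800, 450, state)

print(state["token_usage"]["total_input"])   # 2000
print(state["token_usage"]["total_output"])  # 750
print(len(state["token_usage"]["by_step"]))  # 2
```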
---
The core issue here is where the token counter gets attached. Registering a `TokenCountingHandler` on the global `Settings` lets every LLM call in the workflow pick it up:

```python
from llama_index.core import Settings
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
import tiktoken

# Initialize the handler at the Settings level: this is key
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# Create your workflow normally
workflow = AgentWorkflow.from_tools_or_functions(
    [...],
    llm=Settings.llm,  # LLM picks up the callback manager from Settings
)

# Run it
result = await workflow.run(user_msg="...")

# Read counts after
print(f"Total LLM tokens: {token_counter.total_llm_token_count}")
print(f"Embedding tokens: {token_counter.total_embedding_token_count}")
```

If you need a per-agent breakdown, the shared-state approach from @KeepALifeUS above is solid. One addition: you can capture the raw usage directly from the LLM response instead of estimating:

```python
from functools import wraps

def token_tracking_wrapper(llm, state, agent_name):
    original_chat = llm.chat

    @wraps(original_chat)
    def tracked_chat(messages, **kwargs):
        response = original_chat(messages, **kwargs)
        # Access raw usage from the response
        if hasattr(response, "raw") and hasattr(response.raw, "usage"):
            usage = response.raw.usage
            state["tokens"][agent_name] = state["tokens"].get(agent_name, {"in": 0, "out": 0})
            state["tokens"][agent_name]["in"] += usage.prompt_tokens
            state["tokens"][agent_name]["out"] += usage.completion_tokens
        return response

    llm.chat = tracked_chat
    return llm
```

On cost estimation: if you are tracking tokens, rough cost math is straightforward. For reference, LlamaIndex agents typically run 500-2000 tokens per step depending on context window usage. For workflows that call external AI services frequently, GPU-Bridge is worth a look: it exposes LLM, embeddings, and other AI services at fixed low rates ($0.001/LLM call, $0.02-0.04/image) via a single endpoint, which makes per-call cost calculation trivial.

What LLM provider are you using? Some emit usage differently (Anthropic vs OpenAI vs local), which affects where to hook the counter.
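To make the cost arithmetic concrete, here's a minimal sketch. The per-million-token prices are placeholders, not any provider's actual rates; substitute your own:

```python
# Hypothetical prices in USD per 1M tokens; replace with your provider's rates.
PRICES = {"input": 2.50, "output": 10.00}

def estimate_cost(input_tokens, output_tokens, prices=PRICES):
    """Rough dollar cost for a run, given summed token counts."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# e.g. a 10-step run averaging 1500 input / 400 output tokens per step:
cost = estimate_cost(10 * 1500, 10 * 400)
print(f"${cost:.4f}")  # $0.0775
```

Feed it the totals from whichever tracker you use (the `TokenCountingHandler` counts or the shared-state totals) and you have per-run cost.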
---
Hi,

What is the best way of implementing a token counter for an `AgentWorkflow`? My idea was to add a token counter to each agent and sum the counts at the end, but that doesn't seem possible when I'm not the one calling the agent; the Workflow is. It is also not possible to add a `TokenCountingHandler` to the `AgentWorkflow`.

If anyone has implemented this already, feel free to share your solution :)