Description
What's the problem this feature will solve?
Currently, mesa-llm leverages LiteLLM to provide a unified interface for various LLM providers (Ollama, OpenAI, Gemini, etc.). While LiteLLM successfully returns token usage data (prompt, completion, and total tokens) within its response object, these metrics are not currently surfaced or tracked within the mesa-llm agent or simulation state.
I’m proposing that we add a system to capture and store these metrics. This would be particularly valuable for:
- Local Benchmarking: Tracking tokens per second (TPS) for local models (e.g., Ollama/Llama 3) to optimise VRAM and quantisation settings.
- Cost Estimation: Simulating the "shadow cost" of a multi-agent simulation before deploying to a paid provider, and getting real-time feedback on current usage for both local and API providers.
- Observability: Understanding the context-window usage of different LLMAgent personas over long-running simulations.
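For the local-benchmarking case above, TPS can be derived from the completion token count and the wall-clock latency of each call. A minimal sketch of the arithmetic (the `tokens_per_second` helper is hypothetical and not part of mesa-llm or LiteLLM):

```python
def tokens_per_second(completion_tokens: int, latency_s: float) -> float:
    """Throughput of a single LLM call in tokens per second."""
    if latency_s <= 0:
        return 0.0
    return completion_tokens / latency_s

# Example: a local model that produced 256 tokens in 3.2 seconds
tps = tokens_per_second(256, 3.2)
print(f"{tps:.1f} tokens/s")  # 80.0 tokens/s
```

The latency would be measured around the `completion` call (e.g., with `time.perf_counter()`) and stored alongside the token counts, which is why the metrics dict below includes a `last_latency` field.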
Describe the solution you'd like
Since LiteLLM already provides a standardised `usage` object in the response of its `completion` method, we could implement a lightweight tracking system.
1. Update the Agent State:
Add a `usage_history` or `metrics` attribute to the `LLMAgent`, the reasoning module (`ModuleLLM`), or a new module dedicated to this purpose, to store a running total of tokens consumed.
```python
# Conceptual
self.metrics = {
    "total_prompt_tokens": 0,
    "total_completion_tokens": 0,
    "total_calls": 0,
    "last_latency": 0.0,
    ...
}
```

2. Capture Data from the LiteLLM Response:
Update the internal completion call to extract these values. The usage and metrics can be stored both per agent and for the entire simulation. We have a recording module where this can be implemented to track usage continuously.
```python
response = litellm.completion(...)
...
# Update metrics
self.metrics["total_prompt_tokens"] += response.usage.prompt_tokens
self.metrics["total_completion_tokens"] += response.usage.completion_tokens
...
```

3. Simulation-Level Aggregation:
Allow the Mesa Model/Simulation to aggregate metrics from all agents and produce a "Total Simulation Cost/Usage" report at the end of a run. We can also use LiteLLM's built-in reporting tooling to provide an analytical view.
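The aggregation step could be as simple as summing each agent's counters at the model level. A sketch assuming each agent exposes the `metrics` dict from step 1 (`aggregate_metrics` and `FakeAgent` are illustrative names, not existing mesa-llm APIs):

```python
from collections import Counter

def aggregate_metrics(agents) -> dict:
    """Sum per-agent metric counters into a simulation-level report."""
    totals = Counter()
    for agent in agents:
        totals.update(agent.metrics)  # element-wise addition of counts
    totals["total_tokens"] = (
        totals["total_prompt_tokens"] + totals["total_completion_tokens"]
    )
    return dict(totals)

class FakeAgent:  # stand-in for an LLMAgent carrying a metrics dict
    def __init__(self, prompt_tokens, completion_tokens, calls):
        self.metrics = {
            "total_prompt_tokens": prompt_tokens,
            "total_completion_tokens": completion_tokens,
            "total_calls": calls,
        }

report = aggregate_metrics([FakeAgent(100, 40, 2), FakeAgent(50, 10, 1)])
print(report)
```

Using `Counter` keeps the aggregation robust if individual agents track slightly different metric keys; missing keys simply contribute zero.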
I’ve been looking into this while testing local LLM setups and noticed that having these metrics would make it much easier to debug agent verbosity and system performance. This can also act as an evaluation tool for prompt optimisations and other model/architecture-level optimisations.
Are you planning to open a PR for this?
Yes, I am currently exploring the implementation details and intend to include this as a core part of my GSoC 2026 project proposal. I'd love to get feedback to ensure it aligns with the project's goals. If the feature sounds reasonable, I can provide a detailed HLD and LLD for it.
Additional context
LLM integration comes with major challenges: cost, reliability, and performance. I have also started discussion #174 on hallucinations and guardrails, which I'll also be adding to my proposal. Regardless of GSoC, I would still love to work on these features, as the project aligns with my current research and interests. More about that here: mesa/mesa#2465. I'm open to discussing and fine-tuning my ideas, as they are still in the early stages, based on suggestions and feedback from the maintainers.