Description
What's the problem this feature will solve?
Currently, mesa-llm leverages LiteLLM to provide a unified interface for various LLM providers (Ollama, OpenAI, Gemini, etc.). While LiteLLM successfully returns token usage data (prompt, completion, and total tokens) within its response object, these metrics are not currently surfaced or tracked within the mesa-llm agent or simulation state.
I’m proposing that we add a system to capture and store these metrics. This would be particularly valuable for:
- Local Benchmarking: Tracking tokens per second (TPS) for local models (e.g., Ollama/Llama 3) to optimise VRAM and quantisation settings.
- Cost Estimation: Simulating the "shadow cost" of a multi-agent simulation before deploying to a paid provider, and getting real-time feedback on current usage for both local and API providers.
- Observability: Understanding the context-window usage of different LLMAgent personas over long-running simulations.
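For the local-benchmarking case above, TPS can be derived from the completion token count and the wall-clock latency of each call. A minimal sketch of the arithmetic (the `tokens_per_second` helper is hypothetical and not part of mesa-llm or LiteLLM):

```python
def tokens_per_second(completion_tokens: int, latency_s: float) -> float:
    """Throughput of a single LLM call in tokens per second."""
    if latency_s <= 0:
        return 0.0
    return completion_tokens / latency_s

# Example: a local model that produced 256 tokens in 3.2 seconds
tps = tokens_per_second(256, 3.2)
print(f"{tps:.1f} tokens/s")  # 80.0 tokens/s
```

The latency would be measured around the `completion` call (e.g., with `time.perf_counter()`) and stored alongside the token counts, which is why the metrics dict below includes a `last_latency` field.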
Describe the solution you'd like
Since LiteLLM already provides a standardised `usage` object in the response of its `completion` method, we could implement a lightweight tracking system.
1. Update the Agent State:
Add a `usage_history` or `metrics` attribute to the `LLMAgent`, the reasoning module (`ModuleLLM`), or a new module dedicated to this purpose, to store a running total of tokens consumed.
```python
# Conceptual
self.metrics = {
    "total_prompt_tokens": 0,
    "total_completion_tokens": 0,
    "total_calls": 0,
    "last_latency": 0.0,
    ...
}
```

2. Capture Data from the LiteLLM Response:
Update the internal completion call to extract these values. The usage and metrics can be stored both per agent and for the entire simulation. We have a recording module where this can be implemented to track usage continuously.
```python
response = litellm.completion(...)
...
# Update metrics
self.metrics["total_prompt_tokens"] += response.usage.prompt_tokens
self.metrics["total_completion_tokens"] += response.usage.completion_tokens
...
```

3. Simulation-Level Aggregation:
Allow the Mesa Model/Simulation to aggregate metrics from all agents and produce a "Total Simulation Cost/Usage" report at the end of a run. We can also use LiteLLM's built-in reporting tooling to provide an analytical view.
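The aggregation step could be as simple as summing each agent's counters at the model level. A sketch assuming each agent exposes the `metrics` dict from step 1 (`aggregate_metrics` and `FakeAgent` are illustrative names, not existing mesa-llm APIs):

```python
from collections import Counter

def aggregate_metrics(agents) -> dict:
    """Sum per-agent metric counters into a simulation-level report."""
    totals = Counter()
    for agent in agents:
        totals.update(agent.metrics)  # element-wise addition of counts
    totals["total_tokens"] = (
        totals["total_prompt_tokens"] + totals["total_completion_tokens"]
    )
    return dict(totals)

class FakeAgent:  # stand-in for an LLMAgent carrying a metrics dict
    def __init__(self, prompt_tokens, completion_tokens, calls):
        self.metrics = {
            "total_prompt_tokens": prompt_tokens,
            "total_completion_tokens": completion_tokens,
            "total_calls": calls,
        }

report = aggregate_metrics([FakeAgent(100, 40, 2), FakeAgent(50, 10, 1)])
print(report)
```

Using `Counter` keeps the aggregation robust if individual agents track slightly different metric keys; missing keys simply contribute zero.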
I’ve been looking into this while testing local LLM setups and noticed that having these metrics would make it much easier to debug agent verbosity and system performance. This can also act as an evaluation tool for prompt optimisations and other model/architecture-level optimisations.
Are you planning to open a PR for this?
Yes, I am currently exploring the implementation details and intend to include this as a core part of my GSoC 2026 project proposal. I'd love to get feedback to ensure it aligns with the project's goals. If the feature sounds reasonable, I can provide a detailed HLD and LLD for it.
Additional context
LLM integration comes with major challenges: cost, reliability, and performance. I have also started discussion #174 on hallucinations and guardrails, which I'll also be adding to my proposal. Regardless of GSoC, I would still love to work on these features, as the project aligns with my current research and interests. More about that here: mesa/mesa#2465. I'm open to discussing and fine-tuning my ideas, as they are still in the early stages, based on suggestions and feedback from the maintainers.