
bug: apply_plan() flattens multiple tool call results into one dict, silently losing all but the last #142

@khushiiagrawal

Description


Describe the bug

When the LLM returns multiple tool calls in a single plan, apply_plan() (and aapply_plan()) uses a double-loop dict comprehension to merge all tool call results into a single flat dict before passing it to add_to_memory(). Since every tool result has the same keys (name and response), later entries overwrite earlier ones: only the last tool call's result is stored in memory, and the rest are permanently lost.

This is different from #137, which is about step_content overwriting same-type entries. This bug destroys the data before it even reaches add_to_memory().

File: mesa_llm/llm_agent.py, lines 117–125 (sync) and 92–100 (async)

self.memory.add_to_memory(
    type="action",
    content={
        k: v
        for tool_call in tool_call_resp
        for k, v in tool_call.items()
        if k not in ["tool_call_id", "role"]
    },
)

Each tool result from ToolManager._process_tool_call() returns {"tool_call_id": ..., "role": ..., "name": ..., "response": ...}. After filtering, every result has exactly name and response, so the dict comprehension just keeps overwriting.
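The collision can be demonstrated in isolation; the snippet below uses the same double-loop comprehension quoted above with the two-result example from the repro:

```python
# Two tool results, as returned by ToolManager._process_tool_call()
tool_call_resp = [
    {"tool_call_id": "1", "role": "tool", "name": "move_one_step", "response": "agent moved to (3, 4)"},
    {"tool_call_id": "2", "role": "tool", "name": "arrest_citizen", "response": "Citizen 12 arrested"},
]

# Same comprehension as in apply_plan(): after filtering, both results
# contribute the keys "name" and "response", so the second overwrites the first
merged = {
    k: v
    for tool_call in tool_call_resp
    for k, v in tool_call.items()
    if k not in ["tool_call_id", "role"]
}

print(merged)
# {'name': 'arrest_citizen', 'response': 'Citizen 12 arrested'}
```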

Expected behavior

All executed tool call results should be preserved in the agent's memory. If the LLM decides to both move and arrest in one step, the agent should remember both actions, not just the arrest.

To Reproduce

import os

from mesa.model import Model
from mesa.space import MultiGrid

from mesa_llm.llm_agent import LLMAgent
from mesa_llm.memory.st_memory import ShortTermMemory
from mesa_llm.reasoning.react import ReActReasoning
from mesa_llm.reasoning.reasoning import Plan

os.environ["GEMINI_API_KEY"] = "dummy"

model = Model(seed=42)
model.grid = MultiGrid(5, 5, torus=False)
agent = LLMAgent(model=model, reasoning=ReActReasoning, vision=-1)
agent.memory = ShortTermMemory(agent=agent, n=5, display=False)

# Simulate LLM returning 2 tool calls
fake_response = [
    {"tool_call_id": "1", "role": "tool", "name": "move_one_step",   "response": "agent moved to (3, 4)"},
    {"tool_call_id": "2", "role": "tool", "name": "arrest_citizen",  "response": "Citizen 12 arrested"},
]

agent.tool_manager.call_tools = lambda agent, llm_response: fake_response
plan = Plan(step=0, llm_plan="do something")
agent.apply_plan(plan)

print(agent.memory.step_content)
# {'action': {'name': 'arrest_citizen', 'response': 'Citizen 12 arrested'}}
# move_one_step is gone

Additional context

This affects all three reasoning strategies (CoT, ReAct, ReWOO) and all existing example models — Epstein Civil Violence (["move_one_step", "arrest_citizen"]), Negotiation (["teleport_to_location", "speak_to", "buy_product"]), Sugarscape (["move_to_best_resource", "propose_trade"]). Any time the LLM decides to call more than one tool, the agent's memory is incomplete.

The existing test test_apply_plan_adds_to_memory only uses a single-item fake response, so it never triggers this.

A possible fix would be to store tool results as a list:

self.memory.add_to_memory(
    type="action",
    content={
        "tool_calls": [
            {k: v for k, v in tc.items() if k not in ["tool_call_id", "role"]}
            for tc in tool_call_resp
        ]
    },
)
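Applied to the two-result example from the repro, the list-based structure preserves both entries (standalone sketch of just the content-building step, outside the class):

```python
# Same fake tool results as in the repro above
tool_call_resp = [
    {"tool_call_id": "1", "role": "tool", "name": "move_one_step", "response": "agent moved to (3, 4)"},
    {"tool_call_id": "2", "role": "tool", "name": "arrest_citizen", "response": "Citizen 12 arrested"},
]

# Proposed structure: one filtered dict per tool call, kept in a list
content = {
    "tool_calls": [
        {k: v for k, v in tc.items() if k not in ["tool_call_id", "role"]}
        for tc in tool_call_resp
    ]
}

print(content)
# {'tool_calls': [{'name': 'move_one_step', 'response': 'agent moved to (3, 4)'},
#                 {'name': 'arrest_citizen', 'response': 'Citizen 12 arrested'}]}
```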

Both apply_plan() and aapply_plan() need the same fix, plus MemoryEntry.__str__() would need to handle the new structure for display.
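For the display side, one option is a small helper that accepts both the old flat shape and the new list shape, so existing memory entries keep rendering. This is a standalone sketch; the function name format_action_content is hypothetical, and the real MemoryEntry.__str__() may format things differently:

```python
def format_action_content(content: dict) -> str:
    """Render an "action" memory entry as one "- name: response" line per
    tool call. Handles both the proposed {"tool_calls": [...]} list shape
    and the legacy flat {"name": ..., "response": ...} shape."""
    if "tool_calls" in content:
        return "\n".join(
            f"- {tc.get('name', '?')}: {tc.get('response', '')}"
            for tc in content["tool_calls"]
        )
    # Legacy flat dict: a single tool result
    return f"- {content.get('name', '?')}: {content.get('response', '')}"


print(format_action_content({
    "tool_calls": [
        {"name": "move_one_step", "response": "agent moved to (3, 4)"},
        {"name": "arrest_citizen", "response": "Citizen 12 arrested"},
    ]
}))
# - move_one_step: agent moved to (3, 4)
# - arrest_citizen: Citizen 12 arrested
```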

@wang-boyu @sanika-n would appreciate your views on this. Thanks!
