fix: apply_plan() flattens multiple tool call results#155
fix: apply_plan() flattens multiple tool call results#155khushiiagrawal wants to merge 4 commits intomesa:mainfrom
Conversation
Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
for more information, see https://pre-commit.ci
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #155 +/- ##
==========================================
+ Coverage 90.08% 90.66% +0.58%
==========================================
Files 19 19
Lines 1503 1554 +51
==========================================
+ Hits 1354 1409 +55
+ Misses 149 145 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
|
I like this PR and what it solves. I'm going to let copilot review to make sure I didn't miss stuff, but looks quite good to me :) |
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug (#142) where apply_plan() and aapply_plan() in LLMAgent used a flat dict comprehension to merge multiple tool call results, causing all but the last result to be silently lost (since every tool result has the same name/response keys). The fix wraps the tool call results in a "tool_calls" list to preserve all results.
Changes:
- Replaced the flat dict comprehension in both
apply_plan()andaapply_plan()with a"tool_calls"list structure that preserves all tool call results. - Added list handling to
MemoryEntry.__str__()for rendering list values in the display output. - Added comprehensive tests for both sync and async variants with multiple tool calls, plus memory display tests for list values.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
mesa_llm/llm_agent.py |
Core fix: changed tool call result storage from flat dict to "tool_calls" list in both sync and async apply_plan methods. |
mesa_llm/memory/memory.py |
Added list handling to MemoryEntry.__str__() for rendering list values in rich display output. |
tests/test_llm_agent.py |
Updated existing test assertion for new structure; added sync and async tests verifying multiple tool calls are preserved. |
tests/test_memory/test_memory_parent.py |
Added tests for MemoryEntry.__str__() with list-of-dicts and list-of-strings content. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| elif isinstance(value, list): | ||
| for i, item in enumerate(value): | ||
| if isinstance(item, dict): | ||
| lines.append(f" [blue]├──[/blue] [cyan]({i + 1})[/cyan]") | ||
| lines.extend(format_nested_dict(item, 2)) | ||
| else: | ||
| lines.append(f" [blue]├──[/blue] [cyan]{item}[/cyan]") |
There was a problem hiding this comment.
The list handling added here only applies when a list is a direct value at the top level of MemoryEntry.content. However, in the actual data path from apply_plan(), the memory entry content will be {"action": {"tool_calls": [...]}} — the list is nested inside the "tool_calls" key of a dict. When format_nested_dict encounters this list, it falls through to the else branch (line 39-41) and renders it as a raw Python list string, not using the new tree formatting.
To properly handle this, format_nested_dict should also handle list values, similar to the handling added at lines 53-59. Otherwise the nice tree display only works for the synthetic test case, not the real data structure produced by the fix in llm_agent.py.
| str_repr = str(entry) | ||
| assert "move_one_step" in str_repr | ||
| assert "arrest_citizen" in str_repr | ||
|
|
There was a problem hiding this comment.
This test uses content = {"action": [list of dicts]} where the list is a direct value of the "action" key. However, the actual memory entry produced by apply_plan() will have the structure {"action": {"tool_calls": [list of dicts]}} — the list is nested one level deeper. This means the test exercises the new elif isinstance(value, list) branch in __str__() but does not cover the actual data path, where the list is inside format_nested_dict and would fall through to the else (plain value) branch. Consider adding a test with the real structure {"action": {"tool_calls": [...]}} to verify the display works end-to-end.
| def test_memory_entry_str_with_nested_tool_calls_list(self): | |
| """Test MemoryEntry string representation with nested tool_calls list under action.""" | |
| mock_agent = Mock() | |
| content = { | |
| "action": { | |
| "tool_calls": [ | |
| {"name": "move_one_step", "response": "moved"}, | |
| {"name": "arrest_citizen", "response": "arrested"}, | |
| ] | |
| } | |
| } | |
| entry = MemoryEntry(content=content, step=1, agent=mock_agent) | |
| str_repr = str(entry) | |
| assert "move_one_step" in str_repr | |
| assert "arrest_citizen" in str_repr |
Signed-off-by: khushiiagrawal <khushisaritaagrawal@gmail.com>
|
@colinfrisch thanks for the feedback. i have addressed all the copilot reviews. |
|
@colinfrisch @wang-boyu could you please take a look. thanks. |
Summary
apply_plan()andaapply_plan()flatten multiple tool call results into a single dict, causing all but the last result to be silently lost.Bug / Issue
Fixes #142
Each tool result has the same keys (
name,response), so the flat dict comprehension overwrites earlier entries. If the LLM calls bothmove_one_stepandarrest_citizen, only the arrest is stored in memory.Implementation
mesa_llm/llm_agent.py— Replaced the flat dict comprehension inapply_plan()andaapply_plan()with a"tool_calls"list that preserves all results.mesa_llm/memory/memory.py— Addedlisthandling toMemoryEntry.__str__()so the new structure renders correctly.tests/test_llm_agent.py— Updated existing test assertion; addedtest_apply_plan_preserves_multiple_tool_calls(sync) andtest_aapply_plan_preserves_multiple_tool_calls(async).Testing