feat: Structured LLM reasoning added for auditable and constrained decision making #188
Open: apfine wants to merge 6 commits into mesa:main from apfine:cot
+673 −5
Changes from all commits (6 commits):
- 9406fe9 feat: add structured decision reasoning (apfine)
- b2da475 fix : resolved issues suggested by reviewers (apfine)
- 1ecf077 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 4ad159a fix : added per system call prompt , stopped mutation of llm wrapper (apfine)
- bc5c9c7 Merge branch 'cot' of https://github.com/apfine/mesa-llm into cot (apfine)
- e7cf023 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
New file (192 lines added):

```python
from typing import TYPE_CHECKING

from pydantic import BaseModel, Field

from mesa_llm.reasoning.reasoning import Observation, Plan, Reasoning

if TYPE_CHECKING:
    from mesa_llm.llm_agent import LLMAgent


class DecisionOption(BaseModel):
    name: str
    description: str
    tradeoffs: list[str]
    score: float = Field(
        ge=0.0,
        le=1.0,
        description="Relative evaluation score for this option in the current context.",
    )


class DecisionOutput(BaseModel):
    goal: str
    constraints: list[str]
    known_facts: list[str]
    unknowns: list[str]
    assumptions: list[str]
    options: list[DecisionOption]
    chosen_option: str
    rationale: str
    confidence: float = Field(ge=0.0, le=1.0)
    risks: list[str]
    next_action: str


class DecisionReasoning(Reasoning):
    """
    Structured decision-making reasoning that returns a strict JSON object before
    converting the selected next action into tool calls.
    """

    def __init__(self, agent: "LLMAgent"):
        super().__init__(agent=agent)

    def get_decision_system_prompt(self) -> str:
        return """
You are an autonomous agent operating within a simulation environment.

Your task is to analyze your current observation and memory to make a highly structured, optimal decision.
Do not produce free-form chain-of-thought prose. You must evaluate the situation and return a strict JSON object matching the required schema.

Your response must include:
- goal: Your current primary objective within the simulation.
- constraints: Any rules, resource limits, or environmental boundaries restricting your actions.
- known_facts: Verified data strictly grounded in your current observation or historical memory.
- unknowns: Critical missing information required for perfect decision-making.
- assumptions: Logical inferences made to bridge the gap between known facts and unknowns.
- options: A list of distinct, executable choices currently available to you. Each must include a name, description, tradeoffs, and a relative evaluation score.
- chosen_option: The exact name of the best option selected from the list above.
- rationale: A concise, logical justification for why this option was chosen over the alternatives.
- confidence: A float between 0.0 and 1.0 representing your certainty in this decision.
- risks: Potential negative outcomes or failure states associated with the chosen option.
- next_action: A single, concrete, and strictly formatted executable command.

Execution Requirements:
1. Ground all known_facts entirely in the provided observation context. Do not hallucinate simulation state or capabilities.
2. next_action must strictly match an available execution command. Do not invent tools.
3. If information is heavily constrained or missing, explicitly reflect this by lowering the confidence score and detailing the danger in risks.
"""

    def get_decision_prompt(self, obs: Observation) -> list[str]:
        prompt_list = []

        get_prompt_ready = getattr(self.agent.memory, "get_prompt_ready", None)
        if callable(get_prompt_ready):
            prompt_list.append(get_prompt_ready())

        get_communication_history = getattr(
            self.agent.memory, "get_communication_history", None
        )
        last_communication = (
            get_communication_history() if callable(get_communication_history) else ""
        )

        if last_communication:
            prompt_list.append("last communication: \n" + str(last_communication))
        if obs:
            prompt_list.append("current observation: \n" + str(obs))

        return prompt_list

    def plan(
        self,
        prompt: str | None = None,
        obs: Observation | None = None,
        ttl: int = 1,
        selected_tools: list[str] | None = None,
    ) -> Plan:
        """
        Plan the next action through a structured decision artifact.
        """
        if obs is None:
            obs = self.agent.generate_obs()

        prompt_list = self.get_decision_prompt(obs)

        if prompt is not None:
            prompt_list.append(prompt)
        elif self.agent.step_prompt is not None:
            prompt_list.append(self.agent.step_prompt)
        else:
            raise ValueError("No prompt provided and agent.step_prompt is None.")

        selected_tools_schema = self.agent.tool_manager.get_all_tools_schema(
            selected_tools
        )

        rsp = self.agent.llm.generate(
            prompt=prompt_list,
            tool_schema=selected_tools_schema,
            tool_choice="none",
            response_format=DecisionOutput,
            system_prompt=self.get_decision_system_prompt(),
        )

        formatted_response = self.agent.llm.parse_structured_output(
            rsp, DecisionOutput
        ).model_dump()
        self.agent.memory.add_to_memory(type="decision", content=formatted_response)

        if hasattr(self.agent, "_step_display_data"):
            self.agent._step_display_data["plan_content"] = formatted_response[
                "rationale"
            ]

        return self.execute_tool_call(
            formatted_response["next_action"],
            selected_tools=selected_tools,
            ttl=ttl,
        )

    async def aplan(
        self,
        prompt: str | None = None,
        obs: Observation | None = None,
        ttl: int = 1,
        selected_tools: list[str] | None = None,
    ) -> Plan:
        """
        Asynchronous version of plan() method for parallel planning.
        """
        if obs is None:
            obs = await self.agent.agenerate_obs()

        prompt_list = self.get_decision_prompt(obs)

        if prompt is not None:
            prompt_list.append(prompt)
        elif self.agent.step_prompt is not None:
            prompt_list.append(self.agent.step_prompt)
        else:
            raise ValueError("No prompt provided and agent.step_prompt is None.")

        selected_tools_schema = self.agent.tool_manager.get_all_tools_schema(
            selected_tools
        )

        rsp = await self.agent.llm.agenerate(
            prompt=prompt_list,
            tool_schema=selected_tools_schema,
            tool_choice="none",
            response_format=DecisionOutput,
            system_prompt=self.get_decision_system_prompt(),
        )

        formatted_response = self.agent.llm.parse_structured_output(
            rsp, DecisionOutput
        ).model_dump()
        await self.agent.memory.aadd_to_memory(
            type="decision", content=formatted_response
        )

        if hasattr(self.agent, "_step_display_data"):
            self.agent._step_display_data["plan_content"] = formatted_response[
                "rationale"
            ]

        return await self.aexecute_tool_call(
            formatted_response["next_action"],
            selected_tools=selected_tools,
            ttl=ttl,
        )
```

Review comment (on the `DecisionOutput` fields): The logic relies on a specific `message.parsed` attribute. The reasoning layer should not depend on the response internals of a particular LLM wrapper; it would be better to normalise this earlier, in the LLM wrapper.
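As a sketch of the contract the system prompt and schema impose, the stdlib-only snippet below re-checks two of the constraints by hand: the pydantic bounds (`score` and `confidence` in [0.0, 1.0]) and the prompt's requirement that `chosen_option` name one of the listed options. The `validate_decision` helper and the sample decision dict are illustrative, not part of this PR.

```python
def validate_decision(decision: dict) -> list[str]:
    """Return constraint violations for a DecisionOutput-shaped dict.

    Hypothetical helper: mirrors the pydantic ge/le bounds and the
    prompt's rule that chosen_option must name a listed option.
    """
    errors = []
    if not 0.0 <= decision["confidence"] <= 1.0:
        errors.append("confidence out of [0.0, 1.0]")
    option_names = [opt["name"] for opt in decision["options"]]
    if decision["chosen_option"] not in option_names:
        errors.append("chosen_option does not name a listed option")
    for opt in decision["options"]:
        if not 0.0 <= opt["score"] <= 1.0:
            errors.append(f"option {opt['name']!r}: score out of [0.0, 1.0]")
    return errors


# Hypothetical decision artifact of the shape the model is asked to return.
decision = {
    "goal": "reach the charging station",
    "constraints": ["battery below 20%"],
    "known_facts": ["station is 3 cells north"],
    "unknowns": ["whether the path is blocked"],
    "assumptions": ["path is clear"],
    "options": [
        {"name": "move_north", "description": "head to the station",
         "tradeoffs": ["uses battery"], "score": 0.9},
        {"name": "wait", "description": "conserve power",
         "tradeoffs": ["goal delayed"], "score": 0.3},
    ],
    "chosen_option": "move_north",
    "rationale": "Highest-scoring option given the battery constraint.",
    "confidence": 0.8,
    "risks": ["path may be blocked"],
    "next_action": "move_north",
}

print(validate_decision(decision))  # []
```

In the PR itself these bounds are enforced by pydantic at parse time; the point of the sketch is only to make the implicit contract between the system prompt and the schema explicit.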
Review comment: Adding a constraint here would be better in my opinion.
Reply:
Sure !!