
feat: Structured LLM reasoning added for auditable and constrained decision-making #188

Open
apfine wants to merge 3 commits into mesa:main from apfine:cot

Conversation


@apfine apfine commented Mar 11, 2026

Summary

This PR adds a new structured reasoning strategy, DecisionReasoning, on the cot branch.

The change does not modify the existing CoTReasoning implementation. Instead, it introduces a separate reasoning module for cases where the model should make decisions through a strict, machine-readable schema rather than free-form chain-of-thought text.

The new reasoning path:

  • gathers memory and observation context
  • asks the LLM for a structured decision object
  • validates that object with Pydantic
  • stores the structured decision in memory
  • executes only the selected next_action

What This PR Adds

New reasoning module

Added:

  • mesa_llm/reasoning/decision.py

This module defines:

  • DecisionOption
  • DecisionOutput
  • DecisionReasoning

New reasoning schema

The model is required to return a strict JSON object containing:

  • goal
  • constraints
  • known_facts
  • unknowns
  • assumptions
  • options
  • chosen_option
  • rationale
  • confidence
  • risks
  • next_action

Each option also contains:

  • name
  • description
  • tradeoffs
  • score

Validation

The structured response is validated through Pydantic before execution.

This gives:

  • a stable output shape
  • explicit required fields
  • bounded confidence in the range 0.0 to 1.0
  • safer downstream use of model output
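As a hedged sketch of how such a schema can enforce those guarantees (field names follow the PR description, but the actual definitions in mesa_llm/reasoning/decision.py may differ in detail):

```python
from pydantic import BaseModel, Field


# Hypothetical sketch of the schema described above; the real models in
# mesa_llm/reasoning/decision.py may differ.
class DecisionOption(BaseModel):
    name: str
    description: str
    tradeoffs: list[str]
    score: float


class DecisionOutput(BaseModel):
    goal: str
    constraints: list[str]
    known_facts: list[str]
    unknowns: list[str]
    assumptions: list[str]
    options: list[DecisionOption]
    chosen_option: str
    rationale: str
    confidence: float = Field(ge=0.0, le=1.0)  # bounded to [0.0, 1.0]
    risks: list[str]
    next_action: str
```

Because confidence is declared with ge=0.0 and le=1.0, an out-of-range value raises a ValidationError before any action is executed.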

Execution flow

The existing tool execution pipeline is preserved.

The flow is now:

  1. Build prompt context from memory and current observation
  2. Generate a structured decision object
  3. Store the decision as memory
  4. Extract next_action
  5. Execute next_action through the existing tool executor

This means the executor no longer depends on a long reasoning blob for this reasoning mode.
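The five steps above can be sketched as follows; the helper names (llm_call, store, executor) are placeholders for illustration, not the actual mesa_llm API:

```python
# Minimal sketch of the decision flow, assuming placeholder callables;
# the real DecisionReasoning.plan() differs in detail.
def plan_step(memory_context, observation, llm_call, store, executor):
    # 1. Build prompt context from memory and the current observation
    prompt = f"{memory_context}\n\nObservation: {observation}"
    # 2. Generate a structured decision object (validated upstream)
    decision = llm_call(prompt)
    # 3. Store the decision as memory, tagged for later inspection
    store({"type": "decision", **decision})
    # 4. Extract only the selected next_action
    next_action = decision["next_action"]
    # 5. Execute it through the existing tool executor
    return executor(next_action)
```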

Memory integration

The decision artifact is stored as:

  • type="decision"

This makes the reasoning result easier to inspect and supports future analysis of:

  • confidence
  • risks
  • assumptions
  • option selection quality
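As an illustration of why the typed entry helps (the entries below are mock data, not the actual mesa_llm memory API):

```python
# Mock memory entries; the real memory objects differ, but a type tag
# makes decision artifacts easy to filter and analyze after a run.
memory = [
    {"type": "observation", "content": "food visible east"},
    {"type": "decision", "confidence": 0.78, "risks": ["contention"],
     "assumptions": ["east cell traversable"], "next_action": "move_east"},
]

decisions = [entry for entry in memory if entry["type"] == "decision"]
low_confidence = [d for d in decisions if d["confidence"] < 0.5]
```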

Tests

Added:

  • tests/test_reasoning/test_decision.py

The test coverage includes:

  • schema model creation
  • prompt generation
  • sync planning
  • async planning
  • selected tool propagation
  • prompt fallback behavior
  • no-prompt error handling
  • execution of next_action
  • display metadata behavior

What This PR Does Not Change

This PR does not modify:

  • mesa_llm/reasoning/cot.py
  • CoTReasoning
  • ReActReasoning
  • ReWOOReasoning

So while this branch is named cot, the latest change is not an update to the existing CoT implementation. It is a new alternative reasoning mode focused on structured decision-making.

Why This Change

The current reasoning styles in the repo support:

  • free-form chain-of-thought
  • lightweight reasoning + action
  • multi-step planning

What was missing was a reasoning mode that explicitly separates:

  • facts
  • unknowns
  • assumptions
  • options
  • tradeoffs
  • confidence
  • risks
  • final executable action

This is useful when decision quality and inspectability matter more than verbose reasoning prose.

Example Output

{
  "goal": "Reach food",
  "constraints": ["One move this turn"],
  "known_facts": ["Food is visible to the east"],
  "unknowns": ["Whether another agent will block the path"],
  "assumptions": ["The east cell remains traversable this step"],
  "options": [
    {
      "name": "move_east",
      "description": "Move toward visible food",
      "tradeoffs": ["Fast progress", "Potential contention"],
      "score": 0.88
    }
  ],
  "chosen_option": "move_east",
  "rationale": "It best advances the goal with acceptable risk.",
  "confidence": 0.78,
  "risks": ["Another agent may reach the food first"],
  "next_action": "move_east"
}

Only next_action is forwarded to the executor.
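For instance, given a trimmed copy of the response above, only the action name reaches the executor (illustrative code, not the library's exact call site):

```python
import json

# A trimmed copy of the example output; only next_action is forwarded
# to the executor, while the rest stays in memory for inspection.
response = json.loads(
    '{"chosen_option": "move_east", "confidence": 0.78, '
    '"next_action": "move_east"}'
)
action_for_executor = response["next_action"]
```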

Validation Performed

The following checks were run:

pytest tests/test_reasoning/test_decision.py tests/test_reasoning -q
ruff check mesa_llm tests

Results:

  • reasoning tests passed
  • ruff check passed
  • only an existing upstream Mesa deprecation warning was observed during pytest

Files added:

  • mesa_llm/reasoning/decision.py
  • tests/test_reasoning/test_decision.py

Notes For Reviewers

This PR should be reviewed as:

  • a new reasoning capability

  • a structured alternative to free-form CoT

  • a non-breaking addition to the current reasoning system

It should not be interpreted as an enhancement to the existing CoTReasoning class itself.


coderabbitai bot commented Mar 11, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



apfine commented Mar 11, 2026

@EwoutH @wang-boyu @khushiiagrawal

Please review and share your feedback.

response_format=DecisionOutput,
)

formatted_response = json.loads(rsp.choices[0].message.content)

response_format=DecisionOutput is already passed to the llm call. Parsing json again here ties this code to the raw response format. It might be cleaner to rely on the wrapper's structured output instead.
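A sketch of the suggested pattern, using a trimmed stand-in model and assuming a pydantic-v2 style model_validate_json; the wrapper's actual return shape may differ:

```python
from pydantic import BaseModel


# Trimmed stand-in for the full DecisionOutput model.
class DecisionOutput(BaseModel):
    next_action: str


# Reviewer's suggestion, sketched: let the Pydantic model parse and
# validate the raw content in one step instead of a separate json.loads.
# The response shape (rsp.choices[0].message.content) mirrors the
# snippet above.
def parse_decision(rsp) -> DecisionOutput:
    content = rsp.choices[0].message.content
    return DecisionOutput.model_validate_json(content)
```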


Thank you for the review!

I will follow up once I have made the necessary changes.


@souro26 I have addressed the issues you flagged.

Please review!

"""

def get_decision_prompt(self, obs: Observation) -> list[str]:
    prompt_list = [self.agent.memory.get_prompt_ready()]

This assumes the memory backend always has get_prompt_ready and get_communication_history. Adding a small guard here would make this reasoning class safer with other memory implementations.
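A small guard along these lines (an illustrative sketch; the method names are those referenced in the comment, and the fallback behavior is an assumption):

```python
# Hedged sketch of the suggested guard: only call memory methods that
# the backend actually provides, so other memory implementations do not
# raise AttributeError.
def safe_memory_prompt(memory) -> list[str]:
    parts = []
    for attr in ("get_prompt_ready", "get_communication_history"):
        method = getattr(memory, attr, None)
        if callable(method):
            parts.append(method())
    return parts
```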


Thank you for the review!

I will follow up once I have made the necessary changes.


@souro26, I have addressed the issues.

Please review these changes!


codecov bot commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 93.10345% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.85%. Comparing base (a719dac) to head (d77fc0f).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
mesa_llm/recording/simulation_recorder.py 92.85% 5 Missing ⚠️
mesa_llm/reasoning/decision.py 94.28% 4 Missing ⚠️
mesa_llm/recording/record_model.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #188      +/-   ##
==========================================
+ Coverage   90.64%   90.85%   +0.20%     
==========================================
  Files          19       20       +1     
  Lines        1540     1673     +133     
==========================================
+ Hits         1396     1520     +124     
- Misses        144      153       +9     

