
feat: Structured LLM reasoning added for auditable and constrained decision-making #188

Open
apfine wants to merge 3 commits into mesa:main from apfine:cot

Conversation


@apfine apfine commented Mar 11, 2026

Summary

This PR adds a new structured reasoning strategy, DecisionReasoning, on the cot branch.

The change does not modify the existing CoTReasoning implementation. Instead, it introduces a separate reasoning module for cases where the model should make decisions through a strict, machine-readable schema rather than free-form chain-of-thought text.

The new reasoning path:

  • gathers memory and observation context
  • asks the LLM for a structured decision object
  • validates that object with Pydantic
  • stores the structured decision in memory
  • executes only the selected next_action

What This PR Adds

New reasoning module

Added:

  • mesa_llm/reasoning/decision.py

This module defines:

  • DecisionOption
  • DecisionOutput
  • DecisionReasoning

New reasoning schema

The model is required to return a strict JSON object containing:

  • goal
  • constraints
  • known_facts
  • unknowns
  • assumptions
  • options
  • chosen_option
  • rationale
  • confidence
  • risks
  • next_action

Each option also contains:

  • name
  • description
  • tradeoffs
  • score

Validation

The structured response is validated through Pydantic before execution.

This gives:

  • a stable output shape
  • explicit required fields
  • bounded confidence in the range 0.0 to 1.0
  • safer downstream use of model output
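As a hedged sketch of how such a schema can enforce those guarantees (field names follow the PR description, but the actual definitions in mesa_llm/reasoning/decision.py may differ in detail):

```python
from pydantic import BaseModel, Field


# Hypothetical sketch of the schema described above; the real models in
# mesa_llm/reasoning/decision.py may differ.
class DecisionOption(BaseModel):
    name: str
    description: str
    tradeoffs: list[str]
    score: float


class DecisionOutput(BaseModel):
    goal: str
    constraints: list[str]
    known_facts: list[str]
    unknowns: list[str]
    assumptions: list[str]
    options: list[DecisionOption]
    chosen_option: str
    rationale: str
    confidence: float = Field(ge=0.0, le=1.0)  # bounded to [0.0, 1.0]
    risks: list[str]
    next_action: str
```

Because confidence is declared with ge=0.0 and le=1.0, an out-of-range value raises a ValidationError before any action is executed.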

Execution flow

The existing tool execution pipeline is preserved.

The flow is now:

  1. Build prompt context from memory and current observation
  2. Generate a structured decision object
  3. Store the decision as memory
  4. Extract next_action
  5. Execute next_action through the existing tool executor

This means the executor no longer depends on a long reasoning blob for this reasoning mode.
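The five steps above can be sketched as follows; the helper names (llm_call, store, executor) are placeholders for illustration, not the actual mesa_llm API:

```python
# Minimal sketch of the decision flow, assuming placeholder callables;
# the real DecisionReasoning.plan() differs in detail.
def plan_step(memory_context, observation, llm_call, store, executor):
    # 1. Build prompt context from memory and the current observation
    prompt = f"{memory_context}\n\nObservation: {observation}"
    # 2. Generate a structured decision object (validated upstream)
    decision = llm_call(prompt)
    # 3. Store the decision as memory, tagged for later inspection
    store({"type": "decision", **decision})
    # 4. Extract only the selected next_action
    next_action = decision["next_action"]
    # 5. Execute it through the existing tool executor
    return executor(next_action)
```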

Memory integration

The decision artifact is stored as:

  • type="decision"

This makes the reasoning result easier to inspect and supports future analysis of:

  • confidence
  • risks
  • assumptions
  • option selection quality
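As an illustration of why the typed entry helps (the entries below are mock data, not the actual mesa_llm memory API):

```python
# Mock memory entries; the real memory objects differ, but a type tag
# makes decision artifacts easy to filter and analyze after a run.
memory = [
    {"type": "observation", "content": "food visible east"},
    {"type": "decision", "confidence": 0.78, "risks": ["contention"],
     "assumptions": ["east cell traversable"], "next_action": "move_east"},
]

decisions = [entry for entry in memory if entry["type"] == "decision"]
low_confidence = [d for d in decisions if d["confidence"] < 0.5]
```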

Tests

Added:

  • tests/test_reasoning/test_decision.py

The test coverage includes:

  • schema model creation
  • prompt generation
  • sync planning
  • async planning
  • selected tool propagation
  • prompt fallback behavior
  • no-prompt error handling
  • execution of next_action
  • display metadata behavior

What This PR Does Not Change

This PR does not modify:

  • mesa_llm/reasoning/cot.py
  • CoTReasoning
  • ReActReasoning
  • ReWOOReasoning

So while this branch is named cot, the latest change is not an update to the existing CoT implementation. It is a new alternative reasoning mode focused on structured decision-making.

Why This Change

The current reasoning styles in the repo support:

  • free-form chain-of-thought
  • lightweight reasoning + action
  • multi-step planning

What was missing was a reasoning mode that explicitly separates:

  • facts
  • unknowns
  • assumptions
  • options
  • tradeoffs
  • confidence
  • risks
  • final executable action

This is useful when decision quality and inspectability matter more than verbose reasoning prose.

Example Output

{
  "goal": "Reach food",
  "constraints": ["One move this turn"],
  "known_facts": ["Food is visible to the east"],
  "unknowns": ["Whether another agent will block the path"],
  "assumptions": ["The east cell remains traversable this step"],
  "options": [
    {
      "name": "move_east",
      "description": "Move toward visible food",
      "tradeoffs": ["Fast progress", "Potential contention"],
      "score": 0.88
    }
  ],
  "chosen_option": "move_east",
  "rationale": "It best advances the goal with acceptable risk.",
  "confidence": 0.78,
  "risks": ["Another agent may reach the food first"],
  "next_action": "move_east"
}

Only next_action is forwarded to the executor.
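For instance, given a trimmed copy of the response above, only the action name reaches the executor (illustrative code, not the library's exact call site):

```python
import json

# A trimmed copy of the example output; only next_action is forwarded
# to the executor, while the rest stays in memory for inspection.
response = json.loads(
    '{"chosen_option": "move_east", "confidence": 0.78, '
    '"next_action": "move_east"}'
)
action_for_executor = response["next_action"]
```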

Validation Performed

The following checks were run:

pytest tests/test_reasoning/test_decision.py tests/test_reasoning -q
ruff check mesa_llm tests

Results:

  • reasoning tests passed
  • ruff check passed
  • only an existing upstream Mesa deprecation warning was observed during pytest

Files added:

  • mesa_llm/reasoning/decision.py
  • tests/test_reasoning/test_decision.py

Notes For Reviewers

This PR should be reviewed as:

  • a new reasoning capability

  • a structured alternative to free-form CoT

  • a non-breaking addition to the current reasoning system

It should not be interpreted as an enhancement to the existing CoTReasoning class itself.


coderabbitai bot commented Mar 11, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



apfine commented Mar 11, 2026

@EwoutH @wang-boyu @khushiiagrawal

Please review and share your feedback.

response_format=DecisionOutput,
)

formatted_response = json.loads(rsp.choices[0].message.content)

response_format=DecisionOutput is already passed to the llm call. Parsing json again here ties this code to the raw response format. It might be cleaner to rely on the wrapper's structured output instead.
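A sketch of the suggested pattern, using a trimmed stand-in model and assuming a pydantic-v2 style model_validate_json; the wrapper's actual return shape may differ:

```python
from pydantic import BaseModel


# Trimmed stand-in for the full DecisionOutput model.
class DecisionOutput(BaseModel):
    next_action: str


# Reviewer's suggestion, sketched: let the Pydantic model parse and
# validate the raw content in one step instead of a separate json.loads.
# The response shape (rsp.choices[0].message.content) mirrors the
# snippet above.
def parse_decision(rsp) -> DecisionOutput:
    content = rsp.choices[0].message.content
    return DecisionOutput.model_validate_json(content)
```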


Thank you for the review!

I will follow up once I have made the necessary changes.


@souro26 I have addressed the issues you flagged.

Please review!

"""

def get_decision_prompt(self, obs: Observation) -> list[str]:
    prompt_list = [self.agent.memory.get_prompt_ready()]

This assumes the memory backend always has get_prompt_ready and get_communication_history. Adding a small guard here would make this reasoning class safer with other memory implementations.
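A small guard along these lines (an illustrative sketch; the method names are those referenced in the comment, and the fallback behavior is an assumption):

```python
# Hedged sketch of the suggested guard: only call memory methods that
# the backend actually provides, so other memory implementations do not
# raise AttributeError.
def safe_memory_prompt(memory) -> list[str]:
    parts = []
    for attr in ("get_prompt_ready", "get_communication_history"):
        method = getattr(memory, attr, None)
        if callable(method):
            parts.append(method())
    return parts
```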


Thank you for the review!

I will follow up once I have made the necessary changes.


@souro26, I have addressed the issues.

Please review these changes!


codecov bot commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 93.10345% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.85%. Comparing base (a719dac) to head (d77fc0f).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
mesa_llm/recording/simulation_recorder.py 92.85% 5 Missing ⚠️
mesa_llm/reasoning/decision.py 94.28% 4 Missing ⚠️
mesa_llm/recording/record_model.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #188      +/-   ##
==========================================
+ Coverage   90.64%   90.85%   +0.20%     
==========================================
  Files          19       20       +1     
  Lines        1540     1673     +133     
==========================================
+ Hits         1396     1520     +124     
- Misses        144      153       +9     

