Autonomous reasoning playground featuring reflection graphs and multi-agent orchestration.
- Modular `core` package with reusable agent abstractions, memory, and tool registry (see the sketch after this list).
- Two sample projects:
  - `autoreason`: reflection loop with optional Streamlit UI.
  - `orchestrator`: multi-agent workflow connecting researcher, coder, critic, and summarizer personas.
- Evaluation harness with metrics to track coherence, reflection depth, and confidence.
- Reproducible configuration via YAML files and seeded randomness.
- Ready-to-run demo notebooks for both projects.
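A minimal sketch of how the reusable agent abstractions might be used. Only `AgentConfig` and the `src.core.base_agent` module path are confirmed by the evaluation example further down; the `BaseAgent` class name, its constructor, and the `run` method are assumptions for illustration.

```python
# Hypothetical custom agent built on the shared core abstractions.
# `AgentConfig` and the module path appear in the evaluation example below;
# `BaseAgent`, `self.config`, and `run` are illustrative assumptions.
from src.core.base_agent import AgentConfig, BaseAgent


class EchoAgent(BaseAgent):
    """Toy agent that simply echoes the prompt back with its name."""

    def run(self, prompt: str) -> str:
        return f"[{self.config.name}] {prompt}"


agent = EchoAgent(AgentConfig(name="echo-agent"))
print(agent.run("Hello, reflection!"))
```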
- Create and activate a Python 3.10+ environment.
- Install the project in editable mode:

  ```bash
  pip install -e .[notebooks]
  ```

- Copy `.env.example` to `.env` and provide provider credentials if required (see the example after these steps).
- Run the unit tests:

  ```bash
  make test
  ```

- Launch the Streamlit UI (optional):

  ```bash
  streamlit run src/autoreason/app.py
  ```
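What belongs in `.env` depends on the provider you use; the key below is an illustrative placeholder, not a variable the project is confirmed to read.

```
# .env — illustrative placeholder; substitute the credentials your provider needs
OPENAI_API_KEY=your-key-here
```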
```
auto-reason/
├── src/
│   ├── core/          # shared abstractions
│   ├── autoreason/    # reflection loop project
│   ├── orchestrator/  # multi-agent orchestrator
│   └── evals/         # metrics and harness
├── configs/           # YAML configuration files
├── notebooks/         # interactive demos
├── tests/             # pytest suite
└── Makefile           # developer utilities
```
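The feature list mentions reproducible configuration via YAML and seeded randomness, and the evaluation example below loads `src/autoreason/config.yaml`. The keys in this sketch are illustrative assumptions, not the schema actually read by `build_graph_from_config`:

```yaml
# Illustrative reflection-loop configuration — all key names are assumptions.
name: reflection-demo
seed: 42              # seeded randomness for reproducible runs
max_reflections: 3    # how many reflection passes the loop performs
model:
  provider: openai    # provider credentials come from .env
  temperature: 0.2
```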
Use the evaluation harness to score graphs against scenarios:
```python
from pathlib import Path

from src.autoreason.app import DemoAgent
from src.autoreason.graph import build_graph_from_config
from src.core.base_agent import AgentConfig
from src.evals.harness import EvalHarness, EvalScenario

# EvalHarness takes a zero-argument factory that builds the graph under evaluation.
harness = EvalHarness(lambda: build_graph_from_config(
    agent=DemoAgent(AgentConfig(name="eval-agent")),
    memory=None,
    tools=None,
    config_path=Path("src/autoreason/config.yaml"),
))

# A scenario pairs a prompt with the terms the output is expected to contain.
scenario = EvalScenario(
    name="reflection",
    prompt="Analyse the trade-offs of self-reflection in agents.",
    expectations=["reflection", "agent"],
)

print(harness.evaluate(scenario))
```
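Per the feature list, the harness tracks coherence, reflection depth, and confidence, so the printed result should surface those metrics; the exact return shape depends on `EvalHarness.evaluate`.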
- Adhere to the guidelines in `AGENTS.md` and keep documentation up to date.
- Add or update tests alongside code changes.
- Run `make lint` and `make test` before opening a PR.