SAG (Setup-Agent)

🤖 An LLM-Powered Engine for Automated Project Setup & Configuration 🤖

SAG (Setup-Agent) is an advanced AI agent designed to fully automate the initial setup, configuration, and ongoing tasks for any software project. It operates within an isolated Docker environment, intelligently interacting with project files, shell commands, and web resources to transform hours—or even days—of manual setup into a process that takes just a few minutes.

🔦 Highlights

Container-native execution powered by docker_orch/, ensuring each project is built inside an isolated Docker workspace with disposable volumes.
Dual-model ReAct loop in agent/react_engine.py with live token telemetry from agent/token_tracker.py and agent-state evaluation for resilient planning.
Fact-based validation through agent/physical_validator.py, combining build artifact inspection with the structured test catalog in testcases/catalog.py for accurate pass/fail reporting.
Rich automation toolkit (tools/) spanning foundational helpers (bash, file_io, context_tool, web_search) and higher-level orchestrators like project_analyzer, report_tool, output_search_tool, and language-specific build runners that translate intent into reproducible actions.
Centralized diagnostics via agent/error_logger.py and agent/output_storage.py, making every action, log, and generated report searchable across runs.

📖 Philosophy: Solving the "Getting Started" Problem

In software development, configuring a new project—especially a large open-source one—is often a tedious, time-consuming, and error-prone task. Developers must read extensive documentation, resolve dependency conflicts, and understand the project's structure before writing a single line of effective code.

SAG's core mission is to solve this problem. It aims to be an intelligent "Project Initialization Specialist" by adhering to these core principles:

Complete Isolation: All operations occur within Docker containers, ensuring the host machine is never polluted. This guarantees a clean, reproducible setup every time.
Intelligent Planning & Execution: SAG doesn't just execute commands; it analyzes, plans (by creating a TODO list), and systematically solves problems like a human expert.
Hierarchical Context Management: With an innovative "Trunk/Branch" context system, SAG can handle complex task chains, from high-level goals to low-level operations, without getting lost in long-running processes.
Dual-Model Collaboration: It leverages the strengths of different LLMs—one for deep thinking and planning (e.g., o1-preview) and another for fast action and tool use (e.g., gpt-4o)—to achieve a balance of efficiency and effectiveness.

✨ Core Concepts

1. Dual-Model ReAct Engine

The "brain" of SAG is an enhanced ReAct (Reasoning-Acting) engine. By separating the "thinking" and "acting" phases and using different models for each, it achieves more effective decision-making:

Thinking Model: Responsible for analyzing complex problems, creating high-level plans, and learning from errors. It thinks deeper and sees further. Supports advanced reasoning features like OpenAI's o1 models and Anthropic's Claude thinking capabilities.
Action Model: Responsible for precisely executing the plan laid out by the thinking model, whether that's calling a tool, generating code, or running a command. It's focused on "doing."

This architecture allows SAG to be both thoughtful and agile when tackling unfamiliar projects.

2. Hierarchical Context Management (Trunk & Branch)

To solve the problem of context loss in complex tasks, SAG implements a hierarchical context management system:

Trunk Context:
- Stores the project's overall goal, the complete TODO list, and high-level progress.
- Acts as the "command center" for the entire setup task.
Branch Context:
- Created for each specific task on the TODO list.
- Contains all the details, logs, and the current focus for that sub-task.
- The agent works within a branch context to solve one problem at a time before returning to the trunk.

This system enables SAG to switch between high-level planning and low-level execution seamlessly, ensuring stability and coherence throughout long-running tasks.

3. Hierarchical Tool-belt Design

At first glance, a single bash tool could handle all system interactions. However, relying solely on a low-level tool would force the AI agent to manage immense complexity, from remembering command syntax to parsing raw text output—a process that is both inefficient and error-prone.

SAG adopts a hierarchical tool-belt design to address this, creating layers of abstraction that empower the agent to work more intelligently:

Low-Level Foundational Tools: At the base is the BashTool. It provides unrestricted, granular control, much like an assembly language for system operations. It is the ultimate fallback for tasks that have no specialized tool.
Mid-Level Specialized Tools: These tools encapsulate domain-specific knowledge. For example:
- SystemTool understands system package management (apt-get), abstracting away the need to manually form install or update commands.
- MavenTool is an expert in Java's Maven build system. It knows about goals, profiles, and properties, and can intelligently parse Maven's verbose output to determine if a build succeeded, failed, or had test errors.
High-Level Workflow Tools: At the top layer, tools like ProjectSetupTool orchestrate complex, multi-step workflows. Its clone action doesn't just run git clone; it also automatically detects the project type (Maven, Node.js, Python), suggests the next appropriate actions, and can even trigger dependency installation, compressing a long chain of human-like reasoning into a single, intent-driven command.

This layered approach allows the agent to delegate complexity. Instead of figuring out how to do something with basic commands, it can focus on what it needs to achieve, leading to faster, more reliable, and more sophisticated automation.

🏗️ System Architecture

SAG is composed of several core components:

CLI (main.py): Entry point for user commands such as project, run, and list, with optional artifact recording for post-run inspection.
Configuration Layer (config/): Loads .env settings, provider credentials, and model presets, and wires logging streams used across the agent.
Setup Agent & Contexts (agent/agent.py, agent/context_manager.py): Orchestrates the workflow, persists trunk/branch contexts inside the container workspace, and initializes the full tool-suite.
ReAct Engine & State Evaluation (agent/react_engine.py, agent/agent_state_evaluator.py): Dual-model reasoning loop with completion sign detection, retry guards, and live token telemetry.
Physical Validation & Diagnostics (agent/physical_validator.py, agent/output_storage.py, agent/error_logger.py, agent/token_tracker.py): Verifies build/test results from artifacts, snapshots execution outputs, and centralizes structured error logs.
Automation Tool-belt (tools/): Intent-driven helpers including project_analyzer, project_setup_tool, gradle_tool, maven_tool, system_tool, and the cross-run output_search_tool.
Reporting & Test Intelligence (tools/report_tool.py, testcases/catalog.py, reporting/): Generates markdown setup reports, merges runtime and static test metadata, and tracks annotation-level metrics.
Docker Orchestrator (docker_orch/orch.py): Guarantees container lifecycle management, volume persistence, and shell connectivity for every project.
Knowledge Base & Regression Suites (docs/, examples/, tests/, student_tasks/): Documentation and executable scenarios that codify best practices and guard against regressions.

🧠 Advanced Automation Tooling

Foundation Utilities (tools/bash.py, tools/file_io.py, tools/context_tool.py, tools/web_search.py): Provide container-aware shell execution, safe file interactions, context updates, and web lookups for knowledge gaps.
Project Analyzer (tools/project_analyzer.py): Performs static scanning to classify project types, detect build tools, and enumerate unit, integration, and parameterized tests before execution.
Report Tool (tools/report_tool.py): Produces setup-report-*.md summaries that combine physical validator evidence, execution history, and actionable remediation guidance.
Output Search Tool (tools/output_search_tool.py): Indexes prior observations and context files to let the agent recall log snippets or stack traces across iterations.
Build & Dependency Helpers (tools/maven_tool.py, tools/gradle_tool.py, tools/system_tool.py): Provide domain-specific command wrappers with structured output parsing and retry logic.
Project Setup Tool (tools/project_setup_tool.py): Encapsulates cloning, environment bootstrapping, and language-specific install steps into a single orchestrated action.

✅ Validation & Observability

Physical Validator (agent/physical_validator.py): Inspects build artifacts, XML test reports, and compilation timestamps to ground decisions in physical evidence.
Test Case Catalog (testcases/catalog.py): Normalizes runtime results, parameterized expansions, and Groovy/Kotlin discovery to keep counts consistent across tools.
Error Logger (agent/error_logger.py): Aggregates issues from every tool call with canonical error codes, making retrospectives and reports actionable.
Output Storage Manager (agent/output_storage.py): Persists observations, plan snapshots, and generated artifacts under .setup_agent/ inside the container workspace.
Token Tracker (agent/token_tracker.py): Captures prompt, completion, and reasoning token usage per ReAct step for cost and performance analysis.

🧭 End-to-End Flow

flowchart TD
    CLI["CLI (`main.py`)<br/>`sag project` / `sag run`"]
    Config["Load configuration & session logging<br/>`config/__init__.py`"]
    Docker["Docker orchestrator provisions container + volume<br/>`docker_orch/orch.py`"]
    AgentInit["SetupAgent constructed<br/>`agent/agent.py`"]
    ToolsInit["Context & tool initialization<br/>• ErrorLogger (`agent/error_logger.py`)<br/>• ContextManager + `manage_context` tool<br/>• Bash/FileIO/WebSearch helpers<br/>• ProjectSetup/Analyzer/System/Maven/Gradle tools<br/>• OutputStorageManager + OutputSearch<br/>• ReportTool + PhysicalValidator"]
    Trunk["Create trunk context & TODO list<br/>Tasks persisted via ContextManager"]
    LoopStart["Kick off ReAct loop<br/>`agent/react_engine.py`"]

    subgraph ReActIteration[ReAct Iteration]
        Think["THOUGHT: Thinking model plans next step<br/>TokenTracker absorbs usage"]
        ActionDecision{Action needed?}
        Act["ACTION: Action model decides tool call"]
        CtxSwitch["manage_context updates trunk↔branch state<br/>Branch history written to .setup_agent/contexts"]
        ToolExec["Domain tools run inside container<br/>• `project_analyzer`, `maven`, `gradle`, `project_setup`, `system`, etc."]
        OutputPersist["Observations shortened for context<br/>Full logs stored via OutputStorageManager"]
        Observation["OBSERVATION: Summaries + ref IDs recorded"]
        StateEval["AgentStateEvaluator + PhysicalValidator validate progress<br/>Detect completion, retries, or missing evidence"]
        Guidance{Guidance or retry needed?}
    end

    Report["ReportTool generates `setup-report-*.md`<br/>Combines execution history + physical validation"]
    Completion["Setup summary & optional artifact export<br/>`_provide_setup_summary`, `_save_setup_artifacts`"]
    OutputSearch["When logs exceed context, `output_search` fetches by ref ID
for follow-up reasoning"]

    CLI --> Config --> Docker --> AgentInit --> ToolsInit --> Trunk --> LoopStart
    LoopStart --> Think --> ActionDecision
    ActionDecision -- "No (keep thinking)" --> Think
    ActionDecision -- "Yes" --> Act --> ToolExec --> OutputPersist --> Observation --> StateEval
    Act --> CtxSwitch
    CtxSwitch --> OutputPersist
    OutputPersist --> OutputSearch
    OutputSearch --> Think
    StateEval --> Guidance
    Guidance -- "Request guidance" --> Think
    Guidance -- "No guidance" --> ActionDecision
    StateEval -->|"Success"| Report --> Completion
    StateEval -->|"More work"| Think

This flow illustrates how the CLI bootstraps an isolated Docker workspace, how SetupAgent wires every tool, and how the ReAct loop alternates thinking and action. manage_context safeguards trunk/branch transitions, OutputStorage captures verbose tool outputs (with output_search replaying them on demand), and AgentStateEvaluator works with PhysicalValidator to gate completion before ReportTool finalizes the run.

🚀 Quick Start

1. Prerequisites

Docker
Python 3.10+
uv (The recommended Python package manager)

2. Installation & Configuration

# (Optional) Install uv globally if it is not already available
pip install uv

# 1. Clone the repository
git clone https://github.com/your-org/Setup-Agent.git
cd Setup-Agent

# 2. Install dependencies with uv (this will also create a virtual environment)
uv sync

# 3. Create and edit your configuration file
cp .env.example .env
nano .env  # Fill in your API keys and other settings

3. Basic Usage

# Start setting up a new project
uv run sag project https://github.com/fastapi/fastapi.git

# List all managed projects and their status
uv run sag list

# Run a new task on an existing project
uv run sag run sag-fastapi --task "add a new endpoint to handle /healthz"

# Access the project container's shell
uv run sag shell sag-fastapi

# Remove a project (including its container and volume)
uv run sag remove sag-fastapi

🛠️ CLI Command Reference

SAG provides a clean and powerful set of CLI commands.

Command	Description	Example
`sag project <url>`	Initializes the setup for a new project from a Git repository URL.	`sag project https://github.com/pallets/flask.git`
`sag list`	Lists all projects managed by SAG, showing their container name, status, and last comment.	`sag list`
`sag run <name>`	Runs a specified task on an existing project.	`sag run sag-flask --task "add unit tests for the application factory"`
`sag shell <name>`	Connects to an interactive shell inside the specified project's container.	`sag shell sag-flask`
`sag remove <name>`	Permanently deletes a project, including its container and data volume.	`sag remove sag-flask --force`
`sag version`	Displays SAG's version information.	`sag version`
`sag --help`	Shows the help message.	`sag --help`

Global Options:

--log-level [DEBUG|INFO|...]: Overrides the log level set in the .env file.
--log-file <path>: Specifies a custom path for the log file.

✅ Running Tests Locally

# Run the full pytest suite (integration + smoke tests)
uv run pytest

# Or execute a focused scenario for faster feedback
uv run pytest test_report_format.py

⚙️ Configuration Explained

All configuration is managed through the .env file in the project's root directory.

Key Configuration Options:

SAG_THINKING_MODEL: The "thinking model" for planning and analysis. A powerful model is recommended (e.g., o1-preview, claude-sonnet-4-20250514).
SAG_ACTION_MODEL: The "action model" for task execution. A fast and cost-effective model is recommended (e.g., gpt-4o, claude-3-5-sonnet-20240620).
SAG_THINKING_PROVIDER: The provider for the thinking model (openai, anthropic, etc.).
SAG_REASONING_EFFORT: For thinking models, controls reasoning depth (low, medium, high).
SAG_THINKING_BUDGET_TOKENS: For Claude models, controls thinking budget (1024, 2048, 4096).
OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.: API keys for the respective LLM providers.
SAG_LOG_LEVEL: Sets the logging verbosity. DEBUG mode is highly detailed and includes LiteLLM's internal logs.
SAG_MAX_ITERATIONS: The maximum number of iterations for a single run or project command to prevent infinite loops.

🔍 How It Works: A Look Under the Hood

When you run sag project <url>, a sophisticated sequence of events is triggered:

Environment Initialization: SAG's Docker Orchestrator spins up an isolated Docker container and a persistent data volume.
Project Cloning: The agent uses its bash tool inside the container to clone the specified Git repository.
Context Establishment: A Trunk Context is created with the high-level goal (e.g., "Set up this project to be runnable").
Intelligent Analysis & Planning: The Thinking Model is engaged to analyze the project structure (e.g., README.md, package.json, pyproject.toml) and generate a comprehensive TODO list, which is stored in the Trunk Context.
Task Loop Initiation: a. The agent picks the first task from the Trunk Context's TODO list. b. A Branch Context is created to focus exclusively on this task (e.g., "Install project dependencies"). c. The Action Model executes the necessary steps within the Branch Context (e.g., runs npm install). d. The result is observed. If an error occurs, the Thinking Model is re-engaged to analyze the cause and find a solution (e.g., using the web_search tool to look up the error message). e. Once the task is complete, the agent records a summary in the Trunk Context, marks the task as "completed," and destroys the Branch Context.
Rinse and Repeat: Step 5 is repeated until all tasks in the TODO list are completed.
Completion: The agent exits, leaving behind a fully configured and runnable project environment in the Docker container.

🎯 Use Cases

Rapid Prototyping: Set up and run any open-source project in minutes to evaluate its suitability.
Standardized Dev Environments: Create consistent, one-click development environments for team members.
CI/CD Automation: Automate complex project setups and testing environments in your CI pipelines.
Learning New Technologies: Quickly get hands-on with an unfamiliar framework or stack by letting SAG handle the setup.
Secure Experimentation: Safely test unfamiliar or untrusted code in an isolated sandbox.

🤝 Contributing

We warmly welcome contributions of all kinds! Whether it's a bug report, a feature suggestion, or a pull request, your help is invaluable to the project.

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SAG (Setup-Agent)

🔦 Highlights

📖 Philosophy: Solving the "Getting Started" Problem

✨ Core Concepts

1. Dual-Model ReAct Engine

2. Hierarchical Context Management (Trunk & Branch)

3. Hierarchical Tool-belt Design

🏗️ System Architecture

🧠 Advanced Automation Tooling

✅ Validation & Observability

🧭 End-to-End Flow

🚀 Quick Start

1. Prerequisites

2. Installation & Configuration

3. Basic Usage

🛠️ CLI Command Reference

✅ Running Tests Locally

⚙️ Configuration Explained

🔍 How It Works: A Look Under the Hood

🎯 Use Cases

🤝 Contributing

📝 License

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
.vscode		.vscode
agent		agent
config		config
docker_orch		docker_orch
docs		docs
examples		examples
reporting		reporting
testcases		testcases
tools		tools
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml

License

Codegass/Setup-Agent

Folders and files

Latest commit

History

Repository files navigation

SAG (Setup-Agent)

🔦 Highlights

📖 Philosophy: Solving the "Getting Started" Problem

✨ Core Concepts

1. Dual-Model ReAct Engine

2. Hierarchical Context Management (Trunk & Branch)

3. Hierarchical Tool-belt Design

🏗️ System Architecture

🧠 Advanced Automation Tooling

✅ Validation & Observability

🧭 End-to-End Flow

🚀 Quick Start

1. Prerequisites

2. Installation & Configuration

3. Basic Usage

🛠️ CLI Command Reference

✅ Running Tests Locally

⚙️ Configuration Explained

🔍 How It Works: A Look Under the Hood

🎯 Use Cases

🤝 Contributing

📝 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages