
feat(classic): update classic autogpt a bit to make it more useful for my day to day#11797

Open
ntindle wants to merge 83 commits into dev from make-old-work

ntindle (Member) commented Jan 18, 2026

Summary

This PR modernizes AutoGPT Classic to make it more useful for day-to-day autonomous agent development. Major changes include consolidating the project structure, adding new prompt strategies, modernizing the benchmark system, and improving the development experience.

Note: AutoGPT Classic is an experimental, unsupported project preserved for educational/historical purposes. Dependencies will not be actively updated.

Changes 🏗️

Project Structure & Build System

  • Consolidated Poetry projects - Merged forge/, original_autogpt/, and benchmark packages into a single pyproject.toml at classic/ root
  • Removed old benchmark infrastructure - Deleted the complex agbenchmark package (3000+ lines) in favor of the new direct_benchmark harness
  • Removed frontend - Deleted benchmark/frontend/ React app (no longer needed)
  • Cleaned up CI workflows - Simplified GitHub Actions workflows for the consolidated project structure
  • Added CLAUDE.md - Documentation for working with the codebase using Claude Code

New Direct Benchmark System

  • direct_benchmark harness - New streamlined benchmark runner with:
    • Rich TUI with multi-panel layout showing parallel test execution
    • Incremental resume and selective reset capabilities
    • CI mode for non-interactive environments
    • Step-level logging with colored prefixes
    • "Would have passed" tracking for timed-out challenges
    • Copy-paste completion blocks for sharing results

Multiple Prompt Strategies

Added a pluggable prompt strategy system supporting:

  • one_shot - Single-prompt completion
  • plan_execute - Plan first, then execute steps
  • rewoo - Reasoning without observation (deferred tool execution)
  • react - Reason + Act iterative loop
  • lats - Language Agent Tree Search (MCTS-based exploration)
  • sub_agent - Multi-agent delegation architecture
  • debate - Multi-agent debate for consensus

LLM Provider Improvements

  • Added support for modern Anthropic Claude models (claude-3.5-sonnet, claude-3-haiku, etc.)
  • Added Groq provider support
  • Improved tool call error feedback for LLM self-correction
  • Fixed deprecated API usage

Web Components

  • Replaced Selenium with Playwright for web browsing (better async support, faster)
  • Added lightweight web fetch component for simple URL fetching
  • Modernized web search with tiered provider system (Tavily, Serper, Google)
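One plausible reading of "tiered provider system" is ordered fallback: try the first configured provider and move to the next tier on failure. The sketch below illustrates that reading only. The `tiered_search` function, the `SearchFn` type, and the idea that an unconfigured provider is represented by `None` are assumptions for illustration, not code from this PR.

```python
from typing import Callable, Optional

# Hypothetical provider signature: takes a query, returns result snippets.
SearchFn = Callable[[str], list[str]]


def tiered_search(
    query: str, providers: list[tuple[str, Optional[SearchFn]]]
) -> tuple[str, list[str]]:
    """Return (provider_name, results) from the first tier that succeeds."""
    for name, search in providers:
        if search is None:
            # Assumed convention: None means the provider is not configured.
            continue
        try:
            return name, search(query)
        except Exception:
            # Any provider failure falls through to the next tier.
            continue
    raise RuntimeError("no search provider available")
```

Passing the providers as an ordered list keeps the tier order explicit and makes it easy to test the fallback with stub functions.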

Agent Capabilities

  • Workspace permissions system - Pattern-based allow/deny lists for agent commands
  • Rich interactive selector for command approval with scopes (once/agent/workspace/deny)
  • TodoComponent with LLM-powered task decomposition
  • Platform blocks integration - Connect to AutoGPT Platform API for additional blocks
  • Sub-agent architecture - Agents can spawn and coordinate sub-agents

Developer Experience

  • Python 3.12+ support with CI testing on 3.12, 3.13, 3.14
  • Current working directory as default workspace - Run autogpt from any project directory
  • Simplified log format (removed timestamps)
  • Improved configuration and setup flow
  • External benchmark adapters for GAIA, SWE-bench, and AgentBench

Bug Fixes

  • Fixed N/A command loop when using native tool calling
  • Fixed auto-advancing of plan steps in the Plan-Execute strategy
  • Fixed approve+feedback so the command executes first and the feedback is sent afterwards
  • Fixed parallel tool calls in action history
  • Always recreate Docker containers for code execution
  • Various pyright type errors resolved
  • Linting and formatting issues fixed across codebase

Test Plan

  • CI lint, type, and test checks pass
  • Run poetry install from classic/ directory
  • Run poetry run autogpt and verify CLI starts
  • Run poetry run direct-benchmark run --tests ReadFile to verify benchmark works

Notes

  • This is a WIP PR for personal use improvements
  • The project is marked as unsupported - no active maintenance planned
  • Contains known vulnerabilities in dependencies (intentionally not updated)

Note

Medium Risk
CI and developer-tooling changes are broad and could unintentionally skip coverage (e.g., dropping multi-OS testing) or break installs if the consolidated classic Poetry setup diverges from the actual repository layout; functional runtime behavior is largely untouched in this diff.

Overview
AutoGPT Classic’s automation is refactored to assume a single consolidated Poetry project rooted at classic/: GitHub Actions workflows drop the multi-OS matrix, standardize on Ubuntu + Python 3.12, use classic/poetry.lock for caching, and run tests/coverage from the unified layout (including adding ANTHROPIC_API_KEY and consistent MinIO setup).

Benchmark CI is rewritten to run direct-benchmark smoke tests (Read/WriteFile), category/strategy filtering checks, and a gated (master/dev) maintain/regression run; legacy agbenchmark-based workflow logic and related per-subproject lint/type plumbing are removed in favor of single classic-scoped pre-commit hooks and a consolidated classic/.flake8.

Repository hygiene/docs are updated: .gitmodules and the Classic frontend workflow are removed, Classic docs are refreshed (classic/README.md, new classic/CLAUDE.md), and .gitignore expands to cover new workspace/report artifacts (e.g., .autogpt/, benchmark workspaces, test.db) plus Claude local settings.

Written by Cursor Bugbot for commit b075495.

ntindle and others added 5 commits December 26, 2025 10:02
- Add Claude 3.5 v2, Claude 4 Sonnet, Claude 4 Opus, and Claude 4.5 Opus models
- Add rolling aliases (CLAUDE_SONNET, CLAUDE_OPUS, CLAUDE_HAIKU)
- Fix deprecated beta.tools.messages.create API call to use standard messages.create
- Update anthropic SDK from ^0.25.1 to >=0.40,<1.0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove deprecated Flutter frontend (replaced by autogpt_platform)
- Remove shell scripts (run, setup, autogpt.sh, etc.)
- Remove tutorials (outdated)
- Remove CLI-USAGE.md and FORGE-QUICKSTART.md
- Add CLAUDE.md files for Claude Code guidance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --workspace option to CLI that defaults to current working directory,
allowing users to run `autogpt` from any folder. Agent data is now stored
in `.autogpt/` subdirectory of the workspace instead of a hardcoded path.

Changes:
- Add -w/--workspace CLI option to run and serve commands
- Remove dependency on forge package location for PROJECT_ROOT
- Update config to use workspace instead of project_root
- Store agent data in .autogpt/ within workspace directory
- Update pyproject.toml files with proper PyPI metadata
- Fix outdated tests to match current implementation
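The workspace behavior described in this commit can be sketched roughly as follows. This is an illustration, not the CLI code from the PR: `resolve_workspace` is a hypothetical helper, and the only details taken from the commit message are the default to the current working directory and the `.autogpt/` data subdirectory.

```python
from pathlib import Path
from typing import Optional


def resolve_workspace(workspace: Optional[str], cwd: Path) -> tuple[Path, Path]:
    """Return (workspace_root, data_dir) for a run."""
    # Fall back to the current working directory when -w/--workspace is omitted.
    root = Path(workspace).expanduser() if workspace else cwd
    # Agent data lives under .autogpt/ inside the workspace.
    return root, root / ".autogpt"
```

Taking `cwd` as a parameter instead of calling `Path.cwd()` inside the function is a choice made here to keep the sketch easy to test.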

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The .autogpt/ directory is where AutoGPT stores agent data when running
from any directory. This should not be committed to version control.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
coderabbitai bot (Contributor) commented Jan 18, 2026

Important

Review skipped

Too many files!

This PR contains 221 files, which is 71 over the limit of 150.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions bot (Contributor) commented

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions bot changed the base branch from master to dev January 18, 2026 23:26
codecov bot commented Jan 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (dev@e8c50b9). Learn more about missing BASE report.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@          Coverage Diff           @@
##             dev   #11797   +/-   ##
======================================
  Coverage       ?   49.06%           
======================================
  Files          ?      176           
  Lines          ?    14235           
  Branches       ?     1624           
======================================
  Hits           ?     6985           
  Misses         ?     7060           
  Partials       ?      190           
Flag            Coverage Δ
autogpt-agent   28.36% <ø> (?)
forge           58.29% <ø> (?)


ntindle and others added 14 commits January 18, 2026 17:39
Add a layered permission system that controls agent command execution:

- Create autogpt.yaml in .autogpt/ folder with default allow/deny rules
- File operations in workspace allowed by default
- Sensitive files (.env, .key, .pem) blocked by default
- Dangerous shell commands (sudo, rm -rf) blocked by default
- Interactive prompts for unknown commands (y=agent, Y=workspace, n=deny)
- Agent-specific permissions stored in .autogpt/agents/{id}/permissions.yaml

Files added:
- forge/forge/config/workspace_settings.py - Pydantic models for settings
- forge/forge/permissions.py - CommandPermissionManager with pattern matching
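The layering described above (deny rules first, then allow rules, then an interactive prompt for anything unknown) can be pictured with a small sketch. It is not the code in forge/forge/permissions.py. The `check_permission` function, the rule strings, and the `command(argument)` pattern format are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rule lists; the real defaults live in .autogpt/autogpt.yaml.
DENY = [
    "read_file(*.env)",
    "read_file(*.key)",
    "read_file(*.pem)",
    "execute_shell(sudo *)",
    "execute_shell(rm -rf *)",
]
ALLOW = ["read_file(*)", "write_file(*)"]


def check_permission(command: str, argument: str) -> str:
    """Return "deny", "allow", or "ask" for a command invocation."""
    call = f"{command}({argument})"
    # Deny rules are checked first so they win over broader allow rules.
    if any(fnmatchcase(call, pattern) for pattern in DENY):
        return "deny"
    if any(fnmatchcase(call, pattern) for pattern in ALLOW):
        return "allow"
    # Unknown commands fall through to an interactive prompt.
    return "ask"
```

Checking deny patterns before allow patterns means a sensitive file stays blocked even when a broad allow rule such as `read_file(*)` would otherwise match it.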

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove references to deleted files (./run, cli.py, setup.py, frontend/)
from CI workflows. Replace ./run agent start with direct poetry commands
to start agent servers in background.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a task management component modeled after Claude Code's TodoWrite:
- TodoItem with recursive sub_items for hierarchical task structure
- todo_write: atomic list replacement with sub-items support
- todo_read: retrieve current todos with nested structure
- todo_clear: clear all todos
- todo_decompose: use smart LLM to break down tasks into sub-steps

Features:
- Hierarchical task tracking with independent status per sub-item
- MessageProvider shows todos in LLM context with proper indentation
- DirectiveProvider adds best practices for task management
- Graceful fallback when LLM provider not configured

Integrates with:
- original_autogpt Agent (full LLM decomposition support)
- ForgeAgent (basic task tracking, no decomposition)
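A minimal sketch of the hierarchical structure this commit describes, assuming a dataclass model. The field names `content` and `status`, the status values, and the `render` helper are hypothetical; only `TodoItem` and `sub_items` come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class TodoItem:
    content: str
    status: str = "pending"  # assumed values: pending, in_progress, completed
    sub_items: list["TodoItem"] = field(default_factory=list)


def render(items: list[TodoItem], depth: int = 0) -> list[str]:
    """Flatten a todo tree into indented lines, one per item."""
    lines: list[str] = []
    for item in items:
        lines.append(f"{'  ' * depth}- [{item.status}] {item.content}")
        # Sub-items carry their own status and are indented one level deeper.
        lines.extend(render(item.sub_items, depth + 1))
    return lines
```

An indented rendering like this is one way the todos could be shown in the LLM context with their nesting preserved.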

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 6 new utility components to expand agent functionality:

- ArchiveHandlerComponent: ZIP/TAR archive operations (create, extract, list)
- ClipboardComponent: In-memory clipboard for copy/paste operations
- DataProcessorComponent: CSV/JSON data manipulation and analysis
- HTTPClientComponent: HTTP requests (GET, POST, PUT, DELETE)
- MathUtilsComponent: Mathematical calculations and statistics
- TextUtilsComponent: Text processing (regex, diff, encoding, hashing)

All components follow the forge component pattern with:
- CommandProvider for exposing commands
- DirectiveProvider for resources/best practices
- Comprehensive parameter validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add specialized exception classes for better error reporting:
- CodeTimeoutError: For code execution timeouts
- HTTPError: For HTTP request failures with status code/URL
- DataProcessingError: For JSON/CSV processing errors

Each exception includes helpful hints for users.
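As an illustration of an exception that carries a status code, a URL, and a user-facing hint, here is a rough sketch. The constructor signature, the attribute names, and the message format are assumptions; the class in the PR may look different.

```python
from typing import Optional


class HTTPError(Exception):
    """HTTP request failure carrying status code, URL, and a user hint."""

    def __init__(
        self,
        message: str,
        status_code: Optional[int] = None,
        url: Optional[str] = None,
        hint: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.url = url
        self.hint = hint

    def __str__(self) -> str:
        # Append only the details that were actually provided.
        text = super().__str__()
        if self.status_code is not None:
            text += f" (status {self.status_code})"
        if self.url:
            text += f" for {self.url}"
        if self.hint:
            text += f". Hint: {self.hint}"
        return text
```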

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CodeExecutorComponent:
- Add timeout and env_vars parameters to execution commands
- Add execute_shell_popen for streaming output
- Improve error handling with CodeTimeoutError

FileManagerComponent:
- Add file_info, file_search, file_copy, file_move commands
- Add directory_create, directory_list_tree commands
- Better path validation and error messages

GitOperationsComponent:
- Add git_log, git_show, git_branch commands
- Add git_stash, git_stash_pop, git_stash_list commands
- Add git_cherry_pick, git_revert, git_reset commands
- Add git_remote, git_fetch, git_pull, git_push commands

UserInteractionComponent:
- Add ask_multiple_choice for structured options
- Add notify_user for non-blocking notifications
- Add confirm_action for yes/no confirmations

WebSearchComponent:
- Minor error handling improvements

WebSeleniumComponent:
- Add get_page_content, execute_javascript commands
- Add take_element_screenshot command
- Add wait_for_element, scroll_page commands
- Improve element interaction reliability

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Environment loading:
- Search for .env in multiple locations (cwd, ~/.autogpt, ~/.config/autogpt)
- Allows running autogpt from any directory
- Document search order in .env.template

Setup simplification:
- Remove interactive AI settings revision (was broken/unused)
- Simplify to just printing current settings
- Clean up unused imports
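The .env lookup described in this commit might look roughly like the sketch below. It assumes the file is named `.env` in each location and that the locations are tried in the order listed in the commit message; `find_env_file` is a hypothetical helper, not a function from the PR.

```python
from pathlib import Path
from typing import Optional


def find_env_file(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first existing .env file in the assumed search order."""
    candidates = [
        cwd / ".env",
        home / ".autogpt" / ".env",
        home / ".config" / "autogpt" / ".env",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # No .env found in any of the searched locations.
    return None
```

In normal use the arguments would be `Path.cwd()` and `Path.home()`; passing them in makes the search order testable against a temporary directory.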

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove asctime from log formats since terminal output already has
timestamps from the logging infrastructure. Makes logs cleaner
and easier to read.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update gitignore to use glob pattern for settings.local.json files
in any .claude directory. Also untrack the existing file.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove openai_functions config option - native tool calling is now always enabled
- Remove use_functions_api from BaseAgentConfiguration and prompt strategy
- Add use_prefill config to disable prefill for Anthropic (prefill + tools incompatible)
- Update anthropic dependency to ^0.45.0 for tools API support
- Simplify prompt strategy to always expect tool_calls from LLM response

This fixes the N/A command loop bug where models would output "N/A" as a
command name when function calling was disabled. With native tool calling
always enabled, models are forced to pick from valid tools only.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… 3.14

- Update Python version constraint from ^3.10 to ^3.12 in all pyproject.toml
- Update classifiers to reflect Python 3.12, 3.13, 3.14 support
- Update dependencies for Python 3.13+ compatibility:
  - chromadb: ^0.4.10 -> ^1.4.0
  - numpy: >=1.26.0,<2.0.0 -> >=2.0.0
  - watchdog: 4.0.0 -> ^6.0.0
  - spacy: ^3.0.0 -> ^3.8.0 (numpy 2.x compatibility)
  - en-core-web-sm model: 3.7.1 -> 3.8.0
  - httpx (benchmark): ^0.24.0 -> ^0.27.0
- Update tool configuration:
  - Black target-version: py310 -> py312
  - Pyright pythonVersion: 3.10 -> 3.12
- Update Dockerfiles to use Python 3.12
- Update CI workflows to test on Python 3.12, 3.13, and 3.14
- Regenerate all poetry.lock files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement four new prompt strategies based on research papers:

- ReWOO: Reasoning Without Observation (5x token efficiency)
- Plan-and-Execute: Separate planning from execution phases
- Reflexion: Verbal reinforcement learning with episodic memory
- Tree of Thoughts: Deliberate problem solving with tree search

Each strategy extends a new BaseMultiStepPromptStrategy base class
with shared utilities. Strategies are selectable via PROMPT_STRATEGY
environment variable or config.prompt_strategy setting.

Fix JSONSchema generation issue where Optional/Union types created
anyOf schemas without direct type field - resolved by storing
plan/phase state in strategy instances rather than ActionProposal.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The strategy was stuck in a loop because it tracked plan steps but never
advanced them - the record_step_success() method existed but was never
called by the agent's execution loop.

Fix by using a _pending_step_advance flag to track when an action has
been proposed. On the next parse_response_content() call, advance the
previous step before processing the new response. This keeps step
tracking self-contained in the strategy without requiring agent changes.
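The flag-based fix can be sketched with a toy tracker. `PlanTracker` and its simplified `parse_response_content` signature are hypothetical stand-ins; only the `_pending_step_advance` flag and the "advance the previous step on the next parse" behavior are taken from the commit message.

```python
class PlanTracker:
    """Toy stand-in for the plan state kept inside a prompt strategy."""

    def __init__(self, steps: list[str]) -> None:
        self.steps = steps
        self.current = 0
        self._pending_step_advance = False

    def parse_response_content(self, response: str) -> str:
        # A new response means the previously proposed action has run,
        # so the previous step is advanced before handling this one.
        if self._pending_step_advance and self.current < len(self.steps) - 1:
            self.current += 1
        # Mark that an action has now been proposed for the current step.
        self._pending_step_advance = True
        return f"{self.steps[self.current]}: {response}"
```

Because the advance happens inside the strategy on the next parse, the agent's execution loop never has to call a separate success hook.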

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a custom Rich-based interactive selector for the command approval
workflow. Features include:
- Arrow key navigation for selecting approval options
- Tab to add context to any selection (e.g., "Once + also check file x")
- Dedicated inline feedback option with shadow placeholder text
- Quick select with number keys 1-5
- Works within existing asyncio event loop (no prompt_toolkit dependency)

Also adds UIProvider abstraction pattern for future UI implementations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
github-actions bot (Contributor) commented Feb 5, 2026

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@github-actions github-actions bot added Classic Benchmark Classic Frontend Classic AutoGPT's Agent Protocol Front end and removed Classic Benchmark Classic Frontend Classic AutoGPT's Agent Protocol Front end labels Feb 5, 2026
Some LLM providers (notably Anthropic) don't support system messages
in the middle of a conversation. Changed ChatMessage.system() to
ChatMessage.user() for all mid-conversation context messages across
components (action history, context, skills, system clock, todo,
error reporting, LATS, and multi-agent debate strategies).
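To illustrate the constraint, here is a sketch that demotes any system message appearing after the conversation has started. Note that this is a different technique from the commit, which changes the call sites from ChatMessage.system() to ChatMessage.user(); the `Message` class and `demote_mid_conversation_system` function below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    content: str


def demote_mid_conversation_system(messages: list[Message]) -> list[Message]:
    """Keep leading system messages; turn later ones into user messages."""
    result: list[Message] = []
    in_preamble = True
    for message in messages:
        if message.role != "system":
            # The first non-system message ends the leading system preamble.
            in_preamble = False
        elif not in_preamble:
            message = Message("user", message.content)
        result.append(message)
    return result
```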

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the conflicts label Feb 11, 2026
github-actions bot (Contributor) commented

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

# Conflicts:
#	.github/workflows/classic-frontend-ci.yml
#	.gitignore
#	classic/frontend/.gitignore
@github-actions github-actions bot removed the conflicts label Feb 11, 2026
github-actions bot (Contributor) commented

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

@ntindle ntindle marked this pull request as ready for review February 13, 2026 03:37
@ntindle ntindle requested review from a team as code owners February 13, 2026 03:37
@ntindle ntindle requested review from Bentlybro and Swiftyos and removed request for a team February 13, 2026 03:37
@ntindle ntindle changed the title from "[wip] update classic autogpt a bit to make it more useful for my day to day" to "feat(classic): update classic autogpt a bit to make it more useful for my day to day" Feb 13, 2026
greptile-apps bot commented Feb 13, 2026

Too many files changed for review. (2256 files found, 100 file limit)

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


# Include forge so it can be used as a path dependency
COPY forge/ ../forge

# Include frontend

Dockerfile references removed pyproject.toml and poetry.lock files

High Severity

The COPY original_autogpt/pyproject.toml original_autogpt/poetry.lock ./ instruction references files that no longer exist. The PR consolidates all Poetry projects into a single classic/pyproject.toml, removing the per-subpackage pyproject.toml and poetry.lock files. The Dockerfile was partially updated (Python 3.12, frontend removal) but this COPY instruction still points to the old, now-nonexistent files, which will cause all Docker builds to fail.


@Swiftyos Swiftyos removed their request for review February 13, 2026 09:11

Projects

Status: 🆕 Needs initial review
