feat(classic): update classic autogpt a bit to make it more useful for my day to day by ntindle · Pull Request #11797 · Significant-Gravitas/AutoGPT

ntindle · 2026-01-18T23:26:33Z

Summary

This PR modernizes AutoGPT Classic to make it more useful for day-to-day autonomous agent development. Major changes include consolidating the project structure, adding new prompt strategies, modernizing the benchmark system, and improving the development experience.

Note: AutoGPT Classic is an experimental, unsupported project preserved for educational/historical purposes. Dependencies will not be actively updated.

Changes 🏗️

Project Structure & Build System

Consolidated Poetry projects - Merged forge/, original_autogpt/, and benchmark packages into a single pyproject.toml at classic/ root
Removed old benchmark infrastructure - Deleted the complex agbenchmark package (3000+ lines) in favor of the new direct_benchmark harness
Removed frontend - Deleted benchmark/frontend/ React app (no longer needed)
Cleaned up CI workflows - Simplified GitHub Actions workflows for the consolidated project structure
Added CLAUDE.md - Documentation for working with the codebase using Claude Code

New Direct Benchmark System

direct_benchmark harness - New streamlined benchmark runner with:
- Rich TUI with multi-panel layout showing parallel test execution
- Incremental resume and selective reset capabilities
- CI mode for non-interactive environments
- Step-level logging with colored prefixes
- "Would have passed" tracking for timed-out challenges
- Copy-paste completion blocks for sharing results

Multiple Prompt Strategies

Added pluggable prompt strategy system supporting:

one_shot - Single-prompt completion
plan_execute - Plan first, then execute steps
rewoo - Reasoning without observation (deferred tool execution)
react - Reason + Act iterative loop
lats - Language Agent Tree Search (MCTS-based exploration)
sub_agent - Multi-agent delegation architecture
debate - Multi-agent debate for consensus

LLM Provider Improvements

Added support for modern Anthropic Claude models (claude-3.5-sonnet, claude-3-haiku, etc.)
Added Groq provider support
Improved tool call error feedback for LLM self-correction
Fixed deprecated API usage

Web Components

Replaced Selenium with Playwright for web browsing (better async support, faster)
Added lightweight web fetch component for simple URL fetching
Modernized web search with tiered provider system (Tavily, Serper, Google)

Agent Capabilities

Workspace permissions system - Pattern-based allow/deny lists for agent commands
Rich interactive selector for command approval with scopes (once/agent/workspace/deny)
TodoComponent with LLM-powered task decomposition
Platform blocks integration - Connect to AutoGPT Platform API for additional blocks
Sub-agent architecture - Agents can spawn and coordinate sub-agents

Developer Experience

Python 3.12+ support with CI testing on 3.12, 3.13, 3.14
Current working directory as default workspace - Run autogpt from any project directory
Simplified log format (removed timestamps)
Improved configuration and setup flow
External benchmark adapters for GAIA, SWE-bench, and AgentBench

Bug Fixes

Fixed N/A command loop when using native tool calling
Fixed auto-advance plan steps in Plan-Execute strategy
Fixed approve+feedback to execute command then send feedback
Fixed parallel tool calls in action history
Always recreate Docker containers for code execution
Various pyright type errors resolved
Linting and formatting issues fixed across codebase

Test Plan

CI lint, type, and test checks pass
Run poetry install from classic/ directory
Run poetry run autogpt and verify CLI starts
Run poetry run direct-benchmark run --tests ReadFile to verify benchmark works

Notes

This is a WIP PR for personal use improvements
The project is marked as unsupported - no active maintenance planned
Contains known vulnerabilities in dependencies (intentionally not updated)

Note

Medium Risk
CI and developer-tooling changes are broad and could unintentionally skip coverage (e.g., dropping multi-OS testing) or break installs if the consolidated classic Poetry setup diverges from reality; functional runtime behavior is largely untouched in this diff.

Overview
AutoGPT Classic’s automation is refactored to assume a single consolidated Poetry project rooted at classic/: GitHub Actions workflows drop the multi-OS matrix, standardize on Ubuntu + Python 3.12, use classic/poetry.lock for caching, and run tests/coverage from the unified layout (including adding ANTHROPIC_API_KEY and consistent MinIO setup).

Benchmark CI is rewritten to run direct-benchmark smoke tests (Read/WriteFile), category/strategy filtering checks, and a gated (master/dev) maintain/regression run; legacy agbenchmark-based workflow logic and related per-subproject lint/type plumbing are removed in favor of single classic-scoped pre-commit hooks and a consolidated classic/.flake8.

Repository hygiene/docs are updated: .gitmodules and the Classic frontend workflow are removed, Classic docs are refreshed (classic/README.md, new classic/CLAUDE.md), and .gitignore expands to cover new workspace/report artifacts (e.g., .autogpt/, benchmark workspaces, test.db) plus Claude local settings.

^{Written by Cursor Bugbot for commit b075495. This will update automatically on new commits. Configure here.}

- Add Claude 3.5 v2, Claude 4 Sonnet, Claude 4 Opus, and Claude 4.5 Opus models - Add rolling aliases (CLAUDE_SONNET, CLAUDE_OPUS, CLAUDE_HAIKU) - Fix deprecated beta.tools.messages.create API call to use standard messages.create - Update anthropic SDK from ^0.25.1 to >=0.40,<1.0 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove deprecated Flutter frontend (replaced by autogpt_platform) - Remove shell scripts (run, setup, autogpt.sh, etc.) - Remove tutorials (outdated) - Remove CLI-USAGE.md and FORGE-QUICKSTART.md - Add CLAUDE.md files for Claude Code guidance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add --workspace option to CLI that defaults to current working directory, allowing users to run `autogpt` from any folder. Agent data is now stored in `.autogpt/` subdirectory of the workspace instead of a hardcoded path. Changes: - Add -w/--workspace CLI option to run and serve commands - Remove dependency on forge package location for PROJECT_ROOT - Update config to use workspace instead of project_root - Store agent data in .autogpt/ within workspace directory - Update pyproject.toml files with proper PyPI metadata - Fix outdated tests to match current implementation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The .autogpt/ directory is where AutoGPT stores agent data when running from any directory. This should not be committed to version control. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2026-01-18T23:26:41Z

Important

Review skipped

Too many files!

This PR contains 221 files, which is 71 over the limit of 150.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch make-old-work

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-01-18T23:26:41Z

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

codecov · 2026-01-18T23:27:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (dev@e8c50b9). Learn more about missing BASE report.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@          Coverage Diff           @@
##             dev   #11797   +/-   ##
======================================
  Coverage       ?   49.06%           
======================================
  Files          ?      176           
  Lines          ?    14235           
  Branches       ?     1624           
======================================
  Hits           ?     6985           
  Misses         ?     7060           
  Partials       ?      190

Flag	Coverage Δ
autogpt-agent	`28.36% <ø> (?)`
forge	`58.29% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add a layered permission system that controls agent command execution: - Create autogpt.yaml in .autogpt/ folder with default allow/deny rules - File operations in workspace allowed by default - Sensitive files (.env, .key, .pem) blocked by default - Dangerous shell commands (sudo, rm -rf) blocked by default - Interactive prompts for unknown commands (y=agent, Y=workspace, n=deny) - Agent-specific permissions stored in .autogpt/agents/{id}/permissions.yaml Files added: - forge/forge/config/workspace_settings.py - Pydantic models for settings - forge/forge/permissions.py - CommandPermissionManager with pattern matching Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove references to deleted files (./run, cli.py, setup.py, frontend/) from CI workflows. Replace ./run agent start with direct poetry commands to start agent servers in background. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add a task management component modeled after Claude Code's TodoWrite: - TodoItem with recursive sub_items for hierarchical task structure - todo_write: atomic list replacement with sub-items support - todo_read: retrieve current todos with nested structure - todo_clear: clear all todos - todo_decompose: use smart LLM to break down tasks into sub-steps Features: - Hierarchical task tracking with independent status per sub-item - MessageProvider shows todos in LLM context with proper indentation - DirectiveProvider adds best practices for task management - Graceful fallback when LLM provider not configured Integrates with: - original_autogpt Agent (full LLM decomposition support) - ForgeAgent (basic task tracking, no decomposition) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add 6 new utility components to expand agent functionality: - ArchiveHandlerComponent: ZIP/TAR archive operations (create, extract, list) - ClipboardComponent: In-memory clipboard for copy/paste operations - DataProcessorComponent: CSV/JSON data manipulation and analysis - HTTPClientComponent: HTTP requests (GET, POST, PUT, DELETE) - MathUtilsComponent: Mathematical calculations and statistics - TextUtilsComponent: Text processing (regex, diff, encoding, hashing) All components follow the forge component pattern with: - CommandProvider for exposing commands - DirectiveProvider for resources/best practices - Comprehensive parameter validation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add specialized exception classes for better error reporting: - CodeTimeoutError: For code execution timeouts - HTTPError: For HTTP request failures with status code/URL - DataProcessingError: For JSON/CSV processing errors Each exception includes helpful hints for users. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CodeExecutorComponent: - Add timeout and env_vars parameters to execution commands - Add execute_shell_popen for streaming output - Improve error handling with CodeTimeoutError FileManagerComponent: - Add file_info, file_search, file_copy, file_move commands - Add directory_create, directory_list_tree commands - Better path validation and error messages GitOperationsComponent: - Add git_log, git_show, git_branch commands - Add git_stash, git_stash_pop, git_stash_list commands - Add git_cherry_pick, git_revert, git_reset commands - Add git_remote, git_fetch, git_pull, git_push commands UserInteractionComponent: - Add ask_multiple_choice for structured options - Add notify_user for non-blocking notifications - Add confirm_action for yes/no confirmations WebSearchComponent: - Minor error handling improvements WebSeleniumComponent: - Add get_page_content, execute_javascript commands - Add take_element_screenshot command - Add wait_for_element, scroll_page commands - Improve element interaction reliability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Environment loading: - Search for .env in multiple locations (cwd, ~/.autogpt, ~/.config/autogpt) - Allows running autogpt from any directory - Document search order in .env.template Setup simplification: - Remove interactive AI settings revision (was broken/unused) - Simplify to just printing current settings - Clean up unused imports Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove asctime from log formats since terminal output already has timestamps from the logging infrastructure. Makes logs cleaner and easier to read. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Update gitignore to use glob pattern for settings.local.json files in any .claude directory. Also untrack the existing file. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove openai_functions config option - native tool calling is now always enabled - Remove use_functions_api from BaseAgentConfiguration and prompt strategy - Add use_prefill config to disable prefill for Anthropic (prefill + tools incompatible) - Update anthropic dependency to ^0.45.0 for tools API support - Simplify prompt strategy to always expect tool_calls from LLM response This fixes the N/A command loop bug where models would output "N/A" as a command name when function calling was disabled. With native tool calling always enabled, models are forced to pick from valid tools only. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… 3.14 - Update Python version constraint from ^3.10 to ^3.12 in all pyproject.toml - Update classifiers to reflect Python 3.12, 3.13, 3.14 support - Update dependencies for Python 3.13+ compatibility: - chromadb: ^0.4.10 -> ^1.4.0 - numpy: >=1.26.0,<2.0.0 -> >=2.0.0 - watchdog: 4.0.0 -> ^6.0.0 - spacy: ^3.0.0 -> ^3.8.0 (numpy 2.x compatibility) - en-core-web-sm model: 3.7.1 -> 3.8.0 - httpx (benchmark): ^0.24.0 -> ^0.27.0 - Update tool configuration: - Black target-version: py310 -> py312 - Pyright pythonVersion: 3.10 -> 3.12 - Update Dockerfiles to use Python 3.12 - Update CI workflows to test on Python 3.12, 3.13, and 3.14 - Regenerate all poetry.lock files Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implement four new prompt strategies based on research papers: - ReWOO: Reasoning Without Observation (5x token efficiency) - Plan-and-Execute: Separate planning from execution phases - Reflexion: Verbal reinforcement learning with episodic memory - Tree of Thoughts: Deliberate problem solving with tree search Each strategy extends a new BaseMultiStepPromptStrategy base class with shared utilities. Strategies are selectable via PROMPT_STRATEGY environment variable or config.prompt_strategy setting. Fix JSONSchema generation issue where Optional/Union types created anyOf schemas without direct type field - resolved by storing plan/phase state in strategy instances rather than ActionProposal. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The strategy was stuck in a loop because it tracked plan steps but never advanced them - the record_step_success() method existed but was never called by the agent's execution loop. Fix by using a _pending_step_advance flag to track when an action has been proposed. On the next parse_response_content() call, advance the previous step before processing the new response. This keeps step tracking self-contained in the strategy without requiring agent changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds a custom Rich-based interactive selector for the command approval workflow. Features include: - Arrow key navigation for selecting approval options - Tab to add context to any selection (e.g., "Once + also check file x") - Dedicated inline feedback option with shadow placeholder text - Quick select with number keys 1-5 - Works within existing asyncio event loop (no prompt_toolkit dependency) Also adds UIProvider abstraction pattern for future UI implementations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-02-05T07:12:46Z

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

...benchmark/challenges/verticals/code/2_password_generator/artifacts_out/password_generator.py

Some LLM providers (notably Anthropic) don't support system messages in the middle of a conversation. Changed ChatMessage.system() to ChatMessage.user() for all mid-conversation context messages across components (action history, context, skills, system clock, todo, error reporting, LATS, and multi-agent debate strategies). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-11T19:11:12Z

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

# Conflicts: # .github/workflows/classic-frontend-ci.yml # .gitignore # classic/frontend/.gitignore

github-actions · 2026-02-11T19:19:16Z

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

- Add 'playwright install chromium' step to Forge CI workflow - Auto-detect default model from available API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY) in direct_benchmark harness - Prefer Claude > OpenAI > Groq, fallback to OpenAI if no keys found

…ting is_new

greptile-apps · 2026-02-13T03:38:20Z

Too many files changed for review. (2256 files found, 100 file limit)

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-13T03:50:10Z

classic/Dockerfile.autogpt

 # Include forge so it can be used as a path dependency
 COPY forge/ ../forge

-# Include frontend


Dockerfile references removed pyproject.toml and poetry.lock files

High Severity

The COPY original_autogpt/pyproject.toml original_autogpt/poetry.lock ./ instruction references files that no longer exist. The PR consolidates all Poetry projects into a single classic/pyproject.toml, removing the per-subpackage pyproject.toml and poetry.lock files. The Dockerfile was partially updated (Python 3.12, frontend removal) but this COPY instruction still points to the old, now-nonexistent files, which will cause all Docker builds to fail.

ntindle and others added 5 commits December 26, 2025 10:02

wip: add supprot for new openai models (non working)

ea521ee

chore: add .autogpt/ to gitignore

7a20de8

The .autogpt/ directory is where AutoGPT stores agent data when running from any directory. This should not be committed to version control. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-project-automation bot added this to AutoGPT development kanban Jan 18, 2026

github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Jan 18, 2026

github-actions bot changed the base branch from master to dev January 18, 2026 23:26

github-actions bot added Classic AutoGPT Agent Forge Classic Benchmark Classic Frontend Classic AutoGPT's Agent Protocol Front end size/xl labels Jan 18, 2026

ntindle and others added 14 commits January 18, 2026 17:39

refactor(classic): simplify log format by removing timestamps

8fc174c

Remove asctime from log formats since terminal output already has timestamps from the logging infrastructure. Makes logs cleaner and easier to read. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chore: ignore .claude/settings.local.json in all directories

6fbd208

Update gitignore to use glob pattern for settings.local.json files in any .claude directory. Also untrack the existing file. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions bot added Classic Benchmark Classic Frontend Classic AutoGPT's Agent Protocol Front end and removed Classic Benchmark Classic Frontend Classic AutoGPT's Agent Protocol Front end labels Feb 5, 2026

github-advanced-security bot found potential problems Feb 5, 2026

View reviewed changes

...benchmark/challenges/verticals/code/2_password_generator/artifacts_out/password_generator.py Dismissed Show dismissed Hide dismissed

ntindle force-pushed the make-old-work branch from 6ee7ead to f56abce Compare February 11, 2026 19:10

github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Feb 11, 2026

Merge remote-tracking branch 'origin/dev' into make-old-work

ac7de17

# Conflicts: # .github/workflows/classic-frontend-ci.yml # .gitignore # classic/frontend/.gitignore

github-actions bot removed the conflicts Automatically applied to PRs with merge conflicts label Feb 11, 2026

ntindle and others added 7 commits February 12, 2026 15:46

fix(classic): fix flake8 line too long

1480183

fix(classic): skip S3 tests in CI due to MinIO compatibility issues

24b38f2

fix(classic): add ANTHROPIC_API_KEY to AutoGPT CI workflow

053b92e

fix(classic): use tmp_path for bulletin tests instead of hardcoded paths

9622ba8

fix(classic): fix bulletin test - mock web to return content when tes…

d437e75

…ting is_new

Merge branch 'dev' into make-old-work

b075495

ntindle marked this pull request as ready for review February 13, 2026 03:37

ntindle requested review from a team as code owners February 13, 2026 03:37

ntindle requested review from Bentlybro and Swiftyos and removed request for a team February 13, 2026 03:37

ntindle changed the title ~~[wip] update classic autogpt a bit to make it more useful for my day to day~~ feat(classic): update classic autogpt a bit to make it more useful for my day to day Feb 13, 2026

cursor bot reviewed Feb 13, 2026

View reviewed changes

Swiftyos removed their request for review February 13, 2026 09:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(classic): update classic autogpt a bit to make it more useful for my day to day#11797

feat(classic): update classic autogpt a bit to make it more useful for my day to day#11797
ntindle wants to merge 83 commits intodevfrom
make-old-work

ntindle commented Jan 18, 2026 •

edited by cursor bot

Loading

Uh oh!

coderabbitai bot commented Jan 18, 2026 •

edited

Loading

Review skipped

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

codecov bot commented Jan 18, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 5, 2026

Uh oh!

Uh oh!

github-actions bot commented Feb 11, 2026

Uh oh!

github-actions bot commented Feb 11, 2026

Uh oh!

greptile-apps bot commented Feb 13, 2026

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

ntindle commented Jan 18, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes 🏗️

Project Structure & Build System

New Direct Benchmark System

Multiple Prompt Strategies

LLM Provider Improvements

Web Components

Agent Capabilities

Developer Experience

Bug Fixes

Test Plan

Notes

Uh oh!

coderabbitai bot commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions bot commented Jan 18, 2026

Uh oh!

codecov bot commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented Feb 5, 2026

Uh oh!

Uh oh!

github-actions bot commented Feb 11, 2026

Uh oh!

github-actions bot commented Feb 11, 2026

Uh oh!

greptile-apps bot commented Feb 13, 2026

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Feb 13, 2026

Choose a reason for hiding this comment

Dockerfile references removed pyproject.toml and poetry.lock files

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

ntindle commented Jan 18, 2026 •

edited by cursor bot

Loading

coderabbitai bot commented Jan 18, 2026 •

edited

Loading

codecov bot commented Jan 18, 2026 •

edited

Loading