Skip to content

Conversation

@hannesrudolph
Copy link
Collaborator

@hannesrudolph hannesrudolph commented Jun 4, 2025

Related GitHub Issue

Closes: #4351

Description

This PR adds an experimental flag to disable the command parameter in the attempt_completion tool. The change addresses the issue where Roo can execute commands without first verifying that the task was completed successfully, violating the principle of step-by-step verification.

Key implementation details:

  • Added a new experimental flag DISABLE_COMPLETION_COMMAND that users can enable to test the new behavior
  • Modified attemptCompletionTool.ts to check the experimental flag before executing commands
  • Made system prompts dynamic - they now exclude command-related instructions when the experiment is enabled
  • The feature is backward compatible with opt-in behavior (defaults to false)

Important: This is implemented as an experiment to test the behavior. If it works well, we will simply remove the command execution feature entirely from attempt_completion without adding it as a permanent setting.

Design choices:

  • Used the existing experiments system for feature flagging to ensure consistency
  • Implemented dynamic prompt generation to provide a seamless experience when the flag is toggled
  • Added comprehensive test coverage for both enabled and disabled states

Test Procedure

Unit Tests:

  1. Run npm test to execute all tests
  2. New test files added:
    • src/core/prompts/tools/__tests__/attempt-completion.experiment.test.ts (6 tests)
    • src/core/tools/__tests__/attemptCompletionTool.experiment.test.ts (7 tests)
  3. All existing tests updated to include the new experiment property

Manual Testing:

  1. Open VSCode settings and navigate to Experimental Features
  2. Enable "Disable command execution in attempt_completion"
  3. Create a task that would normally execute a command (e.g., "Create a simple HTML file and show it")
  4. Verify that:
    • With flag disabled (default): Commands execute as normal
    • With flag enabled: Commands are ignored, task completes without execution
    • System prompts don't mention command execution when flag is enabled

Verification:

  • The AI assistant should use execute_command separately when the flag is enabled
  • No breaking changes to existing workflows when flag is disabled

Type of Change

  • 🐛 Bug Fix: Non-breaking change that fixes an issue.
  • New Feature: Non-breaking change that adds functionality.
  • 💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
  • ♻️ Refactor: Code change that neither fixes a bug nor adds a feature.
  • 💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
  • 📚 Documentation: Updates to documentation files.
  • ⚙️ Build/CI: Changes to the build process or CI configuration.
  • 🧹 Chore: Other changes that don't modify src or test files.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Settings UI - New Experimental Toggle:
image
The new "Disable command execution in attempt_completion" toggle appears in the Experimental Features section

Behavior Comparison:

  • Flag Disabled (Default): Commands execute normally after task completion
  • Flag Enabled: Commands are ignored, requiring separate execute_command tool usage

Documentation Updates

  • No documentation updates are required.
  • Yes, documentation updates are required.

Note: Since this is an experimental feature that will be removed entirely if successful (not moved to settings), no permanent documentation is needed.

Additional Notes

Files Changed Summary:

  • 13 files modified
  • 513 insertions, 16 deletions
  • 2 new test files created
  • All tests passing

Impact Analysis:

  • No performance impact (simple conditional checks)
  • Fully backward compatible
  • Users can test their workflows before the feature is removed entirely

Future Plan:
If the experiment proves successful, we will remove the command execution capability from attempt_completion entirely, without moving it to a permanent setting. This ensures cleaner separation of concerns and better adherence to the step-by-step verification principle.


Important

Introduces an experimental flag to disable command execution in attempt_completion, with updates to code, tests, and localization.

  • Behavior:
    • Adds DISABLE_COMPLETION_COMMAND flag to disable command execution in attempt_completion.
    • Updates attemptCompletionTool.ts to respect the new flag, preventing command execution when enabled.
    • Dynamic prompt generation excludes command instructions when the flag is enabled.
  • Experiments:
    • Adds disableCompletionCommand to experiment.ts and related schemas.
  • Testing:
    • New tests in attempt-completion.experiment.test.ts and attemptCompletionTool.experiment.test.ts to cover both enabled and disabled states.
    • Updates existing tests to include the new experiment property.
  • Localization:
    • Updates localization files to include descriptions for the new experimental feature.

This description was created by Ellipsis for cdeef22. You can customize this summary. It will automatically update as commits are pushed.

@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Jun 4, 2025
@hannesrudolph hannesrudolph marked this pull request as ready for review June 5, 2025 05:00
@hannesrudolph hannesrudolph requested review from cte and mrubens as code owners June 5, 2025 05:00
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 5, 2025
@hannesrudolph hannesrudolph moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap Jun 5, 2025
@dosubot dosubot bot added the enhancement New feature or request label Jun 5, 2025
Copy link
Member

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

It might be a good idea to expose this setting to the API configuration to allow disabling the suggested command parameter on evals and integration testing.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 7, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Jun 7, 2025
@mrubens mrubens merged commit 67d238a into main Jun 8, 2025
26 checks passed
@mrubens mrubens deleted the attempt_completion_fix branch June 8, 2025 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer PR - Needs Review size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Feature Request: Disable Command Execution in attempt_completion Tool

4 participants