
Proposal: Implement an "Apply" Feature like Cursor's #6159

@CberYellowstone

Description

What specific problem does this solve?

The core problem is that none of RooCode's three native code-modification tools (search_and_replace, write_file, apply_diff) offers a reliable, safe, and cost-effective way to handle complex coding tasks. The most promising of the three, apply_diff, frequently fails in practice.

Here is a detailed breakdown of the problem from a user's perspective when trying to modify code:

  1. The search_and_replace tool is too primitive. It's useful for simple, single-line variable renames, but is completely inadequate for any meaningful multi-line code addition or refactoring.

  2. The write_file tool is powerful, but dangerous and expensive. While it doesn't require the LLM to generate a perfect diff, it introduces two major risks:

    • Risk of Hallucination: The LLM may unintentionally alter parts of the file that should not be touched, leading to subtle bugs or broken code.
    • High Token Cost: Rewriting the entire file consumes a large number of tokens, making it unfriendly to the user's budget, especially for large files.
  3. The apply_diff tool, while theoretically ideal, is the most common point of failure. This tool should be the perfect solution: it's precise, token-efficient, and capable of multi-line changes. However, its reliability depends entirely on the LLM's ability to generate a 100% accurate diff patch. In practice, this often fails because:

    • Contextual Noise: In a long conversation filled with other tasks and context, the LLM's focus is diluted. It often generates a diff block with subtle, almost invisible differences (like whitespace or a minor character mismatch) from the actual source code.
    • Model Limitations: The general-purpose models most users connect to are not specifically fine-tuned for high-fidelity diff generation.

This results in the apply_diff tool failing to find a perfect match, thus rejecting the patch. While a user can manually lower the match threshold, this is a poor workaround that compromises precision and is not a real solution.

This feature proposal directly targets this critical weakness. It aims to fix the unreliability of the apply_diff tool by creating a new workflow that ensures the LLM can generate a high-quality, accurate diff every time. It does this by separating the creative "code generation" task from the logical "diff generation" task, feeding the apply_diff tool what it needs to succeed.

Additional context (optional)

The unreliability of the apply_diff tool is not just a theoretical concern but a well-documented, recurring problem that impacts many users. A brief search of existing issues reveals multiple reports where apply_diff fails because the LLM generates a slightly inaccurate patch that cannot be matched against the source code.

This proposal directly addresses the root cause of these failures. For example, the problems described in the following issues are symptomatic of this core weakness:

By implementing a two-stage workflow, we can ensure a high-quality, accurate diff is always generated, which would fundamentally solve this entire class of problems instead of patching individual symptoms.

Roo Code Task Links (Optional)

No response

Request checklist

  • I've searched existing Issues and Discussions for duplicates
  • This describes a specific problem with clear impact and context

Interested in implementing this?

  • Yes, I'd like to help implement this feature

Implementation requirements

  • I understand this needs approval before implementation begins

How should this be solved? (REQUIRED if contributing, optional otherwise)

The key to achieving highly reliable, Cursor-level code integration lies not in possessing a mythical, fine-tuned "Apply Model," but in adopting a superior architectural pattern. It is highly probable that Cursor's acclaimed "Apply" feature is itself powered by a general-purpose LLM, but one that is called within a dedicated, isolated process with a hyper-focused prompt.

This proposal outlines how RooCode can implement this exact same, industry-leading "Generate & Integrate" design pattern. The solution is to decouple the creative task of code generation from the logical task of code integration through a two-stage, orchestrated process within the agent's core logic.

The Proposed Solution in Detail

Stage 1: Creative Code Generation

  1. User Interaction: The user makes a high-level coding request in the chat as usual.
  2. Agent Action: The RooCode agent makes a first API call to the user's selected LLM. The core prompt for this call is focused purely on code generation.
  3. Model Output: The model is instructed to return a simple, structured object containing three key pieces of information (sketched after this list):
    • file: The target filename(s).
    • type: The nature of the code, e.g., snippet or full_file replacement.
    • code: The raw, generated code string.
  4. Key Constraint: At this stage, the model is never asked to produce a diff; its sole job is to "write the code."
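
To make the Stage 1 contract concrete, here is a minimal TypeScript sketch of what that structured output could look like. The field names (file, type, code) mirror the list above; the optional anchor field anticipates the placement hint discussed under "Edge cases". The interface itself is purely illustrative and not an existing RooCode type.

```typescript
// Hypothetical shape of the Stage 1 (creative generation) output.
// Nothing here is an existing RooCode type; it only illustrates the proposal.
interface GenerationResult {
  /** Target filename; a string[] variant could cover multi-file edits. */
  file: string;
  /** Whether the code is a snippet to integrate or a full-file replacement. */
  type: "snippet" | "full_file";
  /** The raw generated code, with no diff markup of any kind. */
  code: string;
  /** Optional placement hint, e.g. "insert after function foo()" (see Edge cases). */
  anchor?: string;
}
```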

Stage 2: Structured Diff Integration

  1. Agent Action: Upon receiving the output from Stage 1, the RooCode agent reads the full original content of the target file(s) from the workspace.
  2. Initiating a Second Call: The agent immediately makes a new, independent second API call to the LLM. This call uses a hardcoded, highly optimized prompt template designed specifically for diff generation (see the sketch after this list).
  3. Focused Context: The input for this call is "clean": it only contains the original file content and the new code from Stage 1. Previous conversational history and "noise" are completely excluded.
    • Example System Prompt: "You are a diff generation expert. Given the original file content and the new code provided, generate a standard unified diff patch to integrate the new code into the original file in the most logical way."
  4. Model Output: Because the task is singular and the context is clean, the model can reliably return a high-quality, high-accuracy diff patch.
  5. Final Presentation: RooCode receives this high-quality diff and presents it to the user for final review and approval.
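
The orchestration itself could be a small amount of agent-side glue. The sketch below reuses the GenerationResult shape from the Stage 1 sketch and assumes hypothetical helpers (callLLM, readWorkspaceFile, presentDiffForReview) standing in for the agent's model client, workspace access, and diff preview; none of these are existing RooCode APIs.

```typescript
// Minimal sketch of the two-stage "Generate & Integrate" orchestration.
// All helpers below are hypothetical placeholders, not existing RooCode APIs.

interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

declare function callLLM(req: { system?: string; messages: ChatMessage[] }): Promise<string>;
declare function readWorkspaceFile(path: string): Promise<string>;
declare function presentDiffForReview(path: string, unifiedDiff: string): Promise<void>;

const INTEGRATION_SYSTEM_PROMPT =
  "You are a diff generation expert. Given the original file content and the new code " +
  "provided, generate a standard unified diff patch to integrate the new code into the " +
  "original file in the most logical way.";

async function generateAndIntegrate(history: ChatMessage[], userRequest: string): Promise<void> {
  // Stage 1: creative code generation. Full conversational context, no diff requested;
  // the model is asked to reply with the structured object described above.
  const stage1Raw = await callLLM({
    messages: [...history, { role: "user", content: userRequest }],
  });
  const generation = JSON.parse(stage1Raw) as GenerationResult;

  // Stage 2: structured diff integration. Clean, isolated context: only the original
  // file content and the newly generated code -- no conversational history.
  const originalContent = await readWorkspaceFile(generation.file);
  const unifiedDiff = await callLLM({
    system: INTEGRATION_SYSTEM_PROMPT,
    messages: [{
      role: "user",
      content:
        `<original_file>\n${originalContent}\n</original_file>\n` +
        `<new_code>\n${generation.code}\n</new_code>`,
    }],
  });

  // Final presentation: the user still reviews and approves a single diff.
  await presentDiffForReview(generation.file, unifiedDiff);
}
```

The important property is that the second call carries no conversational history, only the original file and the new code, which is what keeps the generated patch accurate.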

How will users interact with it?

From the user's perspective, the core interaction flow will barely change, which is the elegance of this solution. The user still makes a request and still reviews a diff.

The real change is in the quality of the experience:

  • Increased Reliability: Users will find that the success rate of apply_diff increases dramatically. Failures and the need for manual edits will be significantly reduced.
  • Enhanced Trust: The agent will feel more "trustworthy," allowing users to delegate complex tasks with greater confidence.

This solution abstracts away the complex underlying implementation, leaving the user with a smoother, more reliable result.

How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)

Scenario 1: Successfully adding a new code snippet

  • Given I have a calculator.py file with only an add function.
  • When I instruct RooCode to "add a subtract function" to calculator.py.
  • Then I should be presented with a diff preview that accurately adds the new subtract function to the file.
  • And this diff patch should be applied successfully by the apply_diff tool without errors.
  • But I should not see or perceive the two separate AI calls that happened in the background to achieve this.

Scenario 2: Successfully refactoring an existing function

  • Given I have an api.js file written with Promise .then() syntax.
  • When I request to "refactor all functions in this file to use async/await".
  • Then I should be presented with a diff preview that correctly replaces all Promise-based functions with their async/await versions.
  • And the resulting code should be functionally equivalent to the original.
  • But the accuracy of the final diff should not be affected by previous "noisy" conversational context, such as corrections or small talk.

Scenario 3: Handling a request that would have previously failed

  • Given I have a complex project structure and the conversational context with the AI has become long and confusing.
  • When I provide a modification instruction whose wording is slightly ambiguous but whose intent is clear.
  • Then the new workflow should produce a cleaner and more accurate diff than the old single-step method would have.
  • And the success rate of the apply_diff tool applying this diff should be significantly higher than before.

Technical considerations (REQUIRED if contributing, optional otherwise)

May require some refactoring of the agent's core orchestration logic to support the two-stage call flow.

Trade-offs and risks (REQUIRED if contributing, optional otherwise)

Alternative approaches and why you chose this one

  1. Alternative 1: Maintain the Status Quo (Single-Step LLM Call)

    • Pros: Faster, cheaper, simpler existing architecture.
    • Cons: This is the core problem this proposal aims to solve. As detailed in the problem description, its unreliability makes the apply_diff tool ineffective for complex tasks.
  2. Alternative 2: Advanced Single-Prompt Engineering (a "God Prompt")

    • Pros: Requires no architectural changes.
    • Cons: Attempting to design a single prompt that reliably guides any LLM to produce a perfect diff in all complex scenarios is impractical and brittle. It fails to address the root issue of cognitive overload on the model and is highly sensitive to model/version changes.
  3. Alternative 3: Rely on an External MCP Tool

    • Pros: Aligns with a highly modular philosophy.
    • Cons: No public, standalone MCP tool for this specific purpose currently exists. This would introduce an external dependency on a hypothetical tool.

Why this approach was chosen: The proposed two-stage workflow strikes the best balance between reliability, practicality, and implementation effort. It is more robust than prompt engineering alone and more practical than waiting for a non-existent external tool. It fundamentally improves the reliability of the core apply_diff tool by using existing LLMs in a smarter, more focused way.

Potential negative impacts (performance, UX, etc.)

  • Performance / Latency: This is the primary trade-off. Two sequential API calls will inherently take longer than a single call.
  • API Cost: The cost impact is nuanced rather than a simple increase. The architecture opens a path for cost optimization by allowing a cheaper, logic-focused model to handle the second (integration) stage, and a successful two-stage call can be more economical than a failed single-stage call that has to be re-run. While the overhead of two calls may add a minor cost for the simplest tasks, the trade-off shifts from "cost" to "cost-control": users gain the flexibility to balance model power and expense for each stage of the task (see the illustrative settings sketch after this list).
  • User Experience (UX): If the increased latency is not managed with clear UI feedback (e.g., status updates), it could make the application feel sluggish.
  • Code Complexity: The agent's internal logic becomes more complex, increasing the maintenance burden and the potential for new bugs in the orchestration logic.
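
One way the "cost-control" flexibility mentioned above could surface is a per-stage model setting. The shape below is purely illustrative; no such RooCode setting exists today, and the model identifiers are placeholders.

```typescript
// Illustrative only: a hypothetical settings shape that lets users pick a cheaper,
// logic-focused model for the Stage 2 integration call. Not an existing RooCode option.
interface TwoStageApplySettings {
  /** Opt-in flag, so the existing single-step behavior stays the default. */
  enabled: boolean;
  /** Model used for Stage 1 creative generation (the user's usual choice). */
  generationModel: string;
  /** Optional, typically cheaper model used only for Stage 2 diff integration. */
  integrationModel?: string;
}

const exampleSettings: TwoStageApplySettings = {
  enabled: true,
  generationModel: "your-preferred-model", // placeholder identifiers
  integrationModel: "a-cheaper-model",
};
```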

Breaking changes or migration concerns

None. To avoid any disruption for existing users, this feature should be introduced as a new, optional configuration mode. This approach requires no data migration.

Edge cases that need careful handling

  • Ambiguous Snippet Placement: If the code from Stage 1 could logically fit in multiple locations, the context-less Stage 2 LLM might guess the wrong placement. Mitigation: The Stage 1 prompt could be engineered to request an "anchor" in its output (e.g., "insert after function foo()") to guide the integration step.
  • Stage 2 Failure: If the diff generation itself fails, the system needs a graceful fallback. It should report a specific error (e.g., "Code was generated, but failed to create a patch. Please apply manually.") rather than a generic failure (a minimal fallback sketch follows this list).
  • Very Large Files: Sending an entire large file as context in Stage 2 could hit context window limits or become very expensive. The system may need a strategy for handling very large files, perhaps by only sending a relevant "slice" of the code.
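
A minimal sketch of the Stage 2 fallback described above, using the error message suggested in the list. runStage2, showError, and offerManualApply are hypothetical helpers, and GenerationResult is the Stage 1 shape sketched earlier.

```typescript
// Graceful fallback: if diff generation fails, keep the Stage 1 code, report a
// specific error, and offer manual application instead of a generic failure.
declare function runStage2(generation: GenerationResult): Promise<string>;
declare function showError(message: string): void;
declare function offerManualApply(file: string, code: string): Promise<void>;

async function runStage2WithFallback(generation: GenerationResult): Promise<string | null> {
  try {
    // Returns a unified diff ready for the normal review-and-apply flow.
    return await runStage2(generation);
  } catch {
    showError("Code was generated, but failed to create a patch. Please apply manually.");
    await offerManualApply(generation.file, generation.code);
    return null;
  }
}
```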
