
Proposal: Implement an "Apply" Feature like Cursor's #6159

@CberYellowstone

Description

What specific problem does this solve?

The core problem is that none of RooCode's three native code-modification tools (search_and_replace, write_file, apply_diff) offers a reliable, safe, and cost-effective way to handle complex coding tasks. The most promising of the three, apply_diff, frequently fails in practice.

Here is a detailed breakdown of the problem from a user's perspective when trying to modify code:

  1. The search_and_replace tool is too primitive. It's useful for simple, single-line variable renames, but is completely inadequate for any meaningful multi-line code addition or refactoring.

  2. The write_file tool is powerful, but dangerous and expensive. While it doesn't require the LLM to generate a perfect diff, it introduces two major risks:

    • Risk of Hallucination: The LLM may unintentionally alter parts of the file that should not be touched, leading to subtle bugs or broken code.
    • High Token Cost: Rewriting the entire file consumes a large number of tokens, making it unfriendly to the user's budget, especially for large files.
  3. The apply_diff tool, while theoretically ideal, is the most common point of failure. This tool should be the perfect solution: it's precise, token-efficient, and capable of multi-line changes. However, its reliability depends entirely on the LLM's ability to generate a 100% accurate diff patch. In practice, this often fails because:

    • Contextual Noise: In a long conversation filled with other tasks and context, the LLM's focus is diluted. It often generates a diff block with subtle, almost invisible differences (like whitespace or a minor character mismatch) from the actual source code.
    • Model Limitations: The general-purpose models most users connect to are not specifically fine-tuned for high-fidelity diff generation.

This results in the apply_diff tool failing to find a perfect match, thus rejecting the patch. While a user can manually lower the match threshold, this is a poor workaround that compromises precision and is not a real solution.

This feature proposal directly targets this critical weakness. It aims to fix the unreliability of the apply_diff tool by creating a new workflow that ensures the LLM can generate a high-quality, accurate diff every time. It does this by separating the creative "code generation" task from the logical "diff generation" task, feeding the apply_diff tool what it needs to succeed.

Additional context (optional)

The unreliability of the apply_diff tool is not just a theoretical concern but a well-documented, recurring problem that impacts many users. A brief search of existing issues reveals multiple reports where apply_diff fails because the LLM generates a slightly inaccurate patch that cannot be matched against the source code.

This proposal directly addresses the root cause of these failures. For example, the problems described in the following issues are symptomatic of this core weakness:

By implementing a two-stage workflow, we can ensure a high-quality, accurate diff is always generated, which would fundamentally solve this entire class of problems instead of patching individual symptoms.

Roo Code Task Links (Optional)

No response

Request checklist

  • I've searched existing Issues and Discussions for duplicates
  • This describes a specific problem with clear impact and context

Interested in implementing this?

  • Yes, I'd like to help implement this feature

Implementation requirements

  • I understand this needs approval before implementation begins

How should this be solved? (REQUIRED if contributing, optional otherwise)

The key to achieving highly reliable, Cursor-level code integration lies not in possessing a mythical, fine-tuned "Apply Model," but in adopting a superior architectural pattern. It is highly probable that Cursor's acclaimed "Apply" feature is itself powered by a general-purpose LLM, but one that is called within a dedicated, isolated process with a hyper-focused prompt.

This proposal outlines how RooCode can implement this exact same, industry-leading "Generate & Integrate" design pattern. The solution is to decouple the creative task of code generation from the logical task of code integration through a two-stage, orchestrated process within the agent's core logic.

The Proposed Solution in Detail

Stage 1: Creative Code Generation

  1. User Interaction: The user makes a high-level coding request in the chat as usual.
  2. Agent Action: The RooCode agent makes a first API call to the user's selected LLM. The core prompt for this call is focused purely on code generation.
  3. Model Output: The model is instructed to return a simple, structured object containing three key pieces of information (sketched after this list):
    • file: The target filename(s).
    • type: The nature of the code, e.g., snippet or full_file replacement.
    • code: The raw, generated code string.
  4. Key Constraint: At this stage, the model is never asked to produce a diff; its sole job is to "write the code."
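
To make the Stage 1 contract concrete, here is a minimal TypeScript sketch of what that structured output could look like. The field names (file, type, code) mirror the list above; the optional anchor field anticipates the placement hint discussed under "Edge cases". The interface itself is purely illustrative and not an existing RooCode type.

```typescript
// Hypothetical shape of the Stage 1 (creative generation) output.
// Nothing here is an existing RooCode type; it only illustrates the proposal.
interface GenerationResult {
  /** Target filename; a string[] variant could cover multi-file edits. */
  file: string;
  /** Whether the code is a snippet to integrate or a full-file replacement. */
  type: "snippet" | "full_file";
  /** The raw generated code, with no diff markup of any kind. */
  code: string;
  /** Optional placement hint, e.g. "insert after function foo()" (see Edge cases). */
  anchor?: string;
}
```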

Stage 2: Structured Diff Integration

  1. Agent Action: Upon receiving the output from Stage 1, the RooCode agent reads the full original content of the target file(s) from the workspace.
  2. Initiating a Second Call: The agent immediately makes a new, independent second API call to the LLM. This call uses a hardcoded, highly optimized prompt template designed specifically for diff generation (see the sketch after this list).
  3. Focused Context: The input for this call is "clean": it only contains the original file content and the new code from Stage 1. Previous conversational history and "noise" are completely excluded.
    • Example System Prompt: "You are a diff generation expert. Given the original file content and the new code provided, generate a standard unified diff patch to integrate the new code into the original file in the most logical way."
  4. Model Output: Because the task is singular and the context is clean, the model can reliably return a high-quality, high-accuracy diff patch.
  5. Final Presentation: RooCode receives this high-quality diff and presents it to the user for final review and approval.
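
The orchestration itself could be a small amount of agent-side glue. The sketch below reuses the GenerationResult shape from the Stage 1 sketch and assumes hypothetical helpers (callLLM, readWorkspaceFile, presentDiffForReview) standing in for the agent's model client, workspace access, and diff preview; none of these are existing RooCode APIs.

```typescript
// Minimal sketch of the two-stage "Generate & Integrate" orchestration.
// All helpers below are hypothetical placeholders, not existing RooCode APIs.

interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

declare function callLLM(req: { system?: string; messages: ChatMessage[] }): Promise<string>;
declare function readWorkspaceFile(path: string): Promise<string>;
declare function presentDiffForReview(path: string, unifiedDiff: string): Promise<void>;

const INTEGRATION_SYSTEM_PROMPT =
  "You are a diff generation expert. Given the original file content and the new code " +
  "provided, generate a standard unified diff patch to integrate the new code into the " +
  "original file in the most logical way.";

async function generateAndIntegrate(history: ChatMessage[], userRequest: string): Promise<void> {
  // Stage 1: creative code generation. Full conversational context, no diff requested;
  // the model is asked to reply with the structured object described above.
  const stage1Raw = await callLLM({
    messages: [...history, { role: "user", content: userRequest }],
  });
  const generation = JSON.parse(stage1Raw) as GenerationResult;

  // Stage 2: structured diff integration. Clean, isolated context: only the original
  // file content and the newly generated code -- no conversational history.
  const originalContent = await readWorkspaceFile(generation.file);
  const unifiedDiff = await callLLM({
    system: INTEGRATION_SYSTEM_PROMPT,
    messages: [{
      role: "user",
      content:
        `<original_file>\n${originalContent}\n</original_file>\n` +
        `<new_code>\n${generation.code}\n</new_code>`,
    }],
  });

  // Final presentation: the user still reviews and approves a single diff.
  await presentDiffForReview(generation.file, unifiedDiff);
}
```

The important property is that the second call carries no conversational history, only the original file and the new code, which is what keeps the generated patch accurate.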

How will users interact with it?

From the user's perspective, the core interaction flow will barely change, which is the elegance of this solution. The user still makes a request and still reviews a diff.

The real change is in the quality of the experience:

  • Increased Reliability: Users will find that the success rate of apply_diff increases dramatically. Failures and the need for manual edits will be significantly reduced.
  • Enhanced Trust: The agent will feel more "trustworthy," allowing users to delegate complex tasks with greater confidence.

This solution abstracts away the complex underlying implementation, leaving the user with a smoother, more reliable result.

How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)

Scenario 1: Successfully adding a new code snippet

  • Given I have a calculator.py file with only an add function.
  • When I instruct RooCode to "add a subtract function" to calculator.py.
  • Then I should be presented with a diff preview that accurately adds the new subtract function to the file.
  • And this diff patch should be applied successfully by the apply_diff tool without errors.
  • But I should not see or perceive the two separate AI calls that happened in the background to achieve this.

Scenario 2: Successfully refactoring an existing function

  • Given I have an api.js file written with Promise .then() syntax.
  • When I request to "refactor all functions in this file to use async/await".
  • Then I should be presented with a diff preview that correctly replaces all Promise-based functions with their async/await versions.
  • And the resulting code should be functionally equivalent to the original.
  • But the accuracy of the final diff should not be affected by previous "noisy" conversational context, such as corrections or small talk.

Scenario 3: Handling a request that would have previously failed

  • Given I have a complex project structure and the conversational context with the AI has become long and confusing.
  • When I provide a modification instruction whose wording is slightly ambiguous but whose intent is clear.
  • Then the new workflow should produce a cleaner and more accurate diff than the old single-step method would have.
  • And the success rate of the apply_diff tool applying this diff should be significantly higher than before.

Technical considerations (REQUIRED if contributing, optional otherwise)

May require some refactoring of the agent's core orchestration logic to support the two-stage call flow.

Trade-offs and risks (REQUIRED if contributing, optional otherwise)

Alternative approaches and why you chose this one

  1. Alternative 1: Maintain the Status Quo (Single-Step LLM Call)

    • Pros: Faster, cheaper, simpler existing architecture.
    • Cons: This is the core problem this proposal aims to solve. As detailed in the problem description, its unreliability makes the apply_diff tool ineffective for complex tasks.
  2. Alternative 2: Advanced Single-Prompt Engineering (a "God Prompt")

    • Pros: Requires no architectural changes.
    • Cons: Attempting to design a single prompt that reliably guides any LLM to produce a perfect diff in all complex scenarios is impractical and brittle. It fails to address the root issue of cognitive overload on the model and is highly sensitive to model/version changes.
  3. Alternative 3: Rely on an External MCP Tool

    • Pros: Aligns with a highly modular philosophy.
    • Cons: No public, standalone MCP tool for this specific purpose currently exists. This would introduce an external dependency on a hypothetical tool.

Why this approach was chosen: The proposed two-stage workflow strikes the best balance between reliability, practicality, and implementation effort. It is more robust than prompt engineering alone and more practical than waiting for a non-existent external tool. It fundamentally improves the reliability of the core apply_diff tool by using existing LLMs in a smarter, more focused way.

Potential negative impacts (performance, UX, etc.)

  • Performance / Latency: This is the primary trade-off. Two sequential API calls will inherently take longer than a single call.
  • API Cost: The cost impact is nuanced rather than a simple increase. The architecture opens a path for cost optimization by allowing a cheaper, logic-focused model to handle the second (integration) stage, and a successful two-stage call can be more economical than a failed single-stage call that has to be re-run. While the overhead of two calls may add a minor cost for the simplest tasks, the trade-off shifts from "cost" to "cost-control": users gain the flexibility to balance model power and expense for each stage of the task (see the illustrative settings sketch after this list).
  • User Experience (UX): If the increased latency is not managed with clear UI feedback (e.g., status updates), it could make the application feel sluggish.
  • Code Complexity: The agent's internal logic becomes more complex, increasing the maintenance burden and the potential for new bugs in the orchestration logic.
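
One way the "cost-control" flexibility mentioned above could surface is a per-stage model setting. The shape below is purely illustrative; no such RooCode setting exists today, and the model identifiers are placeholders.

```typescript
// Illustrative only: a hypothetical settings shape that lets users pick a cheaper,
// logic-focused model for the Stage 2 integration call. Not an existing RooCode option.
interface TwoStageApplySettings {
  /** Opt-in flag, so the existing single-step behavior stays the default. */
  enabled: boolean;
  /** Model used for Stage 1 creative generation (the user's usual choice). */
  generationModel: string;
  /** Optional, typically cheaper model used only for Stage 2 diff integration. */
  integrationModel?: string;
}

const exampleSettings: TwoStageApplySettings = {
  enabled: true,
  generationModel: "your-preferred-model", // placeholder identifiers
  integrationModel: "a-cheaper-model",
};
```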

Breaking changes or migration concerns

None. To avoid any disruption for existing users, this feature should be introduced as a new, optional configuration mode. This approach requires no data migration.

Edge cases that need careful handling

  • Ambiguous Snippet Placement: If the code from Stage 1 could logically fit in multiple locations, the context-less Stage 2 LLM might guess the wrong placement. Mitigation: The Stage 1 prompt could be engineered to request an "anchor" in its output (e.g., "insert after function foo()") to guide the integration step.
  • Stage 2 Failure: If the diff generation itself fails, the system needs a graceful fallback. It should report a specific error (e.g., "Code was generated, but failed to create a patch. Please apply manually.") rather than a generic failure (a minimal fallback sketch follows this list).
  • Very Large Files: Sending an entire large file as context in Stage 2 could hit context window limits or become very expensive. The system may need a strategy for handling very large files, perhaps by only sending a relevant "slice" of the code.
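
A minimal sketch of the Stage 2 fallback described above, using the error message suggested in the list. runStage2, showError, and offerManualApply are hypothetical helpers, and GenerationResult is the Stage 1 shape sketched earlier.

```typescript
// Graceful fallback: if diff generation fails, keep the Stage 1 code, report a
// specific error, and offer manual application instead of a generic failure.
declare function runStage2(generation: GenerationResult): Promise<string>;
declare function showError(message: string): void;
declare function offerManualApply(file: string, code: string): Promise<void>;

async function runStage2WithFallback(generation: GenerationResult): Promise<string | null> {
  try {
    // Returns a unified diff ready for the normal review-and-apply flow.
    return await runStage2(generation);
  } catch {
    showError("Code was generated, but failed to create a patch. Please apply manually.");
    await offerManualApply(generation.file, generation.code);
    return null;
  }
}
```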
