Skip to content

Refactor: Improve file tool context formatting and diff error guidance#2278

Merged
mrubens merged 6 commits intomainfrom
apply_diif_fix_1
Apr 4, 2025
Merged

Refactor: Improve file tool context formatting and diff error guidance#2278
mrubens merged 6 commits intomainfrom
apply_diif_fix_1

Conversation

@hannesrudolph
Copy link
Collaborator

@hannesrudolph hannesrudolph commented Apr 3, 2025

This PR includes two related improvements to tool interactions and context formatting:

  1. Refactor read_file Tool Result Formatting (src/core/Cline.ts):

    • Modifies how results from the read_file tool are formatted when added to the AI model's conversation history (user context).
    • Previous Format: [read_file for 'path/to/file'] Result:\n{content}
    • New Format:
      <file>
        <path>path/to/file</path>
        <content>
      {content_or_error}
        </content>
      </file>
    • This change only affects read_file results; other tool formats are unchanged.
  2. Improve apply_diff Error Message Clarity (src/core/diff/strategies/multi-search-replace.ts - Commit 73da4d03):

    • Updates the error message returned by the multi-search-replace diff strategy when no sufficiently similar match is found.
    • The "Tip" within the error message now more clearly instructs the user (or AI) to use the read_file tool to get the latest file content before retrying the apply_diff tool, explicitly mentioning both tools.

Reasoning and Benefits:

  • Structured File Context: The XML formatting for read_file results provides a clearer, more structured representation of file path and content within the AI's context. This may improve the model's ability to parse and utilize file information reliably.
  • Enhanced Error Guidance: The updated error message in the diff strategy provides more explicit guidance when a diff fails due to potential content mismatches. By specifically mentioning both read_file and apply_diff, it helps steer the AI towards the correct recovery action (refreshing its knowledge of the file before trying the diff again).
  • Improved Robustness: Together, these changes aim to make interactions involving file reading and modification more robust by improving context clarity and providing better error recovery instructions.

Important

Refactor read_file tool to use XML formatting and improve apply_diff error guidance for better tool interaction.

  • Behavior:
    • Refactor read_file tool result formatting in Cline.ts to XML format for structured representation.
    • Update apply_diff error message in multi-search-replace.ts to guide using read_file before retrying.
  • Tests:
    • Update read-file-maxReadFileLine.test.ts to expect XML formatted results.
  • Misc:
    • Modify readFileTool.ts to push XML formatted results.

This description was created by Ellipsis for 8cd6c20. It will automatically update as commits are pushed.

Modifies how the result of the `read_file` tool is presented
in the conversation history sent to the AI model.

Previously, the format was:
[read_file for 'path/to/file'] Result:
{content_or_error}

This commit changes the format to use XML tags for better
structure and potentially easier parsing by the model:
<file>
  <path>path/to/file</path>
  <content>
{content_or_error}
  </content>
</file>

This change only affects the `read_file` tool result formatting
within the user context message constructed in `src/core/Cline.ts`.
Other tool result formats remain unchanged.
…strategy

Refines the error message returned when no sufficiently similar match is found during the multi-search-replace operation. The message now includes a clearer instruction to use the read_file tool for obtaining the latest file content before attempting to apply the diff again.
@changeset-bot
Copy link

changeset-bot bot commented Apr 3, 2025

⚠️ No Changeset found

Latest commit: 8cd6c20

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@hannesrudolph hannesrudolph changed the title Apply diif fix 1 fix(multi-search-replace): Improve error message & refactor(read_file): Change result format to XML Apr 3, 2025
@hannesrudolph hannesrudolph changed the title fix(multi-search-replace): Improve error message & refactor(read_file): Change result format to XML Refactor: Improve file tool context formatting and diff error guidance Apr 3, 2025
@hannesrudolph hannesrudolph marked this pull request as ready for review April 3, 2025 23:24
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. Enhancement New feature or request labels Apr 3, 2025
@mrubens
Copy link
Collaborator

mrubens commented Apr 4, 2025

I think we could just make this change by adjusting what we pass into pushToolResult in readFileTool.ts

@hannesrudolph
Copy link
Collaborator Author

Ok get it done! :) :P

Modifies the `readFileTool` function to format the output as XML, enhancing the structure of the returned file content. This change aligns with previous updates to ensure consistent result formatting across tools.
This update refines the handling of tool responses in the `Cline` class by removing the XML formatting for `read_file` results and consolidating the logic for pushing results to the user message content. The changes ensure that all tool results are processed uniformly, improving code clarity and maintainability.
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Apr 4, 2025
This commit modifies the assertions in the `read_file` tool tests to check for the expected XML format in the results. The changes ensure that the output structure aligns with recent updates to the tool's response formatting, enhancing test accuracy and reliability.
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 4, 2025
…able

This commit updates the `read_file` tool tests to utilize a predefined variable for the expected XML output, improving readability and maintainability of the test code. The changes ensure consistency in the expected results across multiple test cases.
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Apr 4, 2025
@mrubens mrubens merged commit 951cefc into main Apr 4, 2025
14 checks passed
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Apr 4, 2025
@mrubens mrubens deleted the apply_diif_fix_1 branch April 4, 2025 22:26
@KJ7LNW
Copy link
Contributor

KJ7LNW commented Apr 5, 2025

IMHO, this was merged prematurely. I have reiterated the most important items (from the review above) that should be addressed before merging:

1. program tail output is incorrect because of unnecessary XML indentation:

Indentation is pretty for humans, but you will confuse the model it will think that \n that precedes </content> is actually part of the file.

I think this is better:

const xmlResult = `<file><path>${filePath}</path><content>\n${fileContent}</content></file>`

2. Errors are shown as file content which could confuse the model

We do not want the model to think that the file contains the error text, but the structured markup does imply that as it is currently implemented.

<content>\n${fileContent}</content> should only contain actual file content. if you look back a few lines there are other messages that should be managed as separate optional XML tags:

  content += `\n\n[Showing only ${maxReadFileLine} of ${totalLines} total lines. Use start_line and end_line if you need to read more]${sourceCodeDef}`

should become

  xmlInfo += `<notice>Showing only ${maxReadFileLine} of ${totalLines} total lines. Use start_line and end_line if you need to read more</notice>
  xmlInfo +=  `<list_code_definition_names>${sourceCodeDef}</list_code_definition_names>`

do the same thing with any errors as

xmlInfo += `<error>foo</error>`

and then final result becomes:

const xmlResult = `<file><path>${filePath}</path><content>\n${fileContent}</content>${xmlInfo}</file>`

this gives us a fully structured response that we can use as a new standard.

@mrubens
Copy link
Collaborator

mrubens commented Apr 5, 2025

Yeah that's good feedback. I like your suggestions.

@KJ7LNW
Copy link
Contributor

KJ7LNW commented Apr 5, 2025

I am going to make a pr for this stay tuned ...

mrubens pushed a commit that referenced this pull request Apr 9, 2025
* fix: addLineNumbers handling of empty content

Empty files should not have line numbers, but non-empty files with empty content at a specific line offset should.
- If content is empty, return empty string for empty files
- If content is empty but startLine > 1, return line number for empty content at that offset
This ensures that the model does not think the file contains a single empty line.

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* refactor: improve readFileTool XML output format

- Remove unnecessary XML indentation that could confuse the model
- Separate file content from notices and errors using dedicated tags
- Add line range information to content tags
- Handle empty files properly with self-closing tags
- Add comprehensive test coverage

Fixes #2278

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* fix: always show line numbers in read_file XML output

- Always display line numbers in non-range reads
- Improve XML formatting with consistent newlines for better readability

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* test: update tests to match new XML format with line numbers

- Update test expectations to match the new XML format with newlines
- Update tests to expect line numbers attribute in content tags
- Modify test assertions to check for the correct line range values

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* fix: consistent blank line handling in addLineNumbers

- Add newline to all output
- Handle trailing newlines and empty lines consistently
- Add test cases for blank lines:
  - Multiple blank lines within content
  - Multiple trailing blank lines
  - Only blank lines with offset
  - Trailing newlines

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* test: use actual addLineNumbers in read-file-xml tests

- Modified extract-text mock to preserve actual addLineNumbers implementation
- Removed mock implementation of addLineNumbers
- Updated test data to account for trailing newline
- Removed unnecessary mock verification

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* test: ensure actual addLineNumbers function is called in tests

- Replace direct mocking of addLineNumbers with spy on actual implementation
- Add verification to ensure the real function is called when appropriate
- Add skipAddLineNumbersCheck option for cases where function should not be called
- Update test cases to use appropriate verification options
- Fix numberedFileContent to include trailing newline for consistency

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* fix: modify readLines to process data directly instead of line by line

- Direct data processing provides more accurate results by preserving exact content with carriage returns
- Improved performance through minimal buffering and efficient string operations
- Use string indexes to find newlines while maintaining their original format
- Handle all edge cases correctly with preserved line endings
- Add tests for various edge cases including empty files, single lines, and different line endings

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

* test: remove unused mockInputContent variable

Remove unused variable declaration to appease ellipsis-dev linter requirements.

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>

---------

Signed-off-by: Eric Wheeler <roo-code@z.ewheeler.org>
Co-authored-by: Eric Wheeler <roo-code@z.ewheeler.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Enhancement New feature or request lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants