@roomote
Collaborator

@roomote roomote commented Jun 18, 2025

Summary

This PR fixes issue #4852, in which processing large or complex XML files would hang indefinitely.

Root Cause

The issue was in the function used by both diff strategies. The middle-out search algorithm could run millions of iterations on large files with no bound on iteration count or elapsed time, which showed up as an indefinite hang.

Solution

Added comprehensive performance safeguards to the function:

  1. Maximum iteration limit: Capped at 10,000 iterations to prevent infinite loops
  2. Timeout mechanism: 5-second timeout for very large files
  3. Early exit optimization: Immediately exit when perfect matches (100% similarity) are found
  4. Performance logging: Added warnings for debugging large file issues
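The first three safeguards can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `boundedMiddleOutSearch`, `SearchLimits`, `SearchResult`, and the toy `similarity` function are hypothetical names, and only the 10,000-iteration and 5-second defaults come from the description above.

```typescript
// Hypothetical sketch of a bounded middle-out search; names are illustrative.
interface SearchLimits {
  maxIterations: number; // 10,000 in the PR description
  timeoutMs: number; // 5 seconds in the PR description
}

interface SearchResult {
  bestScore: number;
  bestIndex: number;
  iterations: number;
  stoppedBy: "exhausted" | "perfect-match" | "iteration-limit" | "timeout";
}

// Toy similarity: fraction of positions holding the same character.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  let same = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] === b[i]) same++;
  }
  return same / longest;
}

function boundedMiddleOutSearch(
  lines: string[],
  target: string,
  limits: SearchLimits = { maxIterations: 10_000, timeoutMs: 5_000 },
): SearchResult {
  const start = Date.now();
  const mid = Math.floor(lines.length / 2);
  let left = mid - 1; // walks toward index 0
  let right = mid; // walks toward the last index
  let bestScore = 0;
  let bestIndex = -1;
  let iterations = 0;

  const finish = (stoppedBy: SearchResult["stoppedBy"]): SearchResult => ({
    bestScore,
    bestIndex,
    iterations,
    stoppedBy,
  });

  // Scores one candidate line and reports whether it is a perfect match.
  const consider = (index: number): boolean => {
    const score = similarity(lines[index], target);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = index;
    }
    return score === 1;
  };

  while (left >= 0 || right < lines.length) {
    // Safeguards 1 and 2: stop on the iteration cap or the time budget.
    if (iterations >= limits.maxIterations) return finish("iteration-limit");
    if (Date.now() - start > limits.timeoutMs) return finish("timeout");
    iterations++;

    // Safeguard 3: exit as soon as a 100% match is found.
    if (left >= 0 && consider(left--)) return finish("perfect-match");
    if (right < lines.length && consider(right++)) return finish("perfect-match");
  }
  return finish("exhausted");
}
```

Returning a `stoppedBy` reason rather than only the best match lets the caller log a warning when a limit was hit, which is one way the fourth safeguard could be wired in.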

Changes Made

  • Modified function in
  • Modified function in
  • Added comprehensive test suite in

Testing

  • Added performance tests that verify the fix works on large XML files (1000+ lines)
  • Tests confirm operations complete within reasonable time limits (< 10 seconds)
  • All existing tests continue to pass
  • Verified both successful matches and failed searches are handled correctly

Impact

  • Resolves hanging issues on large XML files
  • Maintains existing functionality for normal-sized files
  • Improves overall performance through early exit optimization
  • Provides better debugging information for performance issues

Fixes #4852

… on large XML files

- Added maximum iteration limit (10,000) to fuzzySearch function
- Added 5-second timeout mechanism for very large files
- Added early exit when perfect matches are found
- Added performance logging for debugging large file issues
- Applied fixes to both MultiSearchReplaceDiffStrategy and MultiFileSearchReplaceDiffStrategy
- Added comprehensive test suite to verify the fix works correctly

The issue was caused by the middle-out search algorithm in fuzzySearch()
potentially running millions of iterations on large files without any
performance constraints, leading to indefinite hanging.
@roomote roomote requested review from cte, jr and mrubens as code owners June 18, 2025 19:19
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Jun 18, 2025
@ellipsis-dev
Contributor

ellipsis-dev bot commented Jun 18, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know.



@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jun 18, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jun 19, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Jun 19, 2025
@daniel-lxs
Member

Closing temporarily in favor of #4854.

This PR addresses the findBestMatch performance but doesn't fix the actual root cause. The issue analysis in #4851 identified that the hang occurs during XML parsing and HTML entity processing in the apply_diff tool itself, not in the fuzzy search.

PR #4854 directly addresses these root causes:

  • XML parsing timeouts that were causing indefinite hangs
  • HTML entity double-processing corrupting XML content
  • Regex operations hanging on complex XML patterns

While the 10k iteration limit would help with search performance, it doesn't solve the actual hanging issue users are experiencing with XML files.
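For readers unfamiliar with the entity double-processing problem, here is a generic illustration of how decoding entities twice changes XML content. It is not the code from #4854; `decodeEntitiesOnce` and the `ENTITIES` table are hypothetical and cover only the five predefined XML entities.

```typescript
// Illustrative decoder for the five predefined XML entities (hypothetical helper).
const ENTITIES: Record<string, string> = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&apos;": "'",
  "&amp;": "&",
};

// Single left-to-right pass: text produced by a replacement is not rescanned.
function decodeEntitiesOnce(text: string): string {
  return text.replace(/&(?:lt|gt|quot|apos|amp);/g, (match) => ENTITIES[match]);
}

// The author wants the literal text "a &lt; b" inside the element,
// so the ampersand itself is escaped in the source.
const original = "<note>a &amp;lt; b</note>";

const once = decodeEntitiesOnce(original); // "<note>a &lt; b</note>"
const twice = decodeEntitiesOnce(once); // "<note>a < b</note>"

console.log(once);
console.log(twice);
```

After the second pass the element contains a bare `<`, so the result is no longer well-formed XML and no longer matches the text the author wrote.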

@daniel-lxs daniel-lxs closed this Jun 19, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 19, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Jun 19, 2025
@roomote roomote deleted the fix-4852 branch June 19, 2025 15:41
Linked issue: Bug: apply_diff errors out on large or complex XML files