Skip to content

Conversation

@colinmollenhour
Copy link

@colinmollenhour colinmollenhour commented Sep 4, 2025

Related GitHub Issue

Closes: #6604

Roo Code Task Context (Optional)

https://app.roocode.com/share/b11fe4ae-11cc-4e1d-b944-c47551eebe35

Description

Roo Code has many possible files it can use to generate the system prompt. If any of these files are symlinks to the same file or just a complete copy, Roo will still add them and so you end up with a bloated system prompt from duplicates.

Example:

AGENTS.md
.roo/rules/general.md -> ../../AGENTS.md

Test Procedure

  1. Create both AGENTS.md and .roo/rules/general.md which are either duplicates or one is a symlink to the other.
  2. Open Mode Settings and click Preview System Prompt.
  3. Observe that there are two copies of the data in the prompt.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

image

Documentation Updates

Does this PR necessitate updates to user-facing documentation?

  • No documentation updates are required.
  • Yes, documentation updates are required. (Please describe what needs to be updated or link to a PR in the docs repository).

Subjectively, this wasn't documented before and there is no downside to de-duplicating the prompts so not sure it needs to be documented..

Additional Notes

A more advanced method of detecting large nearly duplicate files and warning the user might be useful for some. For me, just not duplicating my rather large AGENTS.md file is a huge win.

Get in Touch

colin3475


Important

This PR prevents duplicate files in the system prompt by using a FileTracker to check for duplicate content and resolved paths, with tests added for validation.

  • Behavior:
    • Prevents duplicate files in system prompt by checking MD5 hash and resolved paths in custom-instructions.ts.
    • Handles symlinks and identical content files to avoid duplicates.
  • Functions:
    • Adds FileTracker interface and shouldIncludeFile() function to track and check duplicates.
    • Modifies loadRuleFiles() and readTextFilesFromDirectory() to use FileTracker.
  • Tests:
    • Adds tests in custom-instructions.spec.ts for deduplication of files with identical content and symlinks.

This description was created by Ellipsis for a661d39. You can customize this summary. It will automatically update as commits are pushed.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Sep 4, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Sep 4, 2025
Copy link
Contributor

@roomote roomote bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This PR successfully addresses the duplicate files issue in the system prompt. The implementation using a FileTracker with both MD5 content hashing and resolved path tracking is solid and well-tested. I've left some suggestions inline that could improve performance and maintainability.

@colinmollenhour
Copy link
Author

colinmollenhour commented Sep 4, 2025

Looks like the build fails because Node's "crypto" is not in the browser context. I'm not sure how to handle that without adding a dependency like https://www.npmjs.com/package/crypto-browserify - what's the preference?

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Sep 4, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Sep 4, 2025
Copy link
Member

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, regarding the crypto module issue you mentioned - the problem is that src/shared/modes.ts imports addCustomInstructions from custom-instructions.ts at the top level (line 14).

During the webview build, Vite analyzes all imports in shared modules and tries to bundle them. Since WebviewMessage.ts imports the Mode type from ./modes, Vite processes the entire shared/modes.ts file including all its imports. This pulls in custom-instructions.ts which uses Node.js-only modules (fs/promises and now crypto) that don't exist in browsers.

The issue existed before with fs/promises (which Vite was externalizing with warnings), but adding crypto makes it worse and causes actual build failures.

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Sep 16, 2025
@colinmollenhour
Copy link
Author

The issue existed before with fs/promises (which Vite was externalizing with warnings), but adding crypto makes it worse and causes actual build failures.

So should crypto be replaced with a pure JS implementation or do you have some other preference?

@daniel-lxs
Copy link
Member

I think the fix has two parts. For this PR, let's focus on the crypto issue:

  1. Regarding the shared/modes.ts static import: This is definitely something we need to address, but it's out of scope for this PR. We should remove the top-level import there and instead dynamically import custom-instructions inside getFullModeDetails, but only when cwd is provided. That way it only runs in the extension host and never gets bundled into the webview.

    Something like:

    const { addCustomInstructions } = await import("../core/prompts/sections/custom-instructions")
  2. For this PR, We can avoid using crypto entirely. We don't need a cryptographic hash like MD5 just for deduping content.

    We could either use the web-standard globalThis.crypto.subtle.digest("SHA-256", ...) (see MDN), which is async and works in both modern Node and the browser. Or, even simpler, just use a small non-cryptographic hash function like FNV-1a. It's synchronous, pure TS, and has no imports.

This approach means we won't need a polyfill like crypto-browserify.

Also, probably a good idea to update the test names from "MD5" to something more generic like "content hash" so we're not tied to a specific algorithm.

@colinmollenhour
Copy link
Author

Thanks for the suggestions @daniel-lxs , I removed crypto and added native FNV-1a as you proposed so there are no new dependencies.

@hannesrudolph hannesrudolph moved this from PR [Changes Requested] to PR [Needs Prelim Review] in Roo Code Roadmap Oct 20, 2025
@daniel-lxs
Copy link
Member

I don't think this PR will solve issue #6604 since the injection of the file comes from the CLI itself, outside our logic to load files.

I'll close this PR for now but let me know if something like this makes sense.

@daniel-lxs daniel-lxs closed this Oct 22, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Oct 22, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working PR - Needs Preliminary Review size:L This PR changes 100-499 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Notice: Claude Code Provider Injects CLAUDE.md

3 participants