@ChuKhaLi
Contributor

@ChuKhaLi ChuKhaLi commented May 28, 2025

…ment

Related GitHub Issue

Closes: #3987

Description

Update codebase search description to emphasize English query requirement

Type of Change

  • 🐛 Bug Fix: Non-breaking change that fixes an issue.
  • New Feature: Non-breaking change that adds functionality.
  • 💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
  • ♻️ Refactor: Code change that neither fixes a bug nor adds a feature.
  • 💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
  • 📚 Documentation: Updates to documentation files.
  • ⚙️ Build/CI: Changes to the build process or CI configuration.
  • 🧹 Chore: Other changes that don't modify src or test files.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.


Important

Updates getCodebaseSearchDescription() to require English search queries for accurate semantic matching.

  • Behavior:
    • Updates getCodebaseSearchDescription() in codebase-search.ts to emphasize that search queries must be in English for accurate semantic matching.
    • If a user's query is in another language, it must be translated to English before searching.
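For illustration, a change of this kind might look roughly like the sketch below. This is not the actual diff from this PR: the function name `describeCodebaseSearch` (standing in for `getCodebaseSearchDescription()`), the `ToolDescriptionArgs` shape, and the exact wording of the description are all assumptions.

```typescript
// Hypothetical sketch only: not the real contents of codebase-search.ts.
// `ToolDescriptionArgs` and the description wording are assumed for illustration.
interface ToolDescriptionArgs {
  cwd: string; // workspace root that search paths are relative to
}

function describeCodebaseSearch(args: ToolDescriptionArgs): string {
  return [
    "## codebase_search",
    "Description: Find files most relevant to the search query using semantic search.",
    // The key addition: state the language requirement once, with the fallback behavior.
    "Queries MUST be in English. If the user's request is in another language, translate it to English before searching.",
    `Paths are relative to the workspace directory: ${args.cwd}`,
  ].join("\n");
}
```

Stating the requirement in a single sentence keeps the prompt short while still telling the model what to do with non-English input.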

This description was created by Ellipsis for e6b5718.

@ChuKhaLi ChuKhaLi requested review from cte and mrubens as code owners May 28, 2025 20:25
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label May 28, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Preliminary Review] in Roo Code Roadmap May 28, 2025
@daniel-lxs
Member

Hey @ChuKhaLi, that was very quick! Thank you!

I made a tiny change to reduce redundant mentions of the word "English".

This seems to help keep the queries exclusively in English according to my testing.

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap May 28, 2025
@mrubens
Collaborator

mrubens commented May 29, 2025

I haven't been following closely - why do the queries need to be in English? Does it depend on the embedding model used? My quick research seems to indicate that the OpenAI embedding models at least are multilingual.

@daniel-lxs
Member

daniel-lxs commented May 29, 2025

@mrubens
So based on my research, while it is possible to search in other languages and get results from a codebase written in English, doing so might reduce the quality of the results. The issue describes a scenario where no relevant results are returned.

It is a good idea, however, to check for alternative ways to deal with this. While this specific fix works, it might cause unexpected side effects.

Let me look into this some more.

@daniel-lxs daniel-lxs moved this from PR [Needs Review] to PR [Changes Requested] in Roo Code Roadmap May 29, 2025
@daniel-lxs
Member

daniel-lxs commented May 29, 2025

@mrubens Here are my results from trying the same query in different languages:


I have successfully tested the codebase_search tool by searching for "write to file tool" in four different languages as requested:

  1. Spanish ("herramienta de escritura de archivo"):

    • Found 50 results with scores ranging from 0.50 to 0.41
    • Results included Spanish translations in prompts.json and settings.json
    • Also found references to the actual write_to_file tool implementation
  2. French ("outil d'écriture de fichier"):

    • Found 50 results with scores ranging from 0.51 to 0.40
    • Similar pattern with French translations and tool implementations
    • Good coverage of UI translations and settings
  3. Korean ("파일 쓰기 도구"):

    • Found 50 results with scores ranging from 0.50 to 0.40
    • Found Korean translations and UI components
    • Included file operation related translations
  4. English ("write to file tool"):

    • Found 50 results with the highest scores (0.60)
    • Most comprehensive results focusing on actual implementation
    • Included core tool definitions, documentation, tests, and usage examples

The codebase_search tool demonstrated excellent multilingual search capabilities, successfully finding relevant results in all four languages. The English search naturally returned the most technical implementation details, while the other languages focused more on UI translations and localized content. This confirms that the semantic search functionality works effectively across different languages.


It seems other languages are supported, but the result quality takes a big hit, especially on codebases that contain files in the language of the query, like our translations.
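A toy sketch of why this can happen, assuming the reported scores are cosine similarities between a query embedding and chunk embeddings (an assumption, not something confirmed in this thread). The three-dimensional vectors and file paths below are invented for illustration and are not real embeddings or real files.

```typescript
// Toy illustration only: vectors and paths are made up, not real embeddings.
type Chunk = { path: string; vector: number[] };

// Dot product of two vectors; missing components are treated as 0.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Score every chunk against the query and sort from best to worst match.
function rank(query: number[], chunks: Chunk[]): { path: string; score: number }[] {
  return chunks
    .map((c) => ({ path: c.path, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Invented dimensions: [file-writing concept, Spanish text, source code]
const chunks: Chunk[] = [
  { path: "impl/write_to_file.ts", vector: [0.9, 0.0, 0.4] },
  { path: "i18n/es/settings.json", vector: [0.5, 0.8, 0.0] },
];

const englishQuery = [0.9, 0.0, 0.3]; // stands in for "write to file tool"
const spanishQuery = [0.7, 0.7, 0.0]; // stands in for the Spanish phrasing

console.log(rank(englishQuery, chunks).map((r) => r.path));
console.log(rank(spanishQuery, chunks).map((r) => r.path));
```

In this toy setup the English query ranks the implementation chunk first, while the Spanish query ranks the translation file first, because the query shares its "language" component with the localized content.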

My conclusion
The change from this PR will improve the results from the codebase_search tool, but it might have side effects such as making the model switch languages. However, this can easily be added as a custom rule if the user is aware of the risks and is OK with them.

@mrubens
Collaborator

mrubens commented May 29, 2025

Great, let’s do it then. Thanks for digging in!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label May 29, 2025
@mrubens mrubens merged commit c03b9f9 into RooCodeInc:main May 29, 2025
16 checks passed
@github-project-automation github-project-automation bot moved this from PR [Changes Requested] to Done in Roo Code Roadmap May 29, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap May 29, 2025

Labels

  • lgtm (This PR has been approved by a maintainer)
  • size:XS (This PR changes 0-9 lines, ignoring generated files.)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Does the code indexing feature support multiple languages?

4 participants