Collaborator

@hannesrudolph hannesrudolph commented Jun 23, 2025

Description

Fixes #5027

Implements PR 1 of the solution for nomic-embed-code model compatibility with semantic code indexing. This PR adds a user-configurable search score threshold setting that lets users control the minimum similarity score for semantic search results, replacing the previously hardcoded 0.4 threshold.

Changes Made

🔧 Backend Changes

  • packages/types/src/codebase-index.ts: Extended codebaseIndexConfigSchema with optional codebaseIndexSearchMinScore field (number, 0-1 range)
  • src/services/code-index/config-manager.ts: Updated currentSearchMinScore getter to implement priority system:
    • User setting > model-specific threshold > default fallback (0.4)
    • Added private searchMinScore field to track user preference
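The priority order above can be sketched as a small pure function. This is a minimal illustration, not the actual `currentSearchMinScore` getter: the function name, parameter names, and constant name are assumptions, and only the 0.4 default comes from this description.

```typescript
// Hypothetical constant name; the 0.4 default is the value stated in this PR.
const DEFAULT_SEARCH_MIN_SCORE = 0.4

// Resolve the effective threshold:
// user setting > model-specific threshold > default fallback.
// `??` is used rather than `||` so an explicit user value of 0 is kept
// instead of being treated as "unset".
function resolveSearchMinScore(
  userSetting: number | undefined,
  modelThreshold: number | undefined,
): number {
  return userSetting ?? modelThreshold ?? DEFAULT_SEARCH_MIN_SCORE
}
```

With this shape, a model-specific threshold only applies when the user has not configured anything, and the default only applies when neither is present.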

🎨 Frontend Changes

  • webview-ui/src/components/settings/CodeIndexSettings.tsx: Added intuitive slider interface:
    • Range slider for threshold configuration (0.0-1.0 range, 0.05 steps)
    • Real-time value display showing current threshold (e.g., "0.65")
    • Visual progress with gradient background indicating current position
    • Min/Max labels (0.0 and 1.0) on slider ends
    • Helper text explaining the impact of different values
    • data-testid attribute as a test hook (an aria-label for screen readers was added in a later commit)
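The slider behavior described above (0.0-1.0 range, 0.05 steps, two-decimal display, gradient fill position) can be sketched with a few pure helpers. These helpers are hypothetical and are not taken from CodeIndexSettings.tsx; the names and the rounding strategy are assumptions for illustration.

```typescript
// Hypothetical helper names; only the range and step size come from this PR.
const SLIDER_MIN = 0
const SLIDER_MAX = 1
const SLIDER_STEP = 0.05

// Clamp a raw value into [0, 1] and snap it to the nearest 0.05 step.
function snapToStep(value: number): number {
  const clamped = Math.min(SLIDER_MAX, Math.max(SLIDER_MIN, value))
  const steps = Math.round((clamped - SLIDER_MIN) / SLIDER_STEP)
  // toFixed(2) strips floating-point noise such as 0.15000000000000002.
  return Number((SLIDER_MIN + steps * SLIDER_STEP).toFixed(2))
}

// Text for the real-time value display, e.g. "0.65".
function formatThreshold(value: number): string {
  return value.toFixed(2)
}

// Percentage of the track to fill, usable as a gradient stop position.
function fillPercent(value: number): number {
  return Math.round(((value - SLIDER_MIN) / (SLIDER_MAX - SLIDER_MIN)) * 100)
}
```

Keeping these as pure functions makes the display logic testable without rendering the component.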

🧪 Test Updates

  • webview-ui/src/components/settings/__tests__/CodeIndexSettings.spec.tsx: Updated tests to:
    • Support new slider component instead of text input
    • Test slider functionality and value changes
    • Include comprehensive slider-specific test cases
    • Verify proper field counts and UI behavior

How It Works

Users can now:

  1. Navigate to Settings → Code Index Settings → "Search Score Threshold"
  2. Use an intuitive slider interface to configure their preferred threshold (0.0-1.0)
  3. See real-time feedback with the current value displayed prominently
  4. Choose lower values (e.g., 0.15) to return more results, some of which may be less relevant
  5. Choose higher values (e.g., 0.65) to return fewer but more precise matches
  6. Rely on the setting taking priority over hardcoded model-specific thresholds
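The effect of lower versus higher thresholds can be illustrated with a simple filter over scored results. This is a sketch under assumptions: the `SearchResult` shape, the function name, and the sample scores are invented for illustration, and in practice the threshold may be passed to the vector store rather than applied in application code.

```typescript
// Hypothetical result shape for illustration only.
interface SearchResult {
  filePath: string
  score: number // similarity score in the 0-1 range
}

// Keep only results whose similarity score meets the minimum threshold.
function filterByMinScore(results: SearchResult[], minScore: number): SearchResult[] {
  return results.filter((result) => result.score >= minScore)
}

// Invented sample data.
const sampleResults: SearchResult[] = [
  { filePath: "src/a.ts", score: 0.82 },
  { filePath: "src/b.ts", score: 0.41 },
  { filePath: "src/c.ts", score: 0.2 },
]

const broad = filterByMinScore(sampleResults, 0.15) // keeps all three
const strict = filterByMinScore(sampleResults, 0.65) // keeps only src/a.ts
```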

Testing

  • All existing tests pass
  • Added comprehensive tests for new slider functionality
  • Manual testing completed:
    • Slider responds correctly to user input
    • Value display updates in real-time
    • Settings are properly saved and restored
    • Backward compatibility maintained

Verification of Acceptance Criteria

  • User-configurable setting: Users can now set their own minimum score threshold
  • Intuitive interface: Slider provides much better UX than text input
  • Real-time feedback: Current value is displayed prominently
  • Priority system: User setting overrides model-specific thresholds
  • Backward compatibility: Existing configurations continue to work
  • Type safety: Full TypeScript support with Zod schema validation

Future Impact

This implementation enables PR 2 which will add nomic-embed-code support for all providers with required query prefixes, completing the full solution for issue #5027 without requiring hardcoded model-specific thresholds.

Screenshots

The new slider interface provides intuitive control over search sensitivity:

  • Slider range: 0.0 to 1.0 with 0.05 step increments
  • Real-time value display showing exact threshold
  • Visual progress indication with gradient background
  • Clear labels and helpful description text

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Comments added for complex logic
  • All tests pass (461 passed | 1 skipped)
  • No breaking changes
  • Accessibility considerations addressed
  • TypeScript compilation successful
  • Linting passes without warnings
  • Backward compatibility maintained

Important

Introduces a user-configurable search score threshold for semantic search with a new slider interface and comprehensive testing.

  • Behavior:
    • Adds user-configurable search score threshold in codebaseIndexConfigSchema in codebase-index.ts.
    • Implements priority system in config-manager.ts: user setting > model-specific threshold > default (0.4).
  • Frontend:
    • Adds slider in CodeIndexSettings.tsx for threshold configuration (0.0-1.0 range, 0.05 steps).
    • Updates CodeIndexSettings.spec.tsx to test slider functionality and value changes.
  • Models:
    • Updates embeddingModels.ts to include scoreThreshold and queryPrefix for models.
  • Tests:
    • Adds tests in config-manager.spec.ts for priority system and edge cases.
    • Updates CodeIndexSettings.spec.tsx for slider component tests.
  • Misc:
    • Updates translations in multiple i18n files for new UI elements.

This description was created by Ellipsis for b62bb38.

Copilot AI review requested due to automatic review settings June 23, 2025 14:24
@hannesrudolph hannesrudolph requested review from cte, jr and mrubens as code owners June 23, 2025 14:24
@dosubot dosubot bot added the size:L, enhancement, and UI/UX labels Jun 23, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a user-configurable search score threshold for semantic search by introducing a slider in the settings UI, extending model profiles and config schema, and wiring it through the backend config manager and embedding services.

  • Added new i18n keys and UI slider component for adjusting the minimum similarity score (0.0–1.0).
  • Extended EMBEDDING_MODEL_PROFILES with scoreThreshold and queryPrefix, plus helper getters.
  • Updated the config manager and schema to support user‐set thresholds with priority over model defaults, and applied prefixes in embedder implementations.
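A per-model profile with `scoreThreshold` and `queryPrefix` plus lookup helpers might look roughly like the sketch below. The real `EMBEDDING_MODEL_PROFILES` in embeddingModels.ts may be shaped differently (for example, keyed by provider first); the table name, getter names, and the placeholder model here are assumptions. The 0.15 threshold and the prefix string for nomic-embed-code are the values quoted elsewhere in this conversation.

```typescript
// Hypothetical, trimmed-down profile shape.
interface EmbeddingModelProfile {
  scoreThreshold?: number // model-specific minimum similarity score
  queryPrefix?: string // text prepended to search queries before embedding
}

// Hypothetical flat table keyed by model ID.
const MODEL_PROFILES: Record<string, EmbeddingModelProfile> = {
  "nomic-embed-code": {
    scoreThreshold: 0.15,
    queryPrefix: "Represent this query for searching relevant code: ",
  },
  // Placeholder entry for a model with no special requirements.
  "example-model": {},
}

// Hypothetical getters; each returns undefined when nothing is defined.
function getModelScoreThreshold(modelId: string): number | undefined {
  return MODEL_PROFILES[modelId]?.scoreThreshold
}

function getModelQueryPrefix(modelId: string): string | undefined {
  return MODEL_PROFILES[modelId]?.queryPrefix
}
```

Returning `undefined` for unknown models lets callers fall through to the next level of the priority chain.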

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Summary per file:

  • webview-ui/src/i18n/locales/en/settings.json: Added searchMinScoreLabel and searchMinScoreDescription entries
  • webview-ui/src/components/settings/CodeIndexSettings.tsx: Introduced range slider UI for search score threshold
  • webview-ui/src/components/settings/__tests__/CodeIndexSettings.spec.tsx: Updated tests to verify slider rendering, value display, and change handling
  • src/shared/embeddingModels.ts: Added scoreThreshold and queryPrefix to profiles; new getters
  • src/services/code-index/config-manager.ts: Implemented currentSearchMinScore logic with user, model, and default priority
  • src/services/code-index/interfaces/config.ts: Made searchMinScore required in service config interface
  • src/services/code-index/embedders/openai.ts: Applied model query prefix before embedding
  • src/services/code-index/embedders/openai-compatible.ts: Applied model query prefix before embedding
  • src/services/code-index/embedders/ollama.ts: Applied model query prefix before embedding
  • packages/types/src/codebase-index.ts: Extended Zod schema with optional codebaseIndexSearchMinScore

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Jun 23, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Jun 23, 2025
@daniel-lxs daniel-lxs marked this pull request as draft June 23, 2025 15:07
@hannesrudolph hannesrudolph added the PR - Draft / In Progress label and removed the Issue/PR - Triage label Jun 23, 2025
delve-auditor bot commented Jun 24, 2025

No security or compliance issues detected. Reviewed everything up to b62bb38.

Security Overview
  • 🔎 Scanned files: 46 changed file(s)
Detected Code Changes

The diff is too large to display a summary of code changes.

Reply to this PR with @delve-auditor followed by a description of what change you want and we'll auto-submit a change to this PR to implement it.

Collaborator Author

@hannesrudolph hannesrudolph left a comment

Implementation adds configurable search score threshold to replace hardcoded 0.4 value. Priority system: user setting → model-specific threshold → default constant.

Technical issues found:

  • Query prefix concatenation doesn't validate token limits. Texts near MAX_ITEM_TOKENS will exceed limits after prefix addition.
  • Hindi translation at line 63 contains Arabic/Urdu characters (اپنی) mixed with Devanagari.
  • No test coverage for priority cascade logic in currentSearchMinScore getter.

Architecture considerations:

  • EMBEDDING_MODEL_PROFILES hardcodes model configs. Future model additions require code changes instead of configuration.
  • Query prefix logic doesn't check for existing prefixes, could result in double-prefixing on reprocessing.
  • Type inconsistency: searchMinScore marked required in interface but optional in Zod schema.

Implementation notes:

  • Slider correctly uses nullish coalescing (existing bot comments appear outdated).
  • nomic-embed-code prefix "Represent this query for searching relevant code: " - is this documented by Nomic?
  • All embedder implementations (OpenAI, Ollama, OpenAI-compatible) follow same prefix pattern.
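The two prefix concerns raised in this review (exceeding the token limit and double-prefixing) could be handled along the lines of the sketch below. This is not the embedder code from the PR: the function names, the 4-characters-per-token estimate, and the numeric limit are assumptions. The PR names `MAX_ITEM_TOKENS` but does not state its value here, so the limit is passed as a parameter with a placeholder default.

```typescript
// Placeholder limit; the actual MAX_ITEM_TOKENS value is not given in this PR.
const ASSUMED_MAX_ITEM_TOKENS = 8191

// Rough estimate of about 4 characters per token. This is an assumption,
// not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Prepend the model's query prefix unless it is absent, already present,
// or would push the text past the token limit.
function applyQueryPrefix(
  text: string,
  prefix: string | undefined,
  maxTokens: number = ASSUMED_MAX_ITEM_TOKENS,
): string {
  if (!prefix) return text
  // Guard against double-prefixing when text is reprocessed.
  if (text.startsWith(prefix)) return text

  const prefixed = prefix + text
  if (estimateTokens(prefixed) > maxTokens) {
    // Fall back to the unprefixed text rather than failing the embedding call.
    console.warn("Query prefix skipped: prefixed text would exceed the token limit")
    return text
  }
  return prefixed
}
```

Falling back to the unprefixed text trades some retrieval quality for robustness, which matches the fallback behavior described in the follow-up commits.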

Changes are focused and maintain backward compatibility. The 0.15 threshold for nomic-embed-code matches issue requirements.

@hannesrudolph hannesrudolph marked this pull request as ready for review June 24, 2025 22:51
@dosubot dosubot bot added the size:XL label and removed the size:L label Jun 24, 2025
@hannesrudolph hannesrudolph moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap Jun 24, 2025
hannesrudolph and others added 7 commits June 27, 2025 17:05
- Import SEARCH_MIN_SCORE constant to avoid magic number duplication
- Replace logical OR (||) with nullish coalescing (??) for numeric defaults to properly handle 0 values
- Add aria-label attribute to slider for screen reader accessibility

Fixes issues identified by GitHub Copilot and Ellipsis bots in PR #5041
- Add token limit validation in OpenAI and Ollama embedders to prevent exceeding MAX_ITEM_TOKENS with query prefixes
- Add comprehensive test coverage for currentSearchMinScore getter priority system
- Ensure fallback to unprefixed text when token limit would be exceeded
- Add console warnings when falling back to unprefixed embeddings
- Add comprehensive test coverage for currentSearchMinScore getter with priority system
- Add token limit validation in OpenAI and Ollama embedders to prevent embedding failures
- Fix UI validation to distinguish between indexing vs search settings
- Remove duplicate search threshold slider in UI
- Fix TypeScript type definitions for optional searchMinScore property
- Move search score threshold setting to advanced configuration section
- Make advanced section collapsible with toggle button (collapsed by default)
- Position advanced section after action buttons for better UX flow
- Add similarity score display badges in search results (3-decimal precision)
- Include smooth CSS transitions and proper accessibility (aria-expanded, aria-controls)
- Update all locale files with 'Advanced Configuration' translation
- Update tests to handle collapsible behavior with comprehensive coverage

Addresses UX feedback from mrubens on PR #5041 to de-emphasize complex
settings for general users while keeping them accessible for advanced users.

# Conflicts:
#	webview-ui/src/components/chat/CodebaseSearchResult.tsx
@hannesrudolph hannesrudolph force-pushed the fix/issue-5027-score-threshold-setting branch from 3375283 to a9dfaea Compare June 27, 2025 23:12
@hannesrudolph hannesrudolph moved this from PR [Changes Requested] to PR [Needs Prelim Review] in Roo Code Roadmap Jun 27, 2025
hannesrudolph and others added 2 commits June 27, 2025 18:04
- Add missing query prefix implementation in OpenAI embedder
- Add token limit validation when adding query prefixes to all embedders
- Fix translation error in Spanish locale (Russian text replaced with Spanish)
- Ensure double-prefix prevention in all embedders
- Maintain consistent error handling across embedders
- Add double-prefix guard to OpenAI embedder to prevent duplicate query prefixes
- Update Advanced Configuration UI to match ModesView.tsx style with proper hover effects and aria attributes
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Jun 30, 2025
Member

@daniel-lxs daniel-lxs left a comment

LGTM

@mrubens mrubens merged commit a348a2a into main Jul 1, 2025
11 checks passed
@mrubens mrubens deleted the fix/issue-5027-score-threshold-setting branch July 1, 2025 18:41
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Jul 1, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 1, 2025

Labels

enhancement, lgtm, PR - Needs Review, size:XL, UI/UX