
Add Documentation for Experimental Codebase Indexing Feature #204

Merged
hannesrudolph merged 3 commits into main from indexing-exp on May 24, 2025

hannesrudolph (Collaborator) commented May 24, 2025

Add Documentation for Experimental Codebase Indexing Feature

Overview

This PR adds complete documentation for the new experimental Codebase Indexing feature that enables semantic code search using AI embeddings and vector similarity.

Changes

  • New Feature Documentation: docs/features/experimental/codebase-indexing.mdx covering setup, configuration, and usage with proper tool cross-references
  • New Tool Documentation: docs/advanced-usage/available-tools/codebase-search.md with parameters, examples, and best practices
  • Navigation Updates: Added to experimental features and available tools sections
  • FAQ Updates: Added entries about codebase indexing and costs
  • Tool Reorganization: Split read/search tools into separate categories for better organization

Key Features Documented

  • Semantic code search using OpenAI/Ollama embeddings + Qdrant vector database
  • Tree-sitter code parsing with smart file filtering
  • Natural language queries like "user authentication logic" or "database connection handling"
  • Setup instructions for Docker, cloud deployment, and security considerations
  • Performance characteristics, limitations, and cost estimates
  • Proper cross-linking between feature documentation and codebase_search tool reference

The documentation enables users to effectively set up and use AI-powered semantic code search while clearly marking the experimental nature of the feature.


Important

Adds documentation for the experimental Codebase Indexing feature, including setup, usage, and tool integration.

  • Documentation:
    • Adds codebase-indexing.mdx for Codebase Indexing feature, detailing setup, configuration, and usage.
    • Adds codebase-search.md for the codebase_search tool, explaining parameters, functionality, and best practices.
  • Navigation:
    • Updates sidebars to include new documentation under Experimental and Available Tools sections.
  • FAQ:
    • Adds entries about Codebase Indexing and its costs.
  • Tool Reorganization:
    • Splits read/search tools into separate categories in tool-use-overview.md.

This description was created by Ellipsis for c0d22a8.

… feature that enables semantic code search using AI embeddings.

## New Documentation
- **docs/features/experimental/codebase-indexing.mdx**: Comprehensive feature documentation covering:
  - Semantic search capabilities using Tree-sitter parsing and AI embeddings
  - Setup requirements for OpenAI/Ollama embedding providers and Qdrant vector database
  - Configuration steps and status indicators
  - File processing with smart code parsing and automatic filtering
  - Best practices for model selection and security considerations
  - Current limitations and future enhancements
  - Privacy and security considerations

## Updated Navigation & Cross-references
- **sidebars.ts**: Added Codebase Indexing to Features > Experimental navigation menu
- **docs/features/experimental/experimental-features.md**:
  - Added Codebase Indexing to experimental features list
  - Added screenshot showing the experimental features settings panel
- **docs/faq.md**: Added FAQ entries explaining:
  - What Codebase Indexing is and its semantic search capabilities
  - Cost considerations for embedding generation and vector storage

## Assets
- **static/img/experimental-features/experimental-features.png**: Screenshot of experimental features settings panel

## Technical Details Covered
- Tree-sitter integration for AST-based code parsing
- Support for both OpenAI and Ollama embedding providers
- Qdrant vector database integration with local and cloud deployment options
- Incremental indexing with file watching and hash-based caching
- Smart file filtering excluding binaries, large files, and common ignore patterns
- codebase_search tool integration for AI-powered code discovery
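
For readers unfamiliar with hash-based caching, here is a minimal sketch of how incremental indexing could skip unchanged files. The names `HashCache`, `contentHash`, and `needsReindex` are hypothetical and are not taken from the extension's code; the sketch only assumes that a file is re-embedded when the hash of its content changes.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache: maps a workspace-relative path to the hash of the
// content that was last embedded. Not the extension's actual data structure.
type HashCache = Map<string, string>;

// SHA-256 of the file content, hex encoded.
function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns true when the file is new or changed and should be re-embedded.
// The new hash is recorded so an unchanged file is skipped on the next pass.
function needsReindex(cache: HashCache, path: string, content: string): boolean {
  const hash = contentHash(content);
  if (cache.get(path) === hash) {
    return false;
  }
  cache.set(path, hash);
  return true;
}

const cache: HashCache = new Map();
const first = needsReindex(cache, "src/auth.ts", "export const a = 1;"); // new file
const second = needsReindex(cache, "src/auth.ts", "export const a = 1;"); // unchanged
const third = needsReindex(cache, "src/auth.ts", "export const a = 2;"); // edited
console.log(first, second, third);
```

A file watcher would call something like `needsReindex` on each change event, so only new or edited files incur embedding cost.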

The documentation is targeted at a semi-technical audience and provides practical setup guidance while explaining the underlying semantic search technology.
vercel bot commented May 24, 2025

The latest updates on your projects.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| roo-code-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 24, 2025 7:14pm |

mrubens (Collaborator) commented May 24, 2025

Content looks good. Only nitpick is that the UI looks slightly different now.

…tool and reorganize the tool category structure to better distinguish between read and search functionality.

## New Documentation
- **docs/advanced-usage/available-tools/codebase-search.md**: Complete tool documentation covering:
  - Semantic search capabilities using AI embeddings and vector similarity
  - Integration with experimental Codebase Indexing feature with proper warning
  - Parameters, requirements, and configuration dependencies (OpenAI/Ollama + Qdrant)
  - Detailed workflow explanation from query processing to result formatting
  - Best practices for effective semantic queries vs. traditional text search
  - Directory scoping capabilities and result interpretation guidelines
  - Usage examples demonstrating authentication, database, error handling, and testing searches
  - Similarity scoring explanation and result structure details

## Updated Navigation & Organization
- **sidebars.ts**: Added codebase_search to Available Tools navigation menu
- **docs/advanced-usage/available-tools/tool-use-overview.md**: Reorganized tool categories:
  - Split "Read Group" into separate "Read Group" and "Search Group" categories
  - **Read Group**: File system reading and exploration (read_file, list_files, list_code_definition_names)
  - **Search Group**: Pattern and semantic searching (search_files, codebase_search)
  - Updated tool group table to reflect the new logical separation
  - Updated common patterns example to showcase semantic search with codebase_search
  - Improved categorization aligns with actual tool usage patterns
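
The new split can be pictured as a simple lookup. This is an illustrative sketch only: the tool names come from the list above, while the `toolGroups` object and the `groupOf` helper are hypothetical and do not reflect how the extension defines its groups internally.

```typescript
// Documentation categories after the split. Tool names are from the list
// above; the object shape is an assumption made for illustration.
const toolGroups = {
  read: ["read_file", "list_files", "list_code_definition_names"],
  search: ["search_files", "codebase_search"],
} as const;

type ToolGroup = keyof typeof toolGroups;

// Returns the category a tool is documented under, or undefined if the
// tool is not part of either group.
function groupOf(tool: string): ToolGroup | undefined {
  for (const group of Object.keys(toolGroups) as ToolGroup[]) {
    if ((toolGroups[group] as readonly string[]).includes(tool)) {
      return group;
    }
  }
  return undefined;
}

console.log(groupOf("codebase_search"), groupOf("read_file"));
```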

## Technical Coverage
The documentation accurately reflects the tool's implementation including:
- CodeIndexManager integration and availability validation
- Dual output format for AI and UI consumption
- Vector similarity search with cosine similarity and 0.4 threshold
- Performance optimizations (50 result limit, Tree-sitter language support)
- Path filtering and workspace-relative result formatting
- Integration with experimental indexing infrastructure
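
To make the scoring rules concrete, here is a rough in-memory sketch of cosine-similarity search with the 0.4 threshold, the 50 result limit, and path filtering. It is a stand-in for illustration, not how the tool queries Qdrant: `IndexedChunk`, `SearchHit`, `searchIndex`, and the toy two-dimensional vectors are all assumptions.

```typescript
// Hypothetical shapes for an indexed code chunk and a search result.
interface IndexedChunk {
  filePath: string; // workspace-relative path
  vector: number[]; // embedding of the chunk
}

interface SearchHit {
  filePath: string;
  score: number;
}

const MIN_SCORE = 0.4; // similarity threshold named in the description
const MAX_RESULTS = 50; // result limit named in the description

// Cosine similarity: dot product divided by the product of the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk under the optional path prefix, drops weak matches,
// and returns the best hits first, capped at MAX_RESULTS.
function searchIndex(query: number[], chunks: IndexedChunk[], pathPrefix?: string): SearchHit[] {
  return chunks
    .filter((c) => pathPrefix === undefined || c.filePath.startsWith(pathPrefix))
    .map((c) => ({ filePath: c.filePath, score: cosineSimilarity(query, c.vector) }))
    .filter((hit) => hit.score >= MIN_SCORE)
    .sort((x, y) => y.score - x.score)
    .slice(0, MAX_RESULTS);
}

const chunks: IndexedChunk[] = [
  { filePath: "src/auth/login.ts", vector: [1, 0] },
  { filePath: "src/db/pool.ts", vector: [0, 1] },
  { filePath: "docs/auth.md", vector: [1, 1] },
];

const hits = searchIndex([1, 0], chunks); // whole workspace
const scoped = searchIndex([1, 0], chunks, "src/"); // limited to src/
console.log(hits.map((h) => h.filePath), scoped.map((h) => h.filePath));
```

In this toy data the orthogonal chunk scores 0 and is dropped by the threshold, while the `src/` prefix narrows the remaining hits to one file.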

This provides users with clear guidance on semantic code search capabilities while maintaining appropriate warnings about the experimental nature of the feature.
hannesrudolph changed the title from "Add complete documentation for the new experimental Codebase Indexing feature that enables semantic code search using AI embeddings." to "Add Documentation for Experimental Codebase Indexing Feature" on May 24, 2025
hannesrudolph merged commit 487827a into main on May 24, 2025
3 checks passed
hannesrudolph deleted the indexing-exp branch on May 24, 2025 19:16
github-project-automation bot moved this from New to Done in Roo Code Roadmap on May 24, 2025
hannesrudolph (Collaborator, Author) commented

@mrubens thanks so much for approving it after I merged it :P
