Commit a10a0bb

Add complete documentation for the new experimental Codebase Indexing feature that enables semantic code search using AI embeddings.
## New Documentation

- **docs/features/experimental/codebase-indexing.mdx**: Comprehensive feature documentation covering:
  - Semantic search capabilities using Tree-sitter parsing and AI embeddings
  - Setup requirements for OpenAI/Ollama embedding providers and Qdrant vector database
  - Configuration steps and status indicators
  - File processing with smart code parsing and automatic filtering
  - Best practices for model selection and security considerations
  - Current limitations and future enhancements
  - Privacy and security considerations

## Updated Navigation & Cross-references

- **sidebars.ts**: Added Codebase Indexing to Features > Experimental navigation menu
- **docs/features/experimental/experimental-features.md**:
  - Added Codebase Indexing to experimental features list
  - Added screenshot showing the experimental features settings panel
- **docs/faq.md**: Added FAQ entries explaining:
  - What Codebase Indexing is and its semantic search capabilities
  - Cost considerations for embedding generation and vector storage

## Assets

- **static/img/experimental-features/experimental-features.png**: Screenshot of experimental features settings panel

## Technical Details Covered

- Tree-sitter integration for AST-based code parsing
- Support for both OpenAI and Ollama embedding providers
- Qdrant vector database integration with local and cloud deployment options
- Incremental indexing with file watching and hash-based caching
- Smart file filtering excluding binaries, large files, and common ignore patterns
- codebase_search tool integration for AI-powered code discovery

The documentation is targeted at a semi-technical audience and provides practical setup guidance while explaining the underlying semantic search technology.
1 parent 30f1862 commit a10a0bb

5 files changed, +200 -2 lines changed

docs/faq.md

Lines changed: 9 additions & 1 deletion
@@ -136,6 +136,14 @@ Yes, if you use a [local model](/advanced-usage/local-models).
136136

137137
Yes, you can create your own MCP servers to add custom functionality to Roo Code. See the [MCP documentation](https://github.com/modelcontextprotocol) for details.
138138

139+
### What is Codebase Indexing?
140+
141+
[Codebase Indexing](/features/experimental/codebase-indexing) is an experimental feature that creates a semantic search index of your project using AI embeddings. This enables Roo Code to better understand and navigate large codebases by finding relevant code based on meaning rather than just keywords.
142+
143+
### How much does Codebase Indexing cost?
144+
145+
Codebase Indexing requires an embedding provider and a Qdrant vector database for storage. With OpenAI you pay per token embedded, so costs depend on your project size and the embedding model used; a local Ollama model has no API costs. Initial indexing is the most expensive part; subsequent updates are incremental and much cheaper.
146+
139147
## Troubleshooting
140148

141149
### Roo Code isn't responding. What should I do?
@@ -148,7 +156,7 @@ Yes, you can create your own MCP servers to add custom functionality to Roo Code
148156

149157
### I'm seeing an error message. What does it mean?
150158

151-
The error message should provide some information about the problem. If you're unsure how to resolve it, seek help in the community forums.
159+
The error message should provide some information about the problem. If you're unsure how to resolve it, seek help in [Discord](https://discord.gg/roocode).
152160

153161
### Roo Code made changes I didn't want. How do I undo them?
154162

docs/features/experimental/codebase-indexing.mdx

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
1+
import Codicon from '@site/src/components/Codicon';
2+
3+
# Codebase Indexing
4+
5+
**⚠️ Experimental Feature:** This feature is under active development and may change significantly in future releases.
6+
7+
Codebase Indexing enables semantic code search across your entire project using AI embeddings. Instead of searching for exact text matches, it understands the *meaning* of your queries, helping Roo Code find relevant code even when you don't know specific function names or file locations.
8+
9+
<img src="/img/experimental-features/experimental-features.png" alt="Codebase Indexing Settings" width="800" />
10+
11+
## What It Does
12+
13+
When enabled, the indexing system:
14+
15+
1. **Parses your code** using Tree-sitter to identify semantic blocks (functions, classes, methods)
16+
2. **Creates embeddings** of each code block using AI models
17+
3. **Stores vectors** in a Qdrant database for fast similarity search
18+
4. **Provides the `codebase_search` tool** to Roo for intelligent code discovery
19+
20+
This enables natural language queries like "user authentication logic" or "database connection handling" to find relevant code across your entire project.
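
The four steps above can be sketched in a few lines. This is only a toy illustration, not Roo Code's implementation: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model (so it only matches shared words, whereas real embeddings capture meaning), and a plain dictionary plays the role of the Qdrant collection.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: word counts as a sparse vector.
    # A real provider (OpenAI or Ollama) returns dense float vectors instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Steps 1-3: code blocks (assumed already parsed) are embedded and stored.
blocks = {
    "auth.py:login": "def login(user, password): verify user password and create session",
    "db.py:connect": "def connect(url): open database connection pool",
}
index = {name: embed(code) for name, code in blocks.items()}


def search(query: str, top_k: int = 1) -> list[tuple[str, float]]:
    # Step 4: embed the query the same way and rank stored blocks by similarity.
    q = embed(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

Calling `search("database connection handling")` ranks `db.py:connect` first because its vector is closest to the query's vector; a vector database does the same comparison at scale.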
21+
22+
## Key Benefits
23+
24+
- **Semantic Search**: Find code by meaning, not just keywords
25+
- **Enhanced AI Understanding**: Roo can better comprehend and work with your codebase
26+
- **Project-Wide Discovery**: Search across all files in the workspace, not just what's open
27+
- **Pattern Recognition**: Locate similar implementations and code patterns
28+
29+
## Setup Requirements
30+
31+
### Embedding Provider
32+
33+
Choose one of these options for generating embeddings:
34+
35+
**OpenAI (Recommended)**
36+
- Requires OpenAI API key
37+
- Supports all OpenAI embedding models
38+
- Default: `text-embedding-3-small`
39+
- Processes up to 100,000 tokens per batch
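
To make the per-batch token limit concrete, here is a minimal sketch of how code chunks could be grouped into batches that stay under a token budget. The function names and the 4-characters-per-token heuristic are assumptions for illustration; the actual indexer may count tokens differently.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def batch_by_tokens(chunks: list[str], max_tokens: int = 100_000) -> list[list[str]]:
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        # Start a new batch when adding this chunk would exceed the budget.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(chunk)
        used += cost
    if current:
        batches.append(current)
    return batches
```

In this sketch a single chunk larger than the budget still gets a batch of its own; a real implementation would need to split or skip it.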
40+
41+
**Ollama (Local)**
42+
- Requires local Ollama installation
43+
- No API costs or internet dependency
44+
- Supports any Ollama-compatible embedding model
45+
- Requires Ollama base URL configuration
46+
47+
### Vector Database
48+
49+
**Qdrant** is required for storing and searching embeddings:
50+
- **Local**: `http://localhost:6333` (recommended for testing)
51+
- **Cloud**: Qdrant Cloud or self-hosted instance
52+
- **Authentication**: Optional API key for secured deployments
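
Before enabling the feature, it can help to confirm that the Qdrant URL and API key actually work. The sketch below uses only the Python standard library and Qdrant's REST endpoint for listing collections (`GET /collections`, with the key sent in the `api-key` header); the helper names are hypothetical.

```python
import urllib.request
from typing import Optional


def build_collections_request(
    base_url: str = "http://localhost:6333", api_key: Optional[str] = None
) -> urllib.request.Request:
    # GET /collections lists the collections on a Qdrant instance.
    request = urllib.request.Request(base_url.rstrip("/") + "/collections", method="GET")
    if api_key:
        # Secured deployments expect the key in the "api-key" header.
        request.add_header("api-key", api_key)
    return request


def qdrant_is_reachable(request: urllib.request.Request, timeout: float = 3.0) -> bool:
    # Returns False on connection errors, timeouts, and HTTP error statuses.
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False
```

For a local instance, `qdrant_is_reachable(build_collections_request())` should return `True` once the container is running.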
53+
54+
## Setting Up Qdrant
55+
56+
### Quick Local Setup
57+
58+
**Using Docker:**
```bash
# Exposes Qdrant's REST API on localhost:6333; without a volume, data lives only inside the container
docker run -p 6333:6333 qdrant/qdrant
```
62+
63+
**Using Docker Compose:**
```yaml
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage
volumes:
  qdrant_storage:
```
76+
77+
### Production Deployment
78+
79+
For team or production use:
80+
- [Qdrant Cloud](https://cloud.qdrant.io/) - Managed service
81+
- Self-hosted on AWS, GCP, or Azure
82+
- Local server with network access for team sharing
83+
84+
## Configuration
85+
86+
1. Open Roo Code settings (<Codicon name="gear" /> icon)
87+
2. Navigate to **Experimental** section
88+
3. Enable **"Enable Codebase Indexing"**
89+
4. Configure your embedding provider:
90+
- **OpenAI**: Enter API key and select model
91+
- **Ollama**: Enter base URL and select model
92+
5. Set Qdrant URL and optional API key
93+
6. Click **Save** to start initial indexing
94+
95+
## Understanding Index Status
96+
97+
The interface shows real-time status with color indicators:
98+
99+
- **Standby** (Gray): Not running, awaiting configuration
100+
- **Indexing** (Yellow): Currently processing files
101+
- **Indexed** (Green): Up-to-date and ready for searches
102+
- **Error** (Red): Failed state requiring attention
103+
104+
## How Files Are Processed
105+
106+
### Smart Code Parsing
107+
- **Tree-sitter Integration**: Uses AST parsing to identify semantic code blocks
108+
- **Language Support**: All languages supported by Tree-sitter
109+
- **Fallback**: Line-based chunking for unsupported file types
110+
- **Block Sizing**:
111+
- Minimum: 100 characters
112+
- Maximum: 1,000 characters
113+
- Splits large functions intelligently
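
As a rough illustration of the line-based fallback and the block size limits, the sketch below groups lines into chunks of at most 1,000 characters and drops fragments shorter than 100 characters. The function name and the choice to drop (rather than merge) small fragments are assumptions, not the indexer's documented behavior.

```python
MIN_CHARS = 100
MAX_CHARS = 1_000


def chunk_lines(source: str, min_chars: int = MIN_CHARS, max_chars: int = MAX_CHARS) -> list[str]:
    chunks: list[str] = []
    current = ""
    for line in source.splitlines(keepends=True):
        # Close the current chunk before it would grow past the maximum.
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    # Drop fragments too small to be a useful search result (assumption).
    return [chunk for chunk in chunks if len(chunk) >= min_chars]
```

Note that a single line longer than the maximum is kept whole in this sketch; splitting inside a line would need extra handling.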
114+
115+
### Automatic File Filtering
116+
The indexer automatically excludes:
117+
- Binary files and images
118+
- Large files (&gt;1MB)
119+
- Git metadata (`.git` folders)
120+
- Dependencies (`node_modules`, `vendor`, etc.)
121+
- Files matching `.gitignore` and `.rooignore` patterns
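
The exclusion rules above amount to a simple predicate per file. The following sketch is one way to express them; the directory and suffix sets are illustrative rather than the indexer's real lists, and `fnmatch` globbing is a simplification of full `.gitignore` semantics.

```python
import fnmatch
from pathlib import PurePosixPath

MAX_FILE_BYTES = 1024 * 1024  # the 1MB limit from the list above
EXCLUDED_DIRS = {".git", "node_modules", "vendor"}
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".zip", ".exe"}  # illustrative, not exhaustive


def should_index(path: str, size_bytes: int, ignore_patterns: list[str]) -> bool:
    p = PurePosixPath(path)
    if size_bytes > MAX_FILE_BYTES:
        return False
    # Skip anything located under an excluded directory.
    if EXCLUDED_DIRS.intersection(p.parts):
        return False
    if p.suffix.lower() in BINARY_SUFFIXES:
        return False
    # Simplified glob matching; real .gitignore rules are richer than fnmatch.
    return not any(fnmatch.fnmatch(path, pattern) for pattern in ignore_patterns)
```

For example, `should_index("dist/bundle.js", 2048, ["dist/*"])` is `False`, while the same call with an empty pattern list is `True`.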
122+
123+
### Incremental Updates
124+
- **File Watching**: Monitors workspace for changes
125+
- **Smart Updates**: Only reprocesses modified files
126+
- **Hash-based Caching**: Avoids reprocessing unchanged content
127+
- **Branch Switching**: Automatically handles Git branch changes
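
Hash-based caching can be sketched as follows: store a digest per file and re-embed only when the digest changes. SHA-256 and the in-memory dictionary are assumptions chosen for illustration; the source does not say which hash or cache format the indexer uses.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_reindex(files: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Return paths whose content changed, updating the cache in place."""
    changed = []
    for path, text in files.items():
        digest = content_hash(text)
        # A missing or different digest means the file is new or modified.
        if cache.get(path) != digest:
            changed.append(path)
            cache[path] = digest
    return changed
```

Because the check is based on content rather than timestamps, switching to a branch and back leaves unchanged files untouched in this scheme.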
128+
129+
## Best Practices
130+
131+
### Model Selection
132+
133+
**For OpenAI:**
134+
- **`text-embedding-3-small`**: Best balance of performance and cost
135+
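- **`text-embedding-3-large`**: Higher accuracy at a higher per-token price (about 6.5x `text-embedding-3-small` at OpenAI's launch list prices)
136+
- **`text-embedding-ada-002`**: Legacy model; priced above `text-embedding-3-small` per token, so not the low-cost option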
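
When comparing models, a back-of-the-envelope estimate is often enough. The sketch below assumes roughly 4 characters per token and takes the per-million-token price as an input you look up on your provider's pricing page; neither value comes from this documentation.

```python
def estimate_indexing_cost(
    total_chars: int, price_per_million_tokens: float, chars_per_token: float = 4.0
) -> float:
    # Convert characters to an approximate token count, then to a price.
    tokens = total_chars / chars_per_token
    return tokens / 1_000_000 * price_per_million_tokens
```

Applying the same function to only the characters in changed files gives a feel for why incremental updates cost far less than the initial index.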
137+
138+
**For Ollama:**
139+
- Choose models based on your hardware capabilities
140+
- Larger models provide better accuracy but require more resources
141+
142+
### Security Considerations
143+
- **API Keys**: Stored securely in VS Code's encrypted storage
144+
- **Code Privacy**: Only small code snippets are sent to the embedding provider (not full files); with Ollama they never leave your machine
145+
- **Local Processing**: All parsing happens locally
146+
- **Qdrant Security**: Use authentication for production deployments
147+
148+
## Current Limitations
149+
150+
- **File Size**: 1MB maximum per file
151+
- **Markdown**: Not currently supported due to parsing complexity
152+
- **Single Workspace**: One workspace at a time
153+
- **Dependencies**: Requires external services (embedding provider + Qdrant)
154+
- **Language Coverage**: Semantic block parsing is limited to Tree-sitter supported languages; other file types fall back to line-based chunking
155+
156+
## Using the Search Feature
157+
158+
Once indexed, Roo can use the `codebase_search` tool to find relevant code:
159+
160+
**Example Queries:**
161+
- "How is user authentication handled?"
162+
- "Database connection setup"
163+
- "Error handling patterns"
164+
- "API endpoint definitions"
165+
166+
The tool provides Roo with:
167+
- Relevant code snippets
168+
- File paths and line numbers
169+
- Similarity scores
170+
- Contextual information
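
To picture what such a result might look like, here is a hypothetical shape for a single hit and a helper that ranks hits by similarity. The class and field names are assumptions made for this sketch and are not the `codebase_search` tool's actual output format.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:  # hypothetical shape; field names are assumptions
    file_path: str
    start_line: int
    end_line: int
    score: float
    snippet: str


def format_hits(hits: list[SearchHit], min_score: float = 0.0) -> list[str]:
    # Keep hits at or above the threshold, best match first.
    kept = sorted((h for h in hits if h.score >= min_score), key=lambda h: h.score, reverse=True)
    return [f"{h.file_path}:{h.start_line}-{h.end_line} (score {h.score:.2f})" for h in kept]
```

A minimum score is a common way to keep weak matches out of the results while still returning the strongest ones.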
171+
172+
## Privacy & Security
173+
174+
- **Limited code exposure**: Only small code snippets (not full files) are sent to the embedding provider
175+
- **Embeddings are numeric**: Not human-readable representations
176+
- **Secure storage**: API keys encrypted in VS Code storage
177+
- **Local option**: Use Ollama for completely local processing
178+
- **Access control**: Respects existing file permissions
179+
180+
## Future Enhancements
181+
182+
Planned improvements:
183+
- Additional embedding providers
184+
- Improved markdown and documentation support
185+
- Multi-workspace indexing
186+
- Enhanced filtering and configuration options
187+
- Team sharing capabilities
188+
- Integration with VS Code's native search

docs/features/experimental/experimental-features.md

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@ Roo Code includes experimental features that are still under development. These
55
**Warning:** Experimental features may have unexpected behavior, including potential data loss or security vulnerabilities. Enable them at your own risk.
66

77
## Enabling Experimental Features
8-
8+
![Experimental features settings panel](../../../static/img/experimental-features/experimental-features.png)
99
To enable or disable experimental features:
1010

1111
1. Open the Roo Code settings (<Codicon name="gear" /> icon in the top right corner).
@@ -16,6 +16,7 @@ To enable or disable experimental features:
1616

1717
The following experimental features are currently available:
1818

19+
- [Codebase Indexing](/features/experimental/codebase-indexing) - Semantic search through AI-powered codebase indexing
1920
- [Intelligently Condense the Context Window](/features/experimental/intelligent-context-condensing)
2021
- [Power Steering](/features/experimental/power-steering)
2122

sidebars.ts

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ const sidebars: SidebarsConfig = {
6060
label: 'Experimental',
6161
items: [
6262
'features/experimental/experimental-features',
63+
'features/experimental/codebase-indexing',
6364
'features/experimental/intelligent-context-condensing',
6465
'features/experimental/power-steering',
6566
],

static/img/experimental-features/experimental-features.png

Binary file added (154 KB)