Rename search_code to codebase_search with balanced instructions

IvanBiruk · IvanBiruk · commit f9bbb9bddb6a · 2025-09-21T12:49:13.000+02:00
- Renamed search_code function to codebase_search across all files
- Updated docstring to emphasize semantic search as the MAIN exploration tool
- Added clear guidance on when to use codebase_search vs grep:
  * Always prefer codebase_search for initial code exploration
  * Use grep only for uncommitted local changes or different branches
- Clarified that semantic search operates on the indexed repository state (typically the main/master branch)
- Updated all references in tests, documentation, and imports; the docs also replace the stale `ask_question` name with `chat_completions`
- Left the existing data source format unchanged, so it stays backward compatible
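The guidance in the bullets above reduces to a small decision rule. The sketch below encodes it for illustration; `SearchNeed` and `choose_search_tool` are hypothetical names, not part of the server code.

```python
from dataclasses import dataclass


@dataclass
class SearchNeed:
    """Hypothetical description of what a search must cover."""
    needs_uncommitted_changes: bool  # looking for local edits that are not indexed
    target_branch: str               # branch the caller cares about
    indexed_branch: str              # branch reported by get_data_sources


def choose_search_tool(need: SearchNeed) -> str:
    """Pick a tool following this commit's guidance (illustrative only)."""
    # Uncommitted local work is invisible to the index, so fall back to grep.
    if need.needs_uncommitted_changes:
        return "grep"
    # A branch other than the indexed one is also outside the index.
    if need.target_branch != need.indexed_branch:
        return "grep"
    # Otherwise semantic search over the indexed state is the default.
    return "codebase_search"


print(choose_search_tool(SearchNeed(False, "main", "main")))  # codebase_search
print(choose_search_tool(SearchNeed(True, "main", "main")))   # grep
```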
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -48,7 +48,7 @@ This is a Model Context Protocol (MCP) server that provides AI clients with acce
 ### Core Components
 
 - **`codealive_mcp_server.py`**: Main server implementation using FastMCP framework
-- **Three main tools**: `ask_question`, `search_code`, `get_data_sources`
+- **Three main tools**: `chat_completions`, `codebase_search`, `get_data_sources`
 - **CodeAliveContext**: Manages HTTP client and API credentials
 - **Async lifespan management**: Handles client setup/teardown
 
@@ -63,7 +63,7 @@ This is a Model Context Protocol (MCP) server that provides AI clients with acce
 ### Data Flow
 
 1. AI client connects to MCP server via stdio/SSE transport
-2. Client calls tools (`get_data_sources` → `search_code` → `ask_question`)
+2. Client calls tools (`get_data_sources` → `codebase_search` → `chat_completions`)
 3. MCP server translates tool calls to CodeAlive API requests
 4. CodeAlive API returns semantic search results or chat completions
 5. Server formats and returns results to AI client
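The call order in the Data Flow steps can be sketched as a small async driver. The three functions below are hypothetical stubs standing in for the MCP tools (a real client invokes them through an MCP transport), and the parameter names for `chat_completions` are assumptions because its signature is not part of this diff.

```python
import asyncio
from typing import Dict, List


# Hypothetical stubs; real calls go through an MCP client over stdio/SSE.
async def get_data_sources() -> List[Dict[str, str]]:
    return [{"id": "repo123", "type": "repository"}]


async def codebase_search(query: str, data_source_ids: List[str]) -> Dict:
    return {"query": query, "data_source_ids": data_source_ids, "results": []}


async def chat_completions(question: str, data_source_ids: List[str]) -> str:
    return f"answer to {question!r} using {data_source_ids}"


async def explore(question: str) -> List[str]:
    """Run get_data_sources -> codebase_search -> chat_completions in order."""
    calls: List[str] = []
    sources = await get_data_sources()
    calls.append("get_data_sources")
    ids = [source["id"] for source in sources]  # IDs feed the next two tools
    await codebase_search(query=question, data_source_ids=ids)
    calls.append("codebase_search")
    await chat_completions(question=question, data_source_ids=ids)
    calls.append("chat_completions")
    return calls


print(asyncio.run(explore("Where is rate limiting handled?")))
# ['get_data_sources', 'codebase_search', 'chat_completions']
```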
diff --git a/README.md b/README.md
@@ -22,16 +22,16 @@ This MCP (Model Context Protocol) server enables AI clients like Claude Code, Cu
 Once connected, you'll have access to these powerful tools:
 
 1. **`get_data_sources`** - List your indexed repositories and workspaces
-2. **`search_code`** - Semantic code search across your codebase  
-3. **`ask_question`** - AI chat with full project context
+2. **`codebase_search`** - Semantic code search across the indexed state of your codebase (typically the main/master branch)  
+3. **`chat_completions`** - AI chat with full project context
 
 ## 🎯 Usage Examples
 
 After setup, try these commands with your AI assistant:
 
 - *"Show me all available repositories"* → Uses `get_data_sources`
-- *"Find authentication code in the user service"* → Uses `search_code`
-- *"Explain how the payment flow works in this codebase"* → Uses `ask_question`
+- *"Find authentication code in the user service"* → Uses `codebase_search`
+- *"Explain how the payment flow works in this codebase"* → Uses `chat_completions`
 
 ## Table of Contents
 
diff --git a/src/codealive_mcp_server.py b/src/codealive_mcp_server.py
@@ -25,7 +25,7 @@
 
 # Import core components
 from core import codealive_lifespan, setup_debug_logging
-from tools import chat_completions, get_data_sources, search_code
+from tools import chat_completions, get_data_sources, codebase_search
 
 # Initialize FastMCP server with lifespan and enhanced system instructions
 mcp = FastMCP(
@@ -42,7 +42,7 @@
 
     When working with a codebase:
     1. First use `get_data_sources` to identify available repositories and workspaces
-    2. Then use `search_code` to find relevant files and code snippets
+    2. Then use `codebase_search` to find relevant files and code snippets
     3. Finally, use `chat_completions` for in-depth analysis of the code
 
     For effective code exploration:
@@ -90,7 +90,7 @@ async def health_check(request: Request) -> JSONResponse:
 # Register tools
 mcp.tool()(chat_completions)
 mcp.tool()(get_data_sources)
-mcp.tool()(search_code)
+mcp.tool()(codebase_search)
 
 
 def main():
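Renaming the Python function is enough to rename the tool because `mcp.tool()(codebase_search)` passes no explicit name, and FastMCP then takes the tool name from the function's `__name__`. The sketch below mimics that behavior with a hypothetical `ToolRegistry` class; it is not FastMCP's implementation.

```python
from typing import Callable, Dict, Optional


class ToolRegistry:
    """Hypothetical, minimal stand-in for a FastMCP-style tool registry."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}

    def tool(self, name: Optional[str] = None) -> Callable[[Callable], Callable]:
        def register(fn: Callable) -> Callable:
            # With no explicit name, the tool is keyed by the Python function name.
            self.tools[name or fn.__name__] = fn
            return fn

        return register


async def codebase_search(query: str) -> dict:
    # Placeholder body; the real tool calls the CodeAlive API.
    return {"query": query}


registry = ToolRegistry()
registry.tool()(codebase_search)  # same call shape as mcp.tool()(codebase_search)
print(sorted(registry.tools))  # ['codebase_search']
```

As a consequence, any client prompt or configuration that still names `search_code` needs updating after this commit.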
diff --git a/src/tests/test_response_transformer.py b/src/tests/test_response_transformer.py
@@ -267,7 +267,7 @@ def test_data_preservation_without_content(self):
             "results": [
                 {
                     "kind": "Symbol",
-                    "identifier": "CodeAlive-AI/codealive-mcp::src/tools/search.py::search_code",
+                    "identifier": "CodeAlive-AI/codealive-mcp::src/tools/search.py::codebase_search",
                     "location": {
                         "path": "src/tools/search.py",
                         "range": {"start": {"line": 18}, "end": {"line": 168}}
@@ -317,13 +317,13 @@ def test_data_preservation_with_content(self):
             "results": [
                 {
                     "kind": "Symbol",
-                    "identifier": "CodeAlive-AI/codealive-mcp::src/tools/search.py::search_code",
+                    "identifier": "CodeAlive-AI/codealive-mcp::src/tools/search.py::codebase_search",
                     "location": {
                         "path": "src/tools/search.py",
                         "range": {"start": {"line": 18}, "end": {"line": 168}}
                     },
                     "score": 0.99,
-                    "content": "async def search_code(\n    ctx: Context,\n    query: str,\n    data_source_ids: Optional[List[str]] = None,\n    mode: str = \"auto\",\n    include_content: bool = False\n) -> Dict:",
+                    "content": "async def codebase_search(\n    ctx: Context,\n    query: str,\n    data_source_ids: Optional[List[str]] = None,\n    mode: str = \"auto\",\n    include_content: bool = False\n) -> Dict:",
                     "dataSource": {
                         "type": "repository",
                         "id": "685b21230e3822f4efa9d073",
@@ -369,7 +369,7 @@ def test_data_preservation_with_content(self):
         assert 'endLine="168"' in result
 
         # Verify content is included
-        assert "async def search_code" in result
+        assert "async def codebase_search" in result
         assert "include_content: Whether to include full file content" in result
         assert "This file provides guidance" in result
 
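The fixture identifiers appear to follow a `<owner/repo>::<path>::<symbol>` layout, which is why only the symbol part changes here while the path stays `src/tools/search.py`. The parser below is a sketch under that assumed layout; `parse_identifier` and `SymbolIdentifier` are hypothetical and not part of the codebase.

```python
from typing import NamedTuple


class SymbolIdentifier(NamedTuple):
    repository: str
    path: str
    symbol: str


def parse_identifier(identifier: str) -> SymbolIdentifier:
    """Split an identifier of the assumed form '<owner/repo>::<path>::<symbol>'."""
    # maxsplit=2 keeps any '::' inside the symbol (e.g. nested names) intact.
    parts = identifier.split("::", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    return SymbolIdentifier(*parts)


ident = parse_identifier(
    "CodeAlive-AI/codealive-mcp::src/tools/search.py::codebase_search"
)
print(ident.path)    # src/tools/search.py
print(ident.symbol)  # codebase_search
```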
diff --git a/src/tests/test_search_tool.py b/src/tests/test_search_tool.py
@@ -3,13 +3,13 @@
 import pytest
 from unittest.mock import AsyncMock, MagicMock, patch
 from fastmcp import Context
-from tools.search import search_code
+from tools.search import codebase_search
 
 
 @pytest.mark.asyncio
 @patch('tools.search.get_api_key_from_context')
-async def test_search_code_returns_dict(mock_get_api_key):
-    """Test that search_code returns a dictionary with structured_content."""
+async def test_codebase_search_returns_dict(mock_get_api_key):
+    """Test that codebase_search returns a dictionary with structured_content."""
     # Mock the API key function
     mock_get_api_key.return_value = "test_key"
 
@@ -47,8 +47,8 @@ async def test_search_code_returns_dict(mock_get_api_key):
     ctx.request_context.lifespan_context = mock_codealive_context
     ctx.request_context.headers = {"authorization": "Bearer test_key"}
 
-    # Call search_code
-    result = await search_code(
+    # Call codebase_search
+    result = await codebase_search(
         ctx=ctx,
         query="authenticate_user",
         data_source_ids=["test_id"],
@@ -57,7 +57,7 @@ async def test_search_code_returns_dict(mock_get_api_key):
     )
 
     # Verify result is a dictionary
-    assert isinstance(result, dict), "search_code should return a dictionary"
+    assert isinstance(result, dict), "codebase_search should return a dictionary"
 
     # Verify it has structured_content field
     assert "structured_content" in result, "Result should have structured_content field"
diff --git a/src/tools/__init__.py b/src/tools/__init__.py
@@ -2,6 +2,6 @@
 
 from .chat import chat_completions
 from .datasources import get_data_sources
-from .search import search_code
+from .search import codebase_search
 
-__all__ = ['chat_completions', 'get_data_sources', 'search_code']
+__all__ = ['chat_completions', 'get_data_sources', 'codebase_search']
diff --git a/src/tools/datasources.py b/src/tools/datasources.py
@@ -48,7 +48,7 @@ async def get_data_sources(ctx: Context, alive_only: bool = True) -> str:
         For workspaces, the repositoryIds can be used to identify and work with
         individual repositories that make up the workspace.
 
-        Use the returned data source IDs with the search_code and chat_completions functions.
+        Use the returned data source IDs with the codebase_search and chat_completions functions.
     """
     context: CodeAliveContext = ctx.request_context.lifespan_context
 
@@ -84,7 +84,7 @@ async def get_data_sources(ctx: Context, alive_only: bool = True) -> str:
         result = f"Available data sources:\n{formatted_data}"
 
         # Add usage hint
-        result += "\n\nYou can use these data source IDs with the search_code and chat_completions functions."
+        result += "\n\nYou can use these data source IDs with the codebase_search and chat_completions functions."
 
         return result
 
diff --git a/src/tools/search.py b/src/tools/search.py
@@ -10,25 +10,38 @@
 from utils import transform_search_response_to_xml, handle_api_error
 
 
-async def search_code(
+async def codebase_search(
     ctx: Context,
     query: str,
     data_source_ids: Optional[List[str]] = None,
     mode: str = "auto",
     include_content: bool = False
 ) -> Dict:
     """
-    SEMANTIC search across your codebases.
+    Use the `codebase_search` tool to search for code in your indexed codebases.
 
-    This endpoint is optimized for **natural-language** questions and intent-driven queries
-    (not rigid templates). Ask it things like:
+    Semantic search (`codebase_search`) is your MAIN exploration tool for understanding the
+    indexed codebase (typically main/master branch or the specific branch shown in data sources).
+
+    ALWAYS prefer using `codebase_search` over grep/find for initial code exploration because:
+    - It's much faster and more efficient for discovering relevant code
+    - It understands semantic meaning, not just text patterns
+    - It searches the indexed repository state with full context
+
+    IMPORTANT: This searches the INDEXED version of repositories (check the indexed branch via get_data_sources),
+    NOT the current local files. Use grep when you specifically need to:
+    - Search uncommitted local changes
+    - Verify recent modifications
+    - Check files on a different branch than the indexed one
+
+    This tool excels at natural-language questions and intent-driven queries like:
       • "What is the authentication flow?"
       • "Where is the user registration logic implemented?"
       • "How do services communicate with the billing API?"
       • "Where is rate limiting handled?"
       • "Show me how we validate JWTs."
 
-    You can still include function/class names if you know them, but it's not required.
+    You can include function/class names for more targeted results.
 
     Args:
         query: A natural-language description of what you're looking for.
@@ -57,19 +70,19 @@ async def search_code(
 
     Examples:
         1. Natural-language question (recommended):
-           search_code(query="What is the auth flow?", data_source_ids=["repo123"])
+           codebase_search(query="What is the auth flow?", data_source_ids=["repo123"])
 
         2. Intent query:
-           search_code(query="Where is user registration logic?", data_source_ids=["repo123"])
+           codebase_search(query="Where is user registration logic?", data_source_ids=["repo123"])
 
         3. Workspace-wide question:
-           search_code(query="How do microservices talk to the billing API?", data_source_ids=["workspace456"])
+           codebase_search(query="How do microservices talk to the billing API?", data_source_ids=["workspace456"])
 
         4. Mixed query with a known identifier:
-           search_code(query="Where do we validate JWTs (AuthService)?", data_source_ids=["repo123"])
+           codebase_search(query="Where do we validate JWTs (AuthService)?", data_source_ids=["repo123"])
 
         5. Concise results without full file contents:
-           search_code(query="Where is password reset handled?", data_source_ids=["repo123"], include_content=false)
+           codebase_search(query="Where is password reset handled?", data_source_ids=["repo123"], include_content=False)
 
     Note:
         - At least one data_source_id must be provided
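The docstring's Note requires at least one data source ID per call. A caller-side guard for that rule might look like the sketch below; `check_search_args` is a hypothetical helper, and the server's actual validation is not shown in this diff.

```python
from typing import List, Optional


def check_search_args(query: str, data_source_ids: Optional[List[str]]) -> List[str]:
    """Hypothetical pre-flight check mirroring the docstring's Note."""
    # None and [] both mean "no data source selected".
    if not data_source_ids:
        raise ValueError("at least one data_source_id must be provided")
    return data_source_ids


try:
    check_search_args("Where is password reset handled?", [])
except ValueError as err:
    print(err)  # at least one data_source_id must be provided
```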