Codebase indexing #279
Replies: 2 comments
-
From what I've seen, both Cursor and Windsurf implemented this as an MCP server, and they use this minimal structure when doing semantic search. I created a minimal implementation as an MCP server, using Ollama with a tiny embedding model; see https://github.com/ofou/codebase_search. The former implementation seemed too complicated for a first implementation of this, and I don't like that they use a cloud-based vector store.
Semantic search results (raw JSON):
[
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/rag_systems.txt",
"similarity": 0.6941935269156939,
"relative_path": "folder_to_test/rag_systems.txt"
},
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/rag_implementation_code.txt",
"similarity": 0.6449619035006291,
"relative_path": "folder_to_test/rag_implementation_code.txt"
},
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/vector_databases.txt",
"similarity": 0.6184170305394026,
"relative_path": "folder_to_test/vector_databases.txt"
},
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/embedding_models.txt",
"similarity": 0.47589052901500284,
"relative_path": "folder_to_test/embedding_models.txt"
},
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/ml_healthcare.txt",
"similarity": 0.4476328745171523,
"relative_path": "folder_to_test/ml_healthcare.txt"
},
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/document_chunking.txt",
"similarity": 0.44762066545544027,
"relative_path": "folder_to_test/document_chunking.txt"
},
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/climate_change.txt",
"similarity": 0.4022195935701077,
"relative_path": "folder_to_test/climate_change.txt"
},
{
"file": "/Users/ofou/os/codebase_search/folder_to_test/rag_evaluation.txt",
"similarity": 0.38974273918673374,
"relative_path": "folder_to_test/rag_evaluation.txt"
}
]
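For reference, a ranked list shaped like the one above could be produced roughly as follows. This is only a sketch, and it assumes the `similarity` values are cosine similarities between a query embedding and one precomputed embedding per file (the post does not say which measure is used). The function names `cosine_similarity` and `rank_files`, the `/repo/...` paths, and the 3-dimensional vectors are all made up for illustration and are not taken from the linked repository.

```python
import math
import os


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(query_vec, file_vecs, root):
    """Return one result dict per file, sorted by descending similarity.

    file_vecs maps absolute file paths to precomputed embedding vectors.
    """
    results = [
        {
            "file": path,
            "similarity": cosine_similarity(query_vec, vec),
            "relative_path": os.path.relpath(path, root),
        }
        for path, vec in file_vecs.items()
    ]
    results.sort(key=lambda r: r["similarity"], reverse=True)
    return results


# Made-up 3-dimensional vectors, for illustration only (assumption, not real embeddings).
file_vecs = {
    "/repo/docs/a.txt": [1.0, 0.0, 0.0],
    "/repo/docs/b.txt": [0.0, 1.0, 0.0],
    "/repo/docs/c.txt": [1.0, 1.0, 0.0],
}
ranked = rank_files([1.0, 0.0, 0.0], file_vecs, root="/repo")
```

With real embeddings the query vector would come from the same model that embedded the files; a brute-force scan like this is probably fine for small repos and avoids needing any vector store at all.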
-
RooCodeInc/Roo-Code#411 was merged and added indexing in Kilo Code.
-
Codebase indexing is the idea of storing some sort of compressed representation of the codebase which can be used by the agent. For example, Cursor and Windsurf generate an embedding for each file in the codebase.
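To make the "one embedding per file" idea concrete, here is a minimal sketch of building such an index. It is not how Cursor or Windsurf actually do it. `toy_embed` is a hypothetical stand-in (a hashed bag-of-words vector) used only so the example runs without a model; a real setup would call an actual embedding model instead. `build_index` and the demo file names are also assumptions made for illustration.

```python
import hashlib
import math
import os
import tempfile


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(root, embed=toy_embed):
    """Walk root and return {relative_path: embedding}, one vector per readable text file."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            index[os.path.relpath(path, root)] = embed(text)
    return index


# Tiny demo on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.py"), "w", encoding="utf-8") as fh:
        fh.write("def add(x, y): return x + y")
    with open(os.path.join(root, "b.py"), "w", encoding="utf-8") as fh:
        fh.write("class Stack: pass")
    index = build_index(root)
```

The resulting dictionary is the "compressed representation": the agent can embed a query with the same function and compare it against each stored vector, without rereading the files.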
There are open-source implementations that we could use:
Related tickets:
Keywords: embedding, vector database, RAG (retrieval-augmented generation)