Code indexer #1050
```json
"lint-fix": "eslint src --ext ts --fix && npm run lint-fix --prefix webview-ui",
"lint-fix-local": "eslint -c .eslintrc.local.json src --ext ts --fix && npm run lint-fix --prefix webview-ui",
"package": "npm run build:webview && npm run check-types && npm run lint && node esbuild.js --production",
"pretest": "npm run compile && npm run compile:integration",
```
This was an unnecessary step for just running jest, and we already run it as part of test:integration.
```typescript
savedKey = process.env.OPENAI_API_KEY
process.env.OPENAI_API_KEY = "fake"
```

```typescript
nock.back.fixtures = path.join(__dirname, "..", "__fixtures__")
```
I don't want to mock anything, so I recorded the OpenAI embeddings API requests using nock.
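nock's `nock.back` mode records real HTTP interactions to fixture files (here under `nock.back.fixtures`) and can replay them on later runs, so the tests exercise real OpenAI request/response shapes without hand-written mocks. The dependency-free sketch below shows the same record-then-replay idea with an in-memory map; `RecordReplay` and its two modes are hypothetical illustrations, not nock's API.

```typescript
// Hypothetical record/replay helper illustrating the idea behind nock.back:
// in "record" mode unseen requests hit the real function and are stored,
// in "replay" mode only stored responses are served.
type Mode = "record" | "replay"

class RecordReplay<Req, Res> {
	private readonly fixtures = new Map<string, Res>()
	private mode: Mode
	private readonly realCall: (req: Req) => Res

	constructor(mode: Mode, realCall: (req: Req) => Res) {
		this.mode = mode
		this.realCall = realCall
	}

	setMode(mode: Mode): void {
		this.mode = mode
	}

	call(req: Req): Res {
		// Requests are keyed by their JSON form, so identical payloads share a fixture.
		const key = JSON.stringify(req)
		if (this.fixtures.has(key)) {
			return this.fixtures.get(key) as Res
		}
		if (this.mode === "replay") {
			throw new Error(`No recorded fixture for request ${key}`)
		}
		const res = this.realCall(req)
		this.fixtures.set(key, res)
		return res
	}
}
```

In a test, the "real" function would be the embeddings call made once with a valid key; later runs switch to replay and can use a fake key such as `"fake"`, since no request leaves the process.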
```typescript
public async initialize() {
	this.connection = await connect(this.dbPath)
```

```typescript
const fnCreator = getRegistry().get("openai")
```
To start, we'll only support OpenAI's text-embedding-ada-002, which means everyone will need an API profile with an OpenAI API key. Over time we can add more embedding options, including local ones.
Planning on giving this a spin locally and will take a look at the PR too. Specifically on the embedding model, though: should we go with "text-embedding-3-small" to begin with? It seems more performant and also cheaper: https://platform.openai.com/docs/guides/embeddings/embedding-models#embedding-models
I think you could use the free embedding service from NVIDIA, or we could skip the RAG route entirely, as repoprompt.com did.
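The thread above weighs starting with `text-embedding-ada-002` against `text-embedding-3-small` and other providers, with more options (including local ones) to come later. One way to keep that open is a small provider registry keyed by name, where each provider also reports its vector dimension so an index can refuse to mix vectors from different models (embeddings from different models are not comparable, so switching models implies re-indexing). This is a dependency-free sketch; `EmbeddingProvider`, `registerProvider`, `getProvider`, `assertCompatible`, and the toy `"char-hash"` provider are hypothetical and are not LanceDB's `getRegistry()` API.

```typescript
// Hypothetical provider interface; real providers would call an API or a local model.
interface EmbeddingProvider {
	readonly name: string
	readonly dimensions: number
	embed(texts: string[]): number[][]
}

const providers = new Map<string, () => EmbeddingProvider>()

function registerProvider(name: string, create: () => EmbeddingProvider): void {
	if (providers.has(name)) {
		throw new Error(`Embedding provider "${name}" is already registered`)
	}
	providers.set(name, create)
}

function getProvider(name: string): EmbeddingProvider {
	const create = providers.get(name)
	if (!create) {
		throw new Error(`Unknown embedding provider "${name}"`)
	}
	return create()
}

// Toy provider for tests: counts character codes into 8 buckets.
registerProvider("char-hash", () => ({
	name: "char-hash",
	dimensions: 8,
	embed: (texts: string[]) =>
		texts.map((text) => {
			const v = new Array<number>(8).fill(0)
			for (let i = 0; i < text.length; i++) {
				v[text.charCodeAt(i) % 8] += 1
			}
			return v
		}),
}))

// An index remembers which provider built it and rejects a mismatched one,
// since vectors from different embedding models cannot be compared.
function assertCompatible(indexProvider: string, indexDims: number, p: EmbeddingProvider): void {
	if (p.name !== indexProvider || p.dimensions !== indexDims) {
		throw new Error(`Index was built with ${indexProvider} (${indexDims} dims); re-index to use ${p.name}`)
	}
}
```

Under this design, moving from one OpenAI model to another would be a new registered provider plus a re-index, rather than a change to the indexing code.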
```diff
@@ -1,15 +1,11 @@
// npx jest src/services/tree-sitter/__tests__/index.test.ts
```
This test was mocking too much, and isn't compatible with the latest version of WASM tree-sitter. I updated it appropriately.
```diff
@@ -1,118 +1,106 @@
// npx jest src/services/tree-sitter/__tests__/languageParser.test.ts
```
This test was mocking too much, and isn't compatible with the latest version of WASM tree-sitter. I updated it appropriately.
```typescript
async function loadLanguage(langName: string) {
	return await Parser.Language.load(path.join(__dirname, `tree-sitter-${langName}.wasm`))
	if (process.env.NODE_ENV === "test") {
```
Inspired by continue.dev: this allows tests to load the tree-sitter language grammars so we don't have to mock them.
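With real grammars loaded in tests, the only environment-specific part is where the `.wasm` files live: next to the bundled extension in production, and somewhere the test runner can reach during tests. A minimal sketch of that decision as a pure function follows; `resolveGrammarPath`, `GrammarDirs`, and the directory values are hypothetical, since the excerpt does not show the body of the test-mode branch.

```typescript
// Hypothetical directories; the real locations depend on the build and test setup.
interface GrammarDirs {
	bundled: string // where the packaged extension ships the .wasm grammars
	test: string // where tests can read the grammars directly
}

// Returns the path of the tree-sitter grammar for `langName`,
// following the `tree-sitter-${langName}.wasm` naming used in the excerpt.
function resolveGrammarPath(langName: string, nodeEnv: string | undefined, dirs: GrammarDirs): string {
	// Reject names that could escape the directory (e.g. "../x").
	if (!/^[a-z0-9_]+$/.test(langName)) {
		throw new Error(`Unexpected language name: ${langName}`)
	}
	const dir = nodeEnv === "test" ? dirs.test : dirs.bundled
	// Strip trailing slashes so the join does not produce "//".
	return `${dir.replace(/\/+$/, "")}/tree-sitter-${langName}.wasm`
}
```

In the extension, the result would be passed to `Parser.Language.load`, with `process.env.NODE_ENV` as the `nodeEnv` argument.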
How do we mitigate code staleness?
My plan is to do something similar to this: https://github.com/continuedev/continue/blob/main/core/indexing/README.md
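The approach linked above avoids rebuilding the whole index by re-indexing only files whose contents changed. A minimal sketch of that kind of incremental refresh, assuming each indexed file is tracked by path and a hash of its contents: compare the workspace against what the index last saw, then split files into ones to (re)embed and ones to delete. `planRefresh`, `RefreshPlan`, and `fnv1a` are hypothetical names; FNV-1a only keeps the example dependency-free.

```typescript
// 32-bit FNV-1a over UTF-16 code units: deterministic and dependency-free,
// but not collision-resistant (a real index would likely use SHA-256).
function fnv1a(text: string): string {
	let hash = 0x811c9dc5
	for (let i = 0; i < text.length; i++) {
		hash ^= text.charCodeAt(i)
		hash = Math.imul(hash, 0x01000193) >>> 0
	}
	return ("00000000" + hash.toString(16)).slice(-8)
}

interface RefreshPlan {
	toEmbed: string[] // new or modified files that need (re)embedding
	toDelete: string[] // files whose chunks should be removed from the index
}

// `indexed`: path -> content hash recorded when the file was last embedded.
// `current`: path -> current file contents from the workspace.
function planRefresh(indexed: Map<string, string>, current: Map<string, string>): RefreshPlan {
	const toEmbed: string[] = []
	const toDelete: string[] = []
	for (const [path, contents] of current) {
		// Unchanged hashes are skipped, so only edited or new files cost embedding calls.
		if (indexed.get(path) !== fnv1a(contents)) {
			toEmbed.push(path)
		}
	}
	for (const path of indexed.keys()) {
		if (!current.has(path)) {
			toDelete.push(path)
		}
	}
	return { toEmbed: toEmbed.sort(), toDelete: toDelete.sort() }
}
```

After embedding the files in `toEmbed`, their new hashes would be written back to the index; running the plan on file-save or file-watcher events keeps the staleness window small.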
Is it possible to use this as another tool to make code insertion more precise? I tried a diff insert on a single file with more than 3,000 lines, and roo-code deleted all the lines instead of inserting between them.
This is a nice feature; please make it happen, @mrubens.
@cte Why not use trigram-based regex indexing, combined with AST parsing?
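Trigram indexing is the technique described in Russ Cox's write-up on how Google Code Search worked: index every three-character substring, answer a query by intersecting the sets of files that contain each of its trigrams, then confirm the candidates with a real scan. A minimal sketch for literal (non-regex) queries follows; supporting regexes would additionally mean deriving the required trigrams from the pattern. `trigrams` and `TrigramIndex` are hypothetical names.

```typescript
// Collects every distinct 3-character substring of `text`.
function trigrams(text: string): Set<string> {
	const out = new Set<string>()
	for (let i = 0; i + 3 <= text.length; i++) {
		out.add(text.slice(i, i + 3))
	}
	return out
}

// Hypothetical in-memory trigram index: trigram -> set of file paths.
class TrigramIndex {
	private readonly postings = new Map<string, Set<string>>()
	private readonly files = new Map<string, string>()

	add(path: string, contents: string): void {
		this.files.set(path, contents)
		for (const tri of trigrams(contents)) {
			let paths = this.postings.get(tri)
			if (!paths) {
				paths = new Set<string>()
				this.postings.set(tri, paths)
			}
			paths.add(path)
		}
	}

	// Returns the sorted paths whose contents contain `literal`.
	// Literals shorter than three characters have no trigrams, so every file is scanned.
	search(literal: string): string[] {
		let candidates: string[] = Array.from(this.files.keys())
		for (const tri of trigrams(literal)) {
			const paths = this.postings.get(tri)
			if (!paths) {
				return [] // a required trigram occurs in no file
			}
			candidates = candidates.filter((p) => paths.has(p))
		}
		// Sharing all trigrams is necessary but not sufficient, so confirm each candidate.
		return candidates.filter((p) => (this.files.get(p) ?? "").includes(literal)).sort()
	}
}
```

Unlike embedding search, this finds exact text rather than semantically similar code, which is why it is usually paired with other signals such as AST structure.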
This is a good feature. Why is it deferred? @mrubens
We're a bit unclear on how impactful the feature is, so we prioritized evals in order to have a better method for measuring the impact when we get to it. There's also another version of this actively being built by the community which might make this PR irrelevant. Stay tuned!

Description
Using continue.dev as a reference, implement a basic "code indexer", which consists of three components:
The idea is that we'll:
This PR just shows a small piece of this system; namely how to index new code so that it's semantically searchable.
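Indexing new code starts by splitting files into chunks small enough to embed. The PR's `getChunks()` parses and chunks code files; as a simpler stand-in, the sketch below splits a file into fixed-size, overlapping line windows and records each chunk's line range so that search results can point back into the file. `chunkByLines`, `CodeChunk`, and the default window sizes are assumptions, not the PR's code.

```typescript
// Hypothetical chunk shape: 1-based inclusive line range plus the chunk text.
interface CodeChunk {
	filepath: string
	startLine: number
	endLine: number
	content: string
}

// Splits `contents` into windows of `maxLines` lines, each overlapping the
// previous one by `overlap` lines so code near a boundary appears in both.
function chunkByLines(filepath: string, contents: string, maxLines = 40, overlap = 5): CodeChunk[] {
	if (maxLines <= 0 || overlap < 0 || overlap >= maxLines) {
		throw new Error("Require maxLines > 0 and 0 <= overlap < maxLines")
	}
	const lines = contents.split(/\r?\n/)
	const chunks: CodeChunk[] = []
	const step = maxLines - overlap
	for (let start = 0; start < lines.length; start += step) {
		const end = Math.min(start + maxLines, lines.length)
		const content = lines.slice(start, end).join("\n")
		// Skip all-whitespace windows; there is nothing meaningful to embed.
		if (content.trim().length > 0) {
			chunks.push({ filepath, startLine: start + 1, endLine: end, content })
		}
		if (end === lines.length) {
			break
		}
	}
	return chunks
}
```

Line windows ignore syntax, so a function can be cut in half; splitting on syntax-tree boundaries, as a tree-sitter-based chunker can, tends to produce more coherent chunks.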
Known issues:
- … `.vsix` size significantly (Continue.dev's is about 80 MB, and ours will be similar).
- The `@lancedb/lancedb` npm package doesn't play nicely with CommonJS, so we'll need to update our integration test setup to use a more ESM-friendly configuration, which is going to be a bit annoying. Fixed!
Implements a code indexer with chunking, embedding, and searching capabilities, adds tests, and updates integration setup for LanceDB.
- `CodeSearch` in `code-search.ts` for indexing and searching code chunks using LanceDB.
- `getChunks()` in `chunker.ts` to parse and chunk code files.
- `supportedLanguages` in `chunker.ts`.
- `chunker.test.ts`, `code-search.test.ts`, and `uri.test.ts` for chunking, indexing, and URI handling.
- `index.test.ts` and `languageParser.test.ts` to remove mock parsers and use real file reading.
- `package.json` to fix integration test setup for ESM compatibility.
- `.vscodeignore` to manage `.vsix` size.
- `readFile()` in `fs.ts` for easier mocking in tests.
- `tsconfig.integration.json` to set `rootDir` to `integration-tests`.

This description was generated automatically for commit 4587c75 and will update as commits are pushed.
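On the search side, `CodeSearch` indexes and searches code chunks using LanceDB. Conceptually, a vector search ranks stored chunk embeddings by their similarity to the query's embedding. The dependency-free sketch below does this with brute-force cosine similarity over an in-memory array; it illustrates the idea only and is not LanceDB's API, and `IndexedChunk`, `SearchHit`, and `searchTopK` are hypothetical names. (OpenAI documents its embeddings as normalized to length 1, in which case cosine similarity reduces to a dot product.)

```typescript
// Hypothetical stored row: a chunk id plus its embedding vector.
interface IndexedChunk {
	id: string
	vector: number[]
}

interface SearchHit {
	id: string
	score: number
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
	if (a.length !== b.length) {
		throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
	}
	let dot = 0
	let normA = 0
	let normB = 0
	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if (normA === 0 || normB === 0) {
		return 0
	}
	return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Brute-force top-k: score every chunk, sort by descending score, keep the first k.
function searchTopK(query: number[], chunks: IndexedChunk[], k: number): SearchHit[] {
	return chunks
		.map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
		.sort((x, y) => y.score - x.score)
		.slice(0, Math.max(0, k))
}
```

Brute force is linear in the number of chunks; a vector database adds persistence and, for large tables, approximate indexes so queries avoid scoring every row.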