Conversation

@jayhack
Contributor

@jayhack jayhack commented Feb 9, 2025

Introduces a very naive vector index as an "extension":

  • Get all embeddings from OpenAI
  • Store them in a numpy array
  • Ability to save/load the index on disk (a rough sketch follows this list)
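
A minimal sketch of that shape, assuming the openai Python client and numpy; the class and method names (NaiveVectorIndex, build, save, load) are illustrative, not the extension's actual API:

```python
# Hypothetical sketch of a naive file-level vector index; names are
# illustrative assumptions, not the actual extension's API.
import numpy as np
from openai import OpenAI


class NaiveVectorIndex:
    def __init__(self, model: str = "text-embedding-3-small"):
        self.client = OpenAI()
        self.model = model
        self.paths: list[str] = []  # file paths, aligned with rows of `embeddings`
        self.embeddings: np.ndarray | None = None

    def build(self, files: dict[str, str]) -> None:
        """Embed every file's contents and stack the vectors into one array."""
        self.paths = list(files)
        vectors = []
        for path in self.paths:
            # Naive: one request per file, contents truncated to stay under token limits.
            resp = self.client.embeddings.create(model=self.model, input=files[path][:8000])
            vectors.append(resp.data[0].embedding)
        self.embeddings = np.array(vectors, dtype=np.float32)

    def save(self, path: str) -> None:
        """Persist paths and embeddings to a single .npz file."""
        np.savez(path, paths=np.array(self.paths), embeddings=self.embeddings)

    def load(self, path: str) -> None:
        data = np.load(path)
        self.paths = data["paths"].tolist()
        self.embeddings = data["embeddings"]
```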

Initial investigations show that indexing all of PyTorch takes about 50 MB of memory and roughly 2.5 minutes.

Future iterations on this can show how to:

  • invalidate embeddings when a file's blob hash changes (sketched after this list)
  • store on a symbol level
  • compute symbol-level embeddings including their extended context

etc.
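
For the blob-hash item, here is a hedged sketch of how stale entries might be detected; file_blob_sha and stale_paths are hypothetical helpers, not part of this PR:

```python
# Hypothetical sketch of blob-hash-based invalidation (a possible future iteration);
# helper names and the index layout are illustrative assumptions.
import hashlib


def file_blob_sha(contents: str) -> str:
    """Mimic git's blob hashing: sha1 over 'blob <len>\\0' + contents."""
    data = contents.encode("utf-8")
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def stale_paths(indexed_hashes: dict[str, str], current_files: dict[str, str]) -> list[str]:
    """Return files whose stored blob hash no longer matches the working tree."""
    return [
        path
        for path, contents in current_files.items()
        if indexed_hashes.get(path) != file_blob_sha(contents)
    ]
```

Only the stale files would then need to be re-embedded on the next build.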

This is designed to be used as input for other APIs, like the "semantic search" tool, LlamaIndex retrievers etc.
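
As a usage illustration, a downstream "semantic search"-style consumer could rank files by cosine similarity against the stored matrix. This query flow is an assumption about how such a tool might consume the index, not its actual implementation (NaiveVectorIndex refers to the sketch above):

```python
# Hypothetical consumer of the index sketched above; the query flow is an assumption.
import numpy as np


def semantic_search(index, query_vector: np.ndarray, top_k: int = 5) -> list[tuple[str, float]]:
    """Rank indexed files by cosine similarity to an already-embedded query vector."""
    emb = index.embeddings
    sims = (emb @ query_vector) / (
        np.linalg.norm(emb, axis=1) * np.linalg.norm(query_vector) + 1e-8
    )
    best = np.argsort(-sims)[:top_k]
    return [(index.paths[i], float(sims[i])) for i in best]
```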

@jayhack jayhack requested review from a team and codegen-team as code owners February 9, 2025 21:51
@codecov

codecov bot commented Feb 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

@jayhack jayhack merged commit e2b2da2 into develop Feb 9, 2025
23 of 24 checks passed
@jayhack jayhack deleted the jay/vector-index branch February 9, 2025 23:28
@github-actions
Contributor

🎉 This PR is included in version 0.6.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

tkfoss pushed a commit that referenced this pull request Feb 10, 2025