High Level OA Retrieval System
- Goal of this system
- Options available
- Design or Workflow for First Version
- Other Design thoughts
- Open Questions
- Timeline for First Version
Goal
- automatically retrieve information that will help the LLM produce a better answer
- As a first step, augment OA with a Wikipedia index?
- Does https://twitter.com/youraimarketer/status/1652388457036017667 support such document indexing?
- Multilingual?
Options available
- Allow the LLM to decide, as in the case of plugins
- Use an index design (most people are inclined toward this approach): use a professional vector DB in which we index documents based on embeddings, for example all of Wikipedia (a minimal sketch of this workflow follows after this list)
  - Segment the data into chunks (sentences/paragraphs)
  - Generate embeddings for each chunk
  - Store the embeddings for retrieval (FAISS, etc.)
  - When presented with a query, retrieve related chunks from the DB using some metric, for example cosine similarity
  - Prompt the LLM with the query + retrieved chunks to generate the answer
  - Is LangChain being considered?
  - LlamaIndex?
- VectorDB(s) under consideration
  - Qdrant
  - Weaviate: https://weaviate.io/blog/sphere-dataset-in-weaviate
  - Benchmarks: http://ann-benchmarks.com/ ?
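For comparison, a rough sketch of the same store/search steps against Qdrant, continuing from the `chunks`, `embeddings`, and `query_vec` variables in the sketch above (collection name and vector size are placeholders; assumes the `qdrant-client` Python package):

```python
# Sketch of storing and querying the chunk embeddings in Qdrant (qdrant-client package).
# Reuses chunks/embeddings/query_vec from the pipeline sketch above; names are placeholders.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-memory instance for local experiments

client.recreate_collection(
    collection_name="wiki_chunks",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),  # 384 matches MiniLM above
)

# Upsert chunk embeddings with their text as payload.
client.upsert(
    collection_name="wiki_chunks",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": chunk})
        for i, (vec, chunk) in enumerate(zip(embeddings, chunks))
    ],
)

# Retrieve the most similar chunks for the query embedding.
hits = client.search(
    collection_name="wiki_chunks",
    query_vector=query_vec[0].tolist(),
    limit=3,
)
retrieved = [hit.payload["text"] for hit in hits]
```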
- Drawbacks
  - VectorDB(s) fail on semantically distant info
  - Multi-step reasoning may be required, or that semantic info should already be present as a vector in the DB; need to explore this in detail (a rough sketch of iterative retrieval follows below)
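One possible way to handle the multi-step case, purely as a sketch for discussion: let the model propose a follow-up query after seeing the first batch of retrieved chunks. `retrieve()` and `llm()` below are hypothetical helpers standing in for the vector-DB search and the assistant model; nothing here is a settled design.

```python
# Sketch of multi-step ("iterative") retrieval for semantically distant information.
# retrieve() and llm() are hypothetical helpers, not an agreed interface.
def answer_with_iterative_retrieval(question: str, max_hops: int = 2) -> str:
    context: list[str] = []
    query = question
    for _ in range(max_hops):
        context += retrieve(query, k=3)  # vector-DB lookup for the current query
        followup = llm(
            "Given the question and the context so far, output either a follow-up "
            "search query or the single word DONE.\n"
            f"Question: {question}\nContext: {context}"
        )
        if followup.strip() == "DONE":
            break
        query = followup  # hop again with the refined query
    return llm(f"Context: {context}\nQuestion: {question}\nAnswer:")
```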
Design or Workflow
Overall, there are some similarities between retrieval and OA plugins (in the simplest case, retrieval could be a plugin). However, the retrieval system will be somewhat more closely integrated with the inference system so that the assistant's knowledge can be easily updated.
We need to come to a consensus on the workflow:
- the point where the user-input becomes available
- UI changes required
- how do we decide that the retrieval system should be activated
  - use an invisible plugin that would connect to the VectorDB? (an illustrative sketch follows after this list)
- How do we decide when to query
- how is the query generated
- we need to figure out whether LLaMA 30B is already well calibrated (i.e. can it answer questions about its own knowledge?)
- how are the DB query results then processed and fed into the LLM for output generation?
- how do we decide which embeddings to use for the queries?
- how are the results processed
- how is the assistant updated ?
- Will this require multi-step reasoning to retrieve semantically distant chunks?
- response presented to the user
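As a concrete starting point for this discussion, one way an "invisible plugin" could decide when to query and how to generate the query is to ask the model itself for a structured decision before answering. The JSON contract and the `llm()` / `retrieve()` helpers below are assumptions for illustration, not an agreed design.

```python
# Illustrative sketch of an "invisible plugin" hook: the model is first asked whether
# external knowledge is needed and, if so, what to search for.
# llm() and retrieve() are hypothetical helpers; the JSON format is a placeholder.
import json

DECISION_PROMPT = (
    "Decide whether answering the user requires looking up external knowledge.\n"
    'Reply with JSON: {"retrieve": true/false, "query": "<search query or empty>"}\n'
    "User message: "
)

def maybe_retrieve(message: str) -> list[str]:
    decision = json.loads(llm(DECISION_PROMPT + message))
    if not decision.get("retrieve"):
        return []  # creative / chit-chat turns skip retrieval
    return retrieve(decision["query"], k=5)  # hypothetical vector-DB lookup

def respond(message: str) -> str:
    chunks = maybe_retrieve(message)
    context = "\n".join(chunks) if chunks else "(no retrieved context)"
    return llm(f"Context:\n{context}\n\nUser: {message}\nAssistant:")
```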
Other design thoughts
There are two schools of thought for this system:
- Retrieval-based models are mostly useful in knowledge-seeking mode, for example QA; in creative mode it doesn't make sense to use a retrieval-based model.
- Vs.: most artists and writers lean heavily on reference materials, inspirations, etc., so retrieval is not limited to knowledge-seeking use cases.
Open questions
- how are the DB query results then processed and fed into the LLM for output generation?
- What are the possible changes required on the website
Timeline for First Version
TBD