Question Answering

Using a transformer architecture, we encode a corpus (an array of strings) into vector embeddings, encode a natural-language question into a vector embedding the same way, and then compare the two with a similarity calculation.


1. Import Dependencies

We'll be using a popular pre-trained sentence transformer model. You can alternatively train your own or fine-tune an existing one.

from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

2. Prepare Corpus

Convert the corpus into an array of sentences. Splitting it up lets the engine return the single most relevant statement rather than the whole passage.

corpus = "Japan is an island country in East Asia. It spans an archipelago of 6852 islands."
# Strip whitespace and drop the empty string left after the final period.
docs = [s.strip() for s in corpus.split('.') if s.strip()]
corpus_vector = model.encode(docs)

3. Prepare Question

Just like above, we convert the question into a vector.

query = "How many islands does Japan comprise?"
query_vector = model.encode(query)

4. Calculate Similarity

The heart of vector search is in the similarity calculation. Here we use cosine similarity but you can experiment with others.

scores = util.cos_sim(query_vector, corpus_vector)[0].cpu().tolist()
# The best-scoring sentence answers the question:
best_match = docs[scores.index(max(scores))]
# "It spans an archipelago of 6852 islands"
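If you want several candidate answers rather than a single best match, you can rank every sentence by its score. A minimal sketch in plain Python (the 3-dimensional vectors and document names are illustrative stand-ins for real embeddings, which would come from `model.encode` as above):

```python
import math

def cos_sim(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for sentence embeddings.
docs = ["doc A", "doc B", "doc C"]
doc_vectors = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 1.0, 0.0]]
query_vector = [0.9, 0.1, 0.0]

scores = [cos_sim(query_vector, v) for v in doc_vectors]
# Pair each document with its score and sort best-first.
ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
top_k = ranked[:2]
```

The same pattern works with the real `scores` list from `util.cos_sim`: zip it with `docs`, sort descending, and take the top k.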

Next Steps

  1. Store your corpus vector embeddings in one of the many vector search engines available.
  2. Experiment with other similarity calculations (Euclidean distance and dot product, to name a few).
  3. Store, version, and scale your embedding model.
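The alternative similarity calculations mentioned above can be sketched in plain Python. The vectors here are hypothetical 3-dimensional examples (real sentence embeddings have hundreds of dimensions), chosen so that `b` points in the same direction as `a` but with a different magnitude:

```python
import math

# b is a scaled copy of a: same direction, different magnitude.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

def dot_product(u, v):
    return sum(x * y for x, y in zip(u, v))

def euclidean_distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine_similarity(u, v):
    return dot_product(u, v) / (
        math.sqrt(dot_product(u, u)) * math.sqrt(dot_product(v, v))
    )

# Cosine similarity ignores magnitude, so a and b score a perfect 1.0,
# while Euclidean distance is nonzero because the magnitudes differ.
```

Which measure fits best depends on how the embedding model was trained; models tuned for cosine similarity generally should not be ranked by raw Euclidean distance.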

Other Q&A Use Cases

  • Chat Bot
  • Support Agent
  • FAQ Guidance
  • Website Search Engine