labs-qa

This repository contains an experimental Java module that runs a simple Q&A (question answering) pipeline using Hugging Face models served through a local Flask server.

The repo's Wiki contains different sources of information about this topic.

Requirements

$ pip install flask
$ pip install transformers
$ pip install torch

Usage

  1. Clone this repo
git clone https://github.com/xatkit-bot-platform/labs-qa
  2. Run the local server (a minimal sketch of such a server is shown below)
flask run
  3. Run the main class src/main/java/QuestionAnswering.java. You can edit the main method to specify your favourite Hugging Face endpoints (language model names).
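The repo does not document the server's API here, so the following is only a minimal sketch of what the local Flask server could look like. It assumes a hypothetical /qa JSON endpoint with question, context and model fields, backed by the transformers question-answering pipeline (which already returns char-based start/end positions and a score):

```python
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)

# Cache one pipeline per requested model so repeated questions are cheap.
qa_pipelines = {}

@app.route("/qa", methods=["POST"])
def answer_question():
    data = request.get_json()
    # The "model" field lets the client pick the Hugging Face endpoint
    # (language model name); the default here is a hypothetical choice.
    model_name = data.get("model", "deepset/roberta-base-squad2")
    if model_name not in qa_pipelines:
        qa_pipelines[model_name] = pipeline("question-answering", model=model_name)
    result = qa_pipelines[model_name](question=data["question"], context=data["context"])
    # result is a dict with 'answer', 'score', 'start' and 'end'
    # ('start'/'end' are character positions in the context).
    return jsonify(result)
```

If the sketch is saved as app.py, flask run picks it up and serves it on http://127.0.0.1:5000 by default.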

TO DO:

  • beginPosition and endPosition of the answers are expressed in terms of tokens (e.g. beginPosition = 10 means that the answer starts at the 10th token of the given corpus). Try to convert them to char positions (each language model may have a different tokenizer, so the tokens for the same corpus may differ between two language models). See the first sketch below.
  • Get the score of the answers (probabilities). The score is actually composed of two scores: the score of the beginning of the answer and the score of the end of the answer.
  • To solve: sometimes the best-scored beginning comes after the best-scored end (e.g. best possible beginning of the answer at position 10 and best possible end of the answer at position 5), so the result is an empty answer. Try to get the second (or third, fourth...) best option that contains an answer.
  • Deal with large corpora. Some transformers have a limit of 512 tokens, others 4096... When the given corpus exceeds this limit, it needs to be split. There are different ways to do this, so the possibilities need to be analyzed. See the second sketch below.
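
The first three points boil down to working directly with the tokenizer's offset mapping and the model's start/end logits. A minimal sketch, assuming a fast tokenizer and a hypothetical model choice (deepset/roberta-base-squad2), that converts token positions to char positions, turns the two logit vectors into probabilities, and only considers (start, end) pairs where the start does not come after the end:

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Hypothetical model choice; any extractive QA checkpoint should work.
model_name = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Who wrote Don Quixote?"
corpus = "Don Quixote is a novel written by Miguel de Cervantes."

# return_offsets_mapping gives the (char_start, char_end) span of every
# token, which is what lets us translate token positions back to chars.
inputs = tokenizer(question, corpus, return_tensors="pt", return_offsets_mapping=True)
offsets = inputs.pop("offset_mapping")[0]

with torch.no_grad():
    outputs = model(**inputs)

# The model returns two sets of logits; softmax turns them into the two
# probabilities mentioned above (begin score and end score).
start_probs = torch.softmax(outputs.start_logits[0], dim=-1)
end_probs = torch.softmax(outputs.end_logits[0], dim=-1)

# Only look at tokens that belong to the corpus (sequence id 1), and only
# at pairs where the start does not come after the end. This avoids both
# answers inside the question and the "best start after best end" case.
context_tokens = [i for i, s in enumerate(inputs.sequence_ids(0)) if s == 1]
best_start, best_end = max(
    ((s, e) for s in context_tokens for e in context_tokens if s <= e),
    key=lambda pair: start_probs[pair[0]] * end_probs[pair[1]],
)

char_begin = offsets[best_start][0].item()
char_end = offsets[best_end][1].item()
score = (start_probs[best_start] * end_probs[best_end]).item()
print(corpus[char_begin:char_end], score)
```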
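
For the last point, the tokenizer itself can split a long corpus into overlapping windows. Another minimal sketch (same assumed model; max_length and stride are arbitrary example values):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
question = "Who wrote Don Quixote?"
long_corpus = "Don Quixote is a novel written by Miguel de Cervantes. " * 200

# truncation="only_second" only ever truncates the corpus (never the
# question), stride is the overlap between consecutive windows, and
# return_overflowing_tokens=True produces one encoding per window.
windows = tokenizer(
    question,
    long_corpus,
    truncation="only_second",
    max_length=384,
    stride=128,
    return_overflowing_tokens=True,
    return_offsets_mapping=True,
)
print(len(windows["input_ids"]), "windows")

# Run the model on every window and keep the best-scoring answer overall.
# The offset mappings still point into long_corpus, so character positions
# stay comparable across windows.
```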
