qb2nq/notes.txt at main · Pinafore/qb2nq · GitHub

- The Dropbox download contains more than it should.  It's okay to have a target that fetches NQ and QB from *publicly accessible sources*, but all intermediate results should be generated by the code here, not downloaded.  requirements.txt should also live in this repo.


- Can we get stubs for the tests so that running them doesn't require as many dependencies?


- It seems odd to use both spaCy *and* NLTK for POS tagging.  Wouldn't it be cleaner to converge on a single library?

- Why is "add_question_word_if_no_pronouns" not a heuristic transformation?  It sure seems like one, but it lives in a different part of the code base.


- It might be good to replace non-answer pronouns with their antecedents so that extracted spans aren't ambiguous.  E.g., "she founded Carthage and reigned as its queen from 814-759 BC" becomes "she founded Carthage and reigned as Carthage's queen from 814-759 BC".
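
A minimal sketch of the pronoun substitution suggested in the last item, assuming the antecedent of each pronoun is already known (resolving it is the hard part and is not shown here).  `substitute_pronouns` and its `antecedents` mapping are hypothetical names, not functions from this repo.

```python
import re

def substitute_pronouns(text, antecedents):
    """Replace each pronoun listed in `antecedents` with its antecedent.

    `antecedents` maps a lowercase pronoun (e.g. "its") to replacement text.
    Pronouns missing from the mapping, such as the answer pronoun, are kept.
    """
    if not antecedents:
        return text
    # Whole-word match so "its" does not fire inside words such as "bits".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in antecedents) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: antecedents[m.group(1).lower()], text)

clue = "she founded Carthage and reigned as its queen from 814-759 BC"
rewritten = substitute_pronouns(clue, {"its": "Carthage's"})
print(rewritten)
```

The answer pronoun ("she") is left alone simply because it is not in the mapping, so the answer span stays intact.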


Round 2:

- Answer type classification can probably be done with the LAT frequency analysis
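
To make the idea concrete, here is a rough sketch of a LAT frequency count, assuming the LAT (presumably "lexical answer type") can be approximated as the single word following "this" or "these" in a question.  The helper name `lat_counts` and the sample questions are made up for illustration; the analysis in the code base may work differently.

```python
import re
from collections import Counter

# Assumption: approximate the LAT as the one word right after "this"/"these".
LAT_PATTERN = re.compile(r"\b(?:this|these)\s+([a-z]+)", re.IGNORECASE)

def lat_counts(questions):
    """Count candidate answer types across an iterable of question strings."""
    counts = Counter()
    for question in questions:
        counts.update(m.lower() for m in LAT_PATTERN.findall(question))
    return counts

questions = [
    "This author wrote Dubliners.",
    "For 10 points, name this author of Ulysses.",
    "Name this city founded by Dido.",
]
counts = lat_counts(questions)
print(counts.most_common())
```

The most frequent types could then serve as answer-type labels without training a separate classifier.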


Round 3:

- replace dict and identify can probably be replaced with a single regexp that has better coverage:

re.compile(r"[fF]or (ten|10) points [\S]* (name|identify|give) (this)?\s*")
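
A quick check of what this pattern accepts, plus one possible loosening.  The sample questions are invented, and `LOOSER` is only a suggestion, not something from the code base.  Because the pattern requires a space directly after "points", it matches "For ten points each, name this ..." but misses "For 10 points, name this ...".

```python
import re

# The pattern from the note, written as a raw string so "\S" and "\s"
# reach the regex engine unchanged.
ORIGINAL = re.compile(
    r"[fF]or (ten|10) points [\S]* (name|identify|give) (this)?\s*"
)

# Hypothetical loosening: allow punctuation right after "points" and make
# the extra token (e.g. "each,") optional.
LOOSER = re.compile(
    r"[fF]or (?:ten|10) points\S*(?: \S+)? (?:name|identify|give) (?:this)?\s*"
)

samples = [
    "For ten points each, name this queen of Carthage.",
    "For 10 points, name this queen of Carthage.",
]
for text in samples:
    print(bool(ORIGINAL.search(text)), bool(LOOSER.search(text)), text)
```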