- The Dropbox download contains more than it should. It's okay to have a target that fetches NQ and QB from *publicly accessible sources*, but all intermediate results should be generated by the code here, not downloaded. requirements.txt should also live in this repo.
- Can we get stubs for the tests so that this doesn't have as many dependencies?
- It seems odd to use spaCy *and* NLTK for POS tagging. Wouldn't it be cleaner to converge on a single library?
- Why is "add_question_word_if_no_pronouns" not a heuristic transformation? It sure seems like one, but it lives in a different part of the code base.
- It might be good to substitute non-answer pronouns so that if we extract spans they aren't ambiguous. E.g., "she founded Carthage and reigned as its queen from 814-759 BC" becomes "she founded Carthage and reigned as Carthage's queen from 814-759 BC"
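
A minimal sketch of the pronoun substitution suggested in the last note, in Python. It assumes the antecedent is already known (here passed in by hand); deciding that "its" refers to Carthage would need a coreference step that is not shown. The helper name `substitute_possessive` is hypothetical and not part of the code base.

```python
import re


def substitute_possessive(text: str, pronoun: str, entity: str) -> str:
    """Replace each whole-word occurrence of `pronoun` with "<entity>'s".

    Hypothetical helper: the caller supplies the antecedent `entity`;
    no coreference resolution happens here.
    """
    pattern = re.compile(rf"\b{re.escape(pronoun)}\b")
    # A function replacement avoids treating backslashes in `entity` as escapes.
    return pattern.sub(lambda _match: f"{entity}'s", text)


clue = "she founded Carthage and reigned as its queen from 814-759 BC"
resolved = substitute_possessive(clue, "its", "Carthage")
print(resolved)
```

The answer pronoun ("she") is left untouched on purpose, since only non-answer pronouns should be rewritten.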
Round 2:
- Answer type classification can probably be done with the LAT frequency analysis.
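
One way this could look, as a rough sketch only: it assumes the LAT (lexical answer type) can be approximated by the word that follows "this" or "these", which is a simplification and not necessarily how the existing analysis extracts LATs. The function names and the sample questions are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the LAT is approximated as the word right after "this"/"these".
LAT_PATTERN = re.compile(r"\b(?:this|these)\s+([a-z]+)", re.IGNORECASE)


def lat_counts(questions):
    """Count how often each candidate LAT appears across a set of questions."""
    counts = Counter()
    for question in questions:
        counts.update(lat.lower() for lat in LAT_PATTERN.findall(question))
    return counts


def guess_answer_type(question, counts):
    """Return the candidate LAT in `question` that is most frequent in `counts`."""
    candidates = [lat.lower() for lat in LAT_PATTERN.findall(question)]
    if not candidates:
        return None
    return max(candidates, key=lambda lat: counts[lat])


# Illustrative questions, not taken from NQ or QB.
questions = [
    "This author wrote Hamlet.",
    "Name this author of Ulysses.",
    "This city is the capital of France.",
]
counts = lat_counts(questions)
print(guess_answer_type("This city was home to this author", counts))
```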
Round 3:
- The replace dict and identify can probably be replaced with a single regexp that has better coverage:
re.compile(r"[fF]or (ten|10) points [\S]* (name|identify|give) (this)?\s*")
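
A short check of the regexp's coverage, offered as a sketch. `ORIGINAL` is the pattern above written as a raw string; `PROMPT` is a hypothetical variant, not the author's, that also tolerates punctuation after "points" and makes the extra token optional. It may be worth confirming which prompt forms actually occur in QB before settling on either.

```python
import re

# The pattern from the note, as a raw string so \S and \s are regex escapes.
ORIGINAL = re.compile(r"[fF]or (ten|10) points [\S]* (name|identify|give) (this)?\s*")

# Hypothetical variant: allows "points," and an optional token such as "each".
PROMPT = re.compile(
    r"[fF]or (?:ten|10) points\S*(?:\s+\S+)?\s+(?:name|identify|give)\s+(?:this\s+)?"
)


def strip_prompt(text: str) -> str:
    """Remove the 'for 10 points ...' prompt, keeping the rest of the clue."""
    return PROMPT.sub("", text)


with_token = "For 10 points each name this author of Ulysses."
with_comma = "For 10 points, name this author of Ulysses."

# The original needs a space right after "points", so the comma form is missed.
print(ORIGINAL.search(with_token) is not None)
print(ORIGINAL.search(with_comma) is not None)
print(strip_prompt(with_comma))
```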