Commit 98b00ea
Fix handling of small docs in coref (#28)
* Fix handling of small docs in coref
Docs with one or zero tokens fail in the coref component. This doesn't
have a fix yet, just a failing test. (There is also a test for the span
resolver, which does not fail.)
* Add example short doc to tests
It might be better to include this optionally? On the other hand, since
it should just be ignored in training, having it always there is more
thorough.
* Skip short docs
There can be no coref prediction for docs with one token (or no tokens).
Attempting to process such docs normally breaks the model's size
inference, so instead they're skipped.
In training, this just involves skipping the docs in the update step.
This is simple due to the fake batching structure, since the batch
doesn't have to be maintained.
In inference, this just involves short-circuiting to an empty
prediction.
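The skipping described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the component's actual code. Docs are modelled as lists of token strings and clusters as lists of `(start, end)` spans. `update`, `predict`, `candidate_pairs`, `MIN_TOKENS` and `fragile_model` are all hypothetical names, and `fragile_model` only stands in for the size-inference failure. The pair count shows why the cutoff is two tokens: a doc with n tokens has n(n-1)/2 (mention, earlier antecedent) pairs, which is zero for n = 0 or n = 1.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins, not the spacy-experimental API:
# a doc is its list of token strings, and a cluster is a list of
# (start, end) token spans that refer to the same entity.
Doc = List[str]
Cluster = List[Tuple[int, int]]

MIN_TOKENS = 2  # below this there is no (mention, antecedent) pair


def candidate_pairs(n_tokens: int) -> int:
    """Count the (token, earlier token) pairs a coref model could link."""
    return n_tokens * (n_tokens - 1) // 2


def update(docs: Sequence[Doc], step: Callable[[Doc], float]) -> float:
    """Training: with 'fake batching' each doc goes through `step` on its
    own, so short docs can be dropped without rebuilding a batch."""
    total_loss = 0.0
    for doc in docs:
        if len(doc) < MIN_TOKENS:
            continue  # no candidate pairs, nothing to learn from
        total_loss += step(doc)
    return total_loss


def predict(
    docs: Sequence[Doc], model: Callable[[Doc], List[Cluster]]
) -> List[List[Cluster]]:
    """Inference: short-circuit short docs to an empty prediction so the
    model (and its size inference) never sees them."""
    predictions: List[List[Cluster]] = []
    for doc in docs:
        if len(doc) < MIN_TOKENS:
            predictions.append([])  # empty prediction: no clusters
        else:
            predictions.append(model(doc))
    return predictions


def fragile_model(doc: Doc) -> List[Cluster]:
    # Hypothetical model that, like the unpatched component, fails on short docs.
    if len(doc) < MIN_TOKENS:
        raise ValueError("cannot infer sizes for a doc this short")
    return [[(0, 1), (2, 3)]]  # toy output linking "Sarah" and "she"


docs = [["Sarah", "said", "she", "left"], ["Hi"], []]
print(predict(docs, fragile_model))  # [[[(0, 1), (2, 3)]], [], []]
print(update(docs, lambda d: float(len(d))))  # 4.0
```

Because each doc is handled independently in both paths, the guard is a single length check per doc; no padding or batch bookkeeping has to change when a doc is dropped.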
* Clean up retokenization test
The retokenization test is hard-coded to the training example because it
manually merges some tokens to make sure that the prediction and the merge
line up. It would probably be better to separate the training data from
the general example here, but for now narrowing the training data works.

1 parent 5cd4731
3 files changed (+25 / -3 lines):
- spacy_experimental/coref
- tests

File 1 (+8 lines)

| Original lines | New lines | Change |
|---|---|---|
| 145-150 | 145-155 | +5 lines after original line 147 (new lines 148-152) |
| 232-237 | 237-245 | +3 lines after original line 234 (new lines 240-242) |

File 2 (+10 / -3 lines)

| Original lines | New lines | Change |
|---|---|---|
| 37-42 | 37-47 | +5 lines after original line 39 (new lines 40-44) |
| 83-93 | 88-99 | +1 line after original line 85 (new line 91); original lines 89-90 replaced by new lines 95-96 |
| 148-154 | 154-161 | original line 151 replaced by new lines 157-158 |

File 3 (+7 lines)

| Original lines | New lines | Change |
|---|---|---|
| 79-84 | 79-91 | +7 lines after original line 81 (new lines 82-88) |