I'm using spacy-affixes as part of the spaCy pipeline, as explained in the usage guide. It worked properly until I tried the sentence "Sube el paro." Calling nlp("Sube el paro.") raises the following error:
Traceback (most recent call last):
  File "/home/usuario/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-21-751769ff6949>", line 1, in <module>
    nlp("Sube el paro.")
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy/language.py", line 435, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy_affixes/main.py", line 163, in __call__
    self.apply_rules(retokenizer, token, rule)
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy_affixes/main.py", line 140, in apply_rules
    token, [*rule["affix_text"], token_sub], heads
  File "_retokenize.pyx", line 88, in spacy.tokens._retokenize.Retokenizer.split
ValueError: [E117] The newly split tokens must match the text of the original token. New orths: subSube. Old text: Sube.
From my tests, the bug occurs with texts such as:
nlp("Sube el paro.")
nlp("Sube")
nlp("Subir")
nlp("Subiendo")
But not with texts like:
nlp("sube el paro.")
nlp("sube")
nlp("Subasta")
nlp("Subimos")
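The error reports the new orths as "subSube" against the original text "Sube", so the split seems to produce "sub" followed by the whole, unstripped token. This suggests that the matching of the prefix "sub" is what goes wrong.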
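To make the constraint behind E117 concrete, here is a minimal pure-Python sketch. It uses neither spaCy nor spacy-affixes, and the helper names (`matches_original`, `naive_split`, `case_preserving_split`) are hypothetical. The idea that the prefix is matched case-insensitively but stripped case-sensitively is only my guess at what could yield "subSube"; I have not confirmed it against the library's code.

```python
import re


def matches_original(text, parts):
    # The invariant stated in E117: the pieces must concatenate
    # back to exactly the original token text.
    return "".join(parts) == text


def naive_split(text, prefix):
    # ASSUMPTION (a guess, not spacy-affixes' actual code):
    # the prefix is detected on the lowercased text...
    if not text.lower().startswith(prefix):
        return [text]
    # ...but stripped with a case-sensitive pattern, so "Sube"
    # is left untouched because "sub" does not match "Sub".
    rest = re.sub("^" + re.escape(prefix), "", text)
    return [prefix, rest]


def case_preserving_split(text, prefix):
    # Slice the original text so that the casing of both pieces
    # is preserved and the invariant always holds.
    if not text.lower().startswith(prefix):
        return [text]
    return [text[:len(prefix)], text[len(prefix):]]


for word in ("Sube", "sube"):
    naive = naive_split(word, "sub")
    fixed = case_preserving_split(word, "sub")
    print(word, naive, matches_original(word, naive))
    print(word, fixed, matches_original(word, fixed))
```

Under that assumption, "Sube" is split into "sub" + "Sube", which breaks the invariant, while "sube" is split into "sub" + "e", which satisfies it. The sketch does not explain why "Subasta" and "Subimos" work; presumably other conditions decide whether a rule is applied to a token at all.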
My configuration
- Ubuntu 18.04.3 LTS
- Python 3.6.9
- spacy-affixes 0.1.4
- spacy 2.2.3