Bug using words like 'Sube' at beginning #18

@JavierBJ

Description

I'm using spacy-affixes as part of the spaCy pipeline, as explained in the usage guide. It worked properly until I tried the sentence "Sube el paro". Calling nlp("Sube el paro.") raises the following error:

Traceback (most recent call last):
  File "/home/usuario/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-21-751769ff6949>", line 1, in <module>
    nlp("Sube el paro.")
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy/language.py", line 435, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy_affixes/main.py", line 163, in __call__
    self.apply_rules(retokenizer, token, rule)
  File "/home/usuario/.local/lib/python3.6/site-packages/spacy_affixes/main.py", line 140, in apply_rules
    token, [*rule["affix_text"], token_sub], heads
  File "_retokenize.pyx", line 88, in spacy.tokens._retokenize.Retokenizer.split
ValueError: [E117] The newly split tokens must match the text of the original token. New orths: subSube. Old text: Sube.

From what I have tried, the bug happens with texts like:

nlp("Sube el paro.")
nlp("Sube")
nlp("Subir")
nlp("Subiendo")

But not with texts like:

nlp("sube el paro.")
nlp("sube")
nlp("Subasta")
nlp("Subimos")

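Judging by the error, the problem seems related to matching the prefix "sub": the split produces "sub" + "Sube", so the prefix apparently is not removed from the capitalized token before the split.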
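
To make that guess concrete, here is a minimal pure-Python sketch. It is not the spacy-affixes code, which I have not inspected; the function names and the PREFIX constant are hypothetical. It assumes the prefix is detected ignoring case but stripped with a case-sensitive substitution, which would reproduce the "subSube" orths from the error. It also checks the condition that E117 reports: the pieces, joined together, must equal the original token text.

import re

PREFIX = "sub"  # hypothetical rule affix, taken from "New orths: subSube" in the error


def split_case_sensitive(text, prefix=PREFIX):
    """Assumed failing behaviour: match ignoring case, strip case-sensitively."""
    if not text.lower().startswith(prefix):
        return [text]
    # For "Sube" this substitution is a no-op, because "^sub" does not match "Sub".
    remainder = re.sub("^" + re.escape(prefix), "", text)
    return [prefix, remainder]


def split_preserving_text(text, prefix=PREFIX):
    """One possible fix: slice the original text so the pieces always rejoin to it."""
    if not text.lower().startswith(prefix):
        return [text]
    return [text[:len(prefix)], text[len(prefix):]]


def matches_original(pieces, text):
    """The condition E117 complains about: joined pieces must equal the token text."""
    return "".join(pieces) == text


for word in ["Sube", "sube"]:
    broken = split_case_sensitive(word)
    fixed = split_preserving_text(word)
    print(word, broken, matches_original(broken, word), fixed, matches_original(fixed, word))

Under these assumptions only the capitalized "Sube" breaks the condition, which matches what I see for "Sube" versus "sube". The sketch ignores whatever decides whether a rule applies to a word at all, so it does not explain why "Subasta" and "Subimos" pass.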

My configuration

  • Ubuntu 18.04.3 LTS
  • Python 3.6.9
  • spacy-affixes 0.1.4
  • spacy 2.2.3
