Tokenizing Ellipsis creates empty tokens #120

While working on Japanese model support in spaCy and integrating Sudachi, I ran into an issue where the one-character ellipsis (`…`) causes errors. Tokenizing this ellipsis with SudachiPy yields three tokens, with surfaces `['', '', '…']`.

I assume this is a bug, but I wasn't able to track down where it happens. I also checked `㍻`, which is likewise normalized internally, but it seems to be output as a single character without issue.
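The `['', '', '…']` pattern is consistent with one possible mechanism, sketched below in plain Python. This is a hypothetical model, not SudachiPy's code. It assumes that `…` is rewritten to `...` and `㍻` to `平成` before analysis, and that every character produced by such a rewrite maps back to the start offset of the original character. The toy dictionary `WORDS` and the helpers `normalize`, `segment` and `surfaces` are made up for illustration.

```python
# HYPOTHETICAL model of one-to-many normalization plus an offset map.
# Not SudachiPy's implementation; the rewrites and dictionary are assumptions.

REWRITES = {"…": "...", "㍻": "平成"}  # assumed normalizations
WORDS = {"平成"}  # toy dictionary: "." is absent, so each dot is its own token


def normalize(text):
    """Return the normalized text and a map from normalized index to original index."""
    chars, offsets = [], []
    for i, ch in enumerate(text):
        for c in REWRITES.get(ch, ch):
            chars.append(c)
            # Every char of an expansion points at the same original start.
            offsets.append(i)
    # Sentinel so a span ending at len(normalized) maps to the end of the input.
    offsets.append(len(text))
    return "".join(chars), offsets


def segment(norm):
    """Greedy longest match against WORDS, falling back to single characters."""
    spans, j = [], 0
    while j < len(norm):
        for w in sorted(WORDS, key=len, reverse=True):
            if norm.startswith(w, j):
                spans.append((j, j + len(w)))
                j += len(w)
                break
        else:
            spans.append((j, j + 1))
            j += 1
    return spans


def surfaces(text):
    """Map each normalized token span back to a slice of the original text."""
    norm, offsets = normalize(text)
    return [text[offsets[b]:offsets[e]] for b, e in segment(norm)]


print(surfaces("…"))  # ['', '', '…']
print(surfaces("㍻"))  # ['㍻']
```

Under these assumptions the three dots become three tokens, and the first two map to a zero-length slice of the input, while `平成` is matched as a single token that spans the whole expansion, so `㍻` comes back intact.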
