While working on the spaCy Japanese model support and integrating Sudachi, I ran into an issue where the one-character ellipsis (…) causes errors. Tokenizing this ellipsis with SudachiPy yields three tokens, with surfaces like ['', '', '…'].
I assume this is a bug, but I wasn't able to track down where it happens. I also checked ㍻: although it is also normalized internally, it seems to be output as a single character without issue.
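One possible explanation, offered only as a hypothesis: both characters expand under NFKC-style normalization (… becomes three periods, ㍻ becomes 平成), but 平成 is a single morpheme while the three periods may be segmented as three separate tokens. If every character produced by an expansion maps back to the *start* of the original character, all tokens except the last would get an empty surface. The toy model below is hypothetical: `normalize_with_alignment` and `original_surfaces` are illustrative names, not SudachiPy functions; it uses per-character `unicodedata.normalize` in place of Sudachi's actual input normalization; and it hard-codes token spans instead of running a real segmenter.

```python
from __future__ import annotations

import unicodedata


def normalize_with_alignment(text: str) -> tuple[str, list[int]]:
    """Hypothetical normalizer: NFKC-normalizes each character separately and
    records, for every boundary in the normalized string, the matching
    boundary in the original string.

    All characters produced by expanding one original character map back to
    the start of that character; only the final boundary maps past the end.
    (Per-character NFKC is a simplification of whole-string normalization.)
    """
    normalized_chars: list[str] = []
    alignment: list[int] = []
    for i, ch in enumerate(text):
        expanded = unicodedata.normalize("NFKC", ch)
        # Every normalized character from this expansion points back to i.
        alignment.extend([i] * len(expanded))
        normalized_chars.append(expanded)
    alignment.append(len(text))  # boundary after the last original character
    return "".join(normalized_chars), alignment


def original_surfaces(text: str, spans: list[tuple[int, int]]) -> list[str]:
    """Map (start, end) token spans over the normalized text back to
    substrings of the original text using the alignment above."""
    _, alignment = normalize_with_alignment(text)
    return [text[alignment[start]:alignment[end]] for start, end in spans]


if __name__ == "__main__":
    # Assumed segmentation: "..." split into three one-character tokens.
    print(original_surfaces("…", [(0, 1), (1, 2), (2, 3)]))
    # Assumed segmentation: "平成" kept as a single token.
    print(original_surfaces("㍻", [(0, 2)]))
```

Under these assumptions the first call prints `['', '', '…']` and the second prints `['㍻']`, matching the behavior described above. If this model reflects what SudachiPy actually does, the place to look would be the code that maps normalized offsets back to original offsets; that has not been confirmed against the SudachiPy source.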