Add an API for finding all the misspelled words in a given string #27
Only semi-related, and I can open an issue for it if needed, but do we know whether the spellchecker / C side uses the same encoding as JavaScript? There's one report where the correction of …
Fantastic 🤘
If anything, that seems preferable to doing the splitting ourselves.
This function will return an array of character ranges, indicating where *all* of the misspelled words are in a given string.
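As a rough illustration of what a caller can do with these ranges, here is a TypeScript sketch. It assumes each range is an object with `start` and `end` properties holding UTF-16 code-unit offsets into the checked string, with `end` exclusive; that shape is an assumption for this example, and `MisspellingRange`, `misspelledWords`, and `markMisspellings` are hypothetical names, not library API.

```typescript
// Assumed shape of one range: UTF-16 code-unit offsets, `end` exclusive.
interface MisspellingRange {
  start: number;
  end: number;
}

// Hypothetical helper: map ranges back to the misspelled substrings.
// JS strings are indexed by UTF-16 code units, so slice() can use the
// offsets directly without any re-encoding.
function misspelledWords(text: string, ranges: MisspellingRange[]): string[] {
  return ranges.map(function (r) {
    return text.slice(r.start, r.end);
  });
}

// Hypothetical helper: wrap each misspelled word in brackets. Ranges are
// applied right-to-left so that inserting markers never shifts an offset
// that has not been processed yet.
function markMisspellings(text: string, ranges: MisspellingRange[]): string {
  const sorted = ranges.slice().sort(function (a, b) {
    return b.start - a.start;
  });
  let out = text;
  for (let i = 0; i < sorted.length; i++) {
    const r = sorted[i];
    out = out.slice(0, r.start) + "[" + out.slice(r.start, r.end) + "]" + out.slice(r.end);
  }
  return out;
}
```

Working right-to-left is the design choice that lets the original ranges be reused unchanged while the output string grows.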
For Mac and Windows, this didn't require any extra work, because the NSRanges returned by NSSpellChecker (and probably all NSString APIs) seem to refer to UTF-16 code unit indices, as opposed to logical character indices, and the same applies to the Windows spell-check APIs.
For Linux, the Hunspell library only provides a per-word spell-checking API; it doesn't handle arbitrary text. It also expects UTF-8-encoded words. I deal with this by passing the string to the native spell-checkers in UTF-16 (as V8 natively stores it) and, for Hunspell, transcoding to UTF-8 one word at a time, so that I retain the UTF-16 indices.
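The Linux strategy can be sketched in TypeScript, whose strings are also sequences of UTF-16 code units. This is an illustration under stated assumptions, not the PR's C++ code: the word-boundary rule in `isWordUnit` is deliberately crude and hypothetical, and `checkWord` is a hypothetical stand-in for Hunspell's per-word check that receives UTF-8 bytes. What it shows is that only each word is transcoded, while the reported offsets stay UTF-16 indices into the original string.

```typescript
// UTF-16 offsets into the original string; `end` is exclusive.
interface WordRange {
  start: number;
  end: number;
}

// Transcode a UTF-16 string to UTF-8 bytes. A valid surrogate pair becomes
// one code point (4 UTF-8 bytes); a lone surrogate is encoded as a 3-byte unit.
function utf16ToUtf8(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
    if (cp >= 0xd800 && cp <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
      i++; // the low surrogate has been consumed as well
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return out;
}

// Hypothetical word rule: ASCII letters, apostrophe, or any non-ASCII code
// unit (so accented letters and both surrogate halves stay inside a word).
function isWordUnit(c: number): boolean {
  return (c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a) || c === 0x27 || c >= 0x80;
}

// `checkWord` is a hypothetical per-word checker taking UTF-8 bytes and
// returning true when the word is spelled correctly.
function findMisspelled(text: string, checkWord: (utf8: number[]) => boolean): WordRange[] {
  const ranges: WordRange[] = [];
  let i = 0;
  while (i < text.length) {
    if (!isWordUnit(text.charCodeAt(i))) {
      i++;
      continue;
    }
    const start = i;
    while (i < text.length && isWordUnit(text.charCodeAt(i))) i++;
    // Only this word is transcoded; `start` and `i` remain UTF-16 offsets.
    if (!checkWord(utf16ToUtf8(text.slice(start, i)))) {
      ranges.push({ start: start, end: i });
    }
  }
  return ranges;
}
```

With a one-word dictionary containing `café`, checking `"café wrld"` reports `wrld` at start 5, end 9. In UTF-8 the same word would begin at byte 6, because `é` takes two bytes; that is the mismatch that keeping UTF-16 indices avoids.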
I think this is ready. I'd love to get somebody else's 👀 on it.
Yeah, it looks like we can now spell-check words like …
Ok, seems to be working well on Windows. Gonna 🚢
Fixes atom/spell-check#99
Fixes atom/spell-check#53
Supersedes atom/spell-check#100
Depends on #28
Refs atom/spell-check#53
Refs atom/atom#8908
When opening a large plain-text file, Atom's spell-check task takes a very long time to process the file. When I open /usr/share/dict/words, which contains 235,886 words, one per line, the spell-check task runs for 95 seconds.
Source of the slowness
On Mac, spell checking is implemented by calling into the central AppleSpell process, so there is some IPC overhead for each spell-checking call.
There seems to be some overhead for each spell-check call on Windows too, as I'm seeing a 2X improvement there. On Linux, our existing code was already fine.
Solution
This PR adds a new native API, Spellchecker.checkSpelling(string), which takes a multi-word string and returns an array of character ranges representing all of the misspelled words. This way, the spell-checking can be performed in a single shot.
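To illustrate why one call per string helps when every native call carries a fixed cost (such as the IPC round trip to AppleSpell described above), here is a TypeScript sketch that counts backend calls. `SpellBackend`, `CountingBackend`, `scanWords`, `checkPerWord`, and `checkBatched` are hypothetical names; the interface is only modeled on the two approaches in this PR (one check per word versus one check per string), and the whitespace splitting in `scanWords` is a simplification.

```typescript
// UTF-16 offsets, `end` exclusive.
interface SpanRange {
  start: number;
  end: number;
}

// Hypothetical backend modeled on the two approaches: per word vs. per string.
interface SpellBackend {
  isMisspelled(word: string): boolean;
  checkSpelling(text: string): SpanRange[];
}

// Simplified word boundaries: maximal runs of non-whitespace.
function scanWords(text: string): SpanRange[] {
  const out: SpanRange[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push({ start: m.index, end: m.index + m[0].length });
  }
  return out;
}

// Fake backend that counts calls, standing in for a native checker where
// each call has a fixed overhead.
class CountingBackend implements SpellBackend {
  calls = 0;
  private known: string[];
  constructor(known: string[]) {
    this.known = known;
  }
  isMisspelled(word: string): boolean {
    this.calls++;
    return this.known.indexOf(word) < 0;
  }
  checkSpelling(text: string): SpanRange[] {
    this.calls++;
    return scanWords(text).filter((r) => this.known.indexOf(text.slice(r.start, r.end)) < 0);
  }
}

// Old style: split in JS and cross into the backend once per word.
function checkPerWord(backend: SpellBackend, text: string): SpanRange[] {
  return scanWords(text).filter((r) => backend.isMisspelled(text.slice(r.start, r.end)));
}

// New style: hand over the whole string in a single call.
function checkBatched(backend: SpellBackend, text: string): SpanRange[] {
  return backend.checkSpelling(text);
}
```

For `"the quick borwn fox"` with the known words `the`, `quick`, `brown`, and `fox`, `checkPerWord` makes four backend calls and `checkBatched` makes one, and both report `borwn` at start 10, end 15. In the real API the platform library decides the word boundaries, so its results can differ from a JS-side split.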
Speedup
On my machine, spell-checking /usr/share/dict/words now takes about 11 seconds: ~9X faster than before. This is now short enough that my CPU fan stays quiet.
Questions
This may cause some subtle behavior changes, because the platform's spell-checking library will now be in charge of partitioning the text into words, rather than our JS code. This doesn't seem like a huge problem to me, but maybe someone else has some insight into this.
/cc @atom/feedback