Hi,
Would it be possible to add an option that first computes the string distance between each token and the words in the positive/negative lists and, if the similarity is above some threshold, treats the token as that word? That way spelling mistakes and casual writing styles are not lost.
e.g.
> sentiment('Cats are dumb');
{ score: -3,
comparative: -1,
tokens: [ 'cats', 'are', 'dumb' ],
words: [ 'dumb' ],
positive: [],
negative: [ 'dumb' ] }
> sentiment('Cats are dumbbb');
{ score: 0,
comparative: 0,
tokens: [ 'cats', 'are', 'dumbbb' ],
words: [],
positive: [],
negative: [] }
In this example, dumbbb is so close to dumb that it should be classified as dumb. A library like natural makes this easy:
require('natural').JaroWinklerDistance('dumb', 'dumbbb')
0.9333333333333333
If adding natural as a dependency is out of scope, a hook that lets callers inject their own matching into a processing step could work too.
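As a rough sketch of what such an injected step might look like, the snippet below normalizes tokens against a word list before scoring. Everything in it is an assumption for illustration: `jaro`, `jaroWinkler`, and `normalizeTokens` are hypothetical helpers, the three-word `lexicon` stands in for the real positive/negative lists, and the 0.9 threshold is arbitrary. It uses a hand-rolled Jaro-Winkler so it runs without natural.

```javascript
// Hypothetical sketch: none of these names are part of the sentiment module.

// Jaro similarity between two strings, in the range [0, 1].
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // Characters only count as matching if they are within this distance.
  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(b.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Count matched characters that appear in a different order.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(a, b) {
  const j = jaro(a, b);
  const max = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < max && a[prefix] === b[prefix]) prefix++;
  return j + prefix * 0.1 * (1 - j);
}

// Replace each token with the closest lexicon word if it clears the threshold.
function normalizeTokens(tokens, lexicon, threshold) {
  return tokens.map((token) => {
    if (lexicon.includes(token)) return token; // exact hit, nothing to do
    let best = token;
    let bestScore = threshold;
    for (const word of lexicon) {
      const score = jaroWinkler(token, word);
      if (score >= bestScore) {
        best = word;
        bestScore = score;
      }
    }
    return best;
  });
}

const lexicon = ['dumb', 'good', 'awful']; // stand-in for the real word lists
const normalized = normalizeTokens(['cats', 'are', 'dumbbb'], lexicon, 0.9);
console.log(normalized);
```

The normalized tokens could then be joined and passed to sentiment as usual. A fixed threshold will also pull some legitimately different words onto list entries, so it would probably need to be configurable.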
What do you think? Would this work?
tuxton