Allow token processing "middleware"

Hi,

Would it be possible to add an option that first computes the string similarity between each token and the words in the positive/negative lists and then, if the similarity is above some threshold, treats the token as that word? That way spelling mistakes and casual writing styles would not be lost.

e.g.
```
> sentiment('Cats are dumb');
{ score: -3,
  comparative: -1,
  tokens: [ 'cats', 'are', 'dumb' ],
  words: [ 'dumb' ],
  positive: [],
  negative: [ 'dumb' ] }
> sentiment('Cats are dumbbb');
{ score: 0,
  comparative: 0,
  tokens: [ 'cats', 'are', 'dumbbb' ],
  words: [],
  positive: [],
  negative: [] }
```

In this example, `dumbbb` is so close to `dumb` that it should be classified as such. A library like [natural](https://www.npmjs.com/package/natural) makes this easy.
```
require('natural').JaroWinklerDistance('dumb', 'dumbbb')
0.9333333333333333
```
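
To make the idea concrete, here is a rough sketch of the matching step, assuming a similarity threshold of 0.9. It uses a small hand-written Jaro-Winkler function so that it runs without extra dependencies; `closestWord` and the word list passed to it are hypothetical names for illustration, not part of the library.

```javascript
// Jaro similarity in [0, 1]; 1 means the strings are identical.
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  // Characters only count as matching if they are within this distance.
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const start = Math.max(0, i - range);
    const end = Math.min(b.length - 1, i + range);
    for (let j = start; j <= end; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Count matched characters that appear in a different order.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(a, b) {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return j + prefix * 0.1 * (1 - j);
}

// Hypothetical helper: return the most similar word from `words` if its
// similarity is strictly above `threshold`, otherwise the token unchanged.
function closestWord(token, words, threshold = 0.9) {
  let best = token;
  let bestScore = threshold;
  for (const word of words) {
    const score = jaroWinkler(token, word);
    if (score > bestScore) {
      best = word;
      bestScore = score;
    }
  }
  return best;
}

console.log(closestWord('dumbbb', ['dumb', 'good'])); // dumb
console.log(closestWord('cats', ['dumb', 'good']));   // cats
```

The threshold would probably need tuning, since a low value could map unrelated short words onto each other.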

If adding natural as a dependency is out of scope, a hook that lets callers inject their own token-processing step could work too.
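
Something like the following is what I have in mind for the hook. This is only a sketch: `analyze`, the `tokenMiddleware` option, and the tiny `LEXICON` are all made up for illustration and are not the library's actual API or word list.

```javascript
// Hypothetical stand-in for the library's word list (scores are illustrative).
const LEXICON = new Map([['dumb', -3], ['great', 3]]);

// Hypothetical analyzer. `options.tokenMiddleware` is an assumed option name:
// a function that receives the token array and returns a processed array.
function analyze(text, options = {}) {
  const rawTokens = text.toLowerCase().match(/[a-z']+/g) || [];
  const tokens = options.tokenMiddleware
    ? options.tokenMiddleware(rawTokens)
    : rawTokens;
  let score = 0;
  const words = [];
  for (const token of tokens) {
    if (LEXICON.has(token)) {
      score += LEXICON.get(token);
      words.push(token);
    }
  }
  return { score, tokens, words };
}

// Example middleware supplied by the caller. A fuzzy matcher (for instance
// one built on natural) could be plugged in here instead of a fixed table.
const corrections = new Map([['dumbbb', 'dumb']]);
const fixSpelling = (tokens) => tokens.map((t) => corrections.get(t) || t);

console.log(analyze('Cats are dumbbb').score);                                 // 0
console.log(analyze('Cats are dumbbb', { tokenMiddleware: fixSpelling }).score); // -3
```

In this sketch the returned `tokens` are the processed ones; whether the result should expose the raw tokens instead is an open design question.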

What do you think? Would this work?

Allow token processing "middleware" #116
